Channel: Active questions tagged header - Stack Overflow

Pandas read_csv() with multiple delimiters not working


I have a CSV file that I'm trying to read into a pandas DataFrame. I need to skip the first 19 rows, which are comments. The headers are in the 20th row and the data are in the subsequent rows. The only issue is that the header row starts with a '#', which shifts the headers over. The rest of the data are space-delimited. For some reason, using sep=r'#|\s+' introduces two 'Unnamed' columns into the dataset.

Raw Data Input (row number shown):
01|# comments...
02|# comments...
03|# comments...
.
.
.
19|# comments...
20|# Header1 Header2 Header3
21|Data1 Data2 Data3

Code:

df = pd.read_csv(df_path, skiprows=19, sep=r'#|\s+', engine='python', encoding='utf-8')
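A likely explanation for the two extra columns, sketched with the standard `re` module rather than pandas itself: the pattern matches the '#' and the following space as two separate delimiters, so the header line starts with two empty fields. This assumes the python engine splits each line roughly the way `re.split` does, and the header string is copied from the sample data above.

```python
import re

# Header row from the sample data (row 20).
header = "# Header1 Header2 Header3"

# '#' matches first, then the space matches '\s+' as a second delimiter,
# which leaves two empty fields before 'Header1'.
fields = re.split(r'#|\s+', header)
print(fields)  # ['', '', 'Header1', 'Header2', 'Header3']
```

pandas fills in placeholder names for empty header fields, which would account for the 'Unnamed' columns.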

Output df:

Unnamed: 0  Unnamed: 1  Header1
Data1       Data2       Data3

Desired Output df:

Header1  Header2  Header3
Data1    Data2    Data3

How can I handle the extra '#' in the header row without introducing these extra columns? I've also tried using:

sep=r'[#|\s+]'
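One possible workaround, offered as a sketch rather than a definitive fix: parse the header row by hand, then let pandas read the remaining rows with a plain whitespace separator. The helper name `read_hash_header`, its `n_comment_rows` parameter, and the in-memory sample built with `io.StringIO` are all hypothetical stand-ins for the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: 19 comment rows,
# a '#'-prefixed header row, then one data row.
raw = "".join(f"# comment {i}\n" for i in range(1, 20))
raw += "# Header1 Header2 Header3\n"
raw += "Data1 Data2 Data3\n"


def read_hash_header(buf, n_comment_rows=19):
    """Read a whitespace-delimited file whose header row starts with '#'."""
    # Consume the leading comment rows.
    for _ in range(n_comment_rows):
        buf.readline()
    # Parse the header row by hand: drop the leading '#', split on whitespace.
    names = buf.readline().lstrip("#").split()
    # The rest of the buffer is plain whitespace-delimited data.
    return pd.read_csv(buf, sep=r"\s+", header=None, names=names)


df = read_hash_header(io.StringIO(raw))
print(df.columns.tolist())
```

With a real file, the same helper could be called on an open handle, for example `with open(df_path, encoding='utf-8') as f: df = read_hash_header(f)`. Because the '#' never reaches the separator regex, no empty header fields are produced.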
