I have a CSV file that I'm trying to read as a pandas DataFrame. I need to skip the first 19 rows, which are comments. The headers are in the 20th row and the data are in the subsequent rows. The only issue is that the header row starts with a `#`, which shifts the headers over. The rest of the data are delimited with spaces. For some reason, using `sep=r'#|\s+'` introduces two `Unnamed` columns into the DataFrame.
Raw data input (row numbers shown):

```
01|# comments...
02|# comments...
03|# comments...
.
.
.
19|# comments...
20|# Header1 Header2 Header3
21|Data1 Data2 Data3
```
Code:

```python
import pandas as pd

df = pd.read_csv(df_path, skiprows=19, sep=r'#|\s+', engine='python', encoding='utf-8')
```
Output DataFrame:

Unnamed: 0 | Unnamed: 1 | Header1 |
---|---|---|
Data1 | Data2 | Data3 |
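A plausible explanation, assuming the Python engine splits each stripped line with `re.split` on the given pattern: applying the same regex to the header line yields two empty strings before the real names. pandas labels empty header fields `Unnamed: 0`, `Unnamed: 1`, and so on, which is consistent with the output above.

```python
import re

header_line = "# Header1 Header2 Header3"

# The '#' at position 0 splits off an empty field before it; the space right
# after it is a separate match, so a second empty field appears between them.
fields = re.split(r"#|\s+", header_line)
print(fields)  # ['', '', 'Header1', 'Header2', 'Header3']
```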
Desired output DataFrame:

Header1 | Header2 | Header3 |
---|---|---|
Data1 | Data2 | Data3 |
How can I handle the extra `#` in the header row without introducing these extra columns? I've also tried using `sep=r'[#|\s+]'`.
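Note that `[#|\s+]` is a character class: it matches one single `#`, `|`, `+`, or whitespace character, so it still splits the header line into `['', '', 'Header1', 'Header2', 'Header3']`. One possible workaround, sketched below under the assumption that the header always sits on the line right after the 19 comment lines, is to read the header line yourself, strip the `#`, and pass the cleaned names to `read_csv` with `header=None`. The helper `read_hash_header_table` and its `n_comment_rows` parameter are hypothetical names used only for illustration.

```python
import io

import pandas as pd


def read_hash_header_table(f, n_comment_rows=19):
    """Read a space-delimited table whose header line starts with '#'.

    Hypothetical helper: `f` is an open text file (or io.StringIO), and the
    header is assumed to be the line right after `n_comment_rows` comments.
    """
    # Consume the comment lines without parsing them.
    for _ in range(n_comment_rows):
        f.readline()
    # Strip the leading '#' from the header line, then split on whitespace.
    names = f.readline().lstrip("#").split()
    # The handle now points at the first data row; parse the rest with a
    # plain whitespace separator and supply the cleaned column names.
    return pd.read_csv(f, sep=r"\s+", header=None, names=names)


# In-memory sample text standing in for the real file.
sample = "\n".join(
    ["# comments..."] * 19
    + ["# Header1 Header2 Header3", "Data1 Data2 Data3"]
) + "\n"

df = read_hash_header_table(io.StringIO(sample))
print(df)
```

With the real file, the same helper would be called inside `with open(df_path, encoding="utf-8") as f:`. An equivalent variant reads the names the same way and then calls `pd.read_csv(df_path, skiprows=20, sep=r"\s+", header=None, names=names)` on the path directly.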