I had a script that worked fine in PyCharm until I uninstalled/reinstalled Anaconda. Now I'm getting all kinds of errors.
I've isolated one of them to a `read_csv` call that no longer reads the file the way I expected.
The formatted csv looks like this:
Price | Adj Close | Close | High | Low | Open | Volume |
---|---|---|---|---|---|---|
Ticker | ^BVSP | ^BVSP | ^BVSP | ^BVSP | ^BVSP | ^BVSP |
Date | ||||||
2014-01-02 | 50341.0 | 50341.0 | 51656.0 | 50246.0 | 51522.0 | 3476300 |
2014-01-03 | 50981.0 | 50981.0 | 50981.0 | 50269.0 | 50348.0 | 7360400 |
2014-01-06 | 50974.0 | 50974.0 | 51002.0 | 50451.0 | 50980.0 | 3727800 |
2014-01-07 | 50430.0 | 50430.0 | 51478.0 | 50429.0 | 50982.0 | 3339500 |
The raw .csv file looks like this:

    Price,Adj Close,Close,High,Low,Open,Volume
    Ticker,^BVSP,^BVSP,^BVSP,^BVSP,^BVSP,^BVSP
    Date,,,,,,,
    2014-01-02,50341.0,50341.0,51656.0,50246.0,51522.0,3476300
    2014-01-03,50981.0,50981.0,50981.0,50269.0,50348.0,7360400
    2014-01-06,50974.0,50974.0,51002.0,50451.0,50980.0,3727800
    2014-01-07,50430.0,50430.0,51478.0,50429.0,50982.0,3339500

My question: how should I call `read_csv` if I want the dataframe to have a datetime index called 'Date' (that label sits in the first column, third row) and columns called Adj Close, Close, High, Low, Open, Volume (which are in the first row, columns 2-7)?
Is there any way I can do it in one line, or do I need to read using the first row as headers, then rename Price to Date?
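For concreteness, here is a minimal sketch of the "use the first row as headers, then rename" route I'm asking about. The inline `SAMPLE_CSV` string and the `io.StringIO` wrapper are stand-ins for my real file so the snippet runs on its own; they are not part of my script. It assumes the two rows after the header (the Ticker row and the Date row) can simply be skipped.

```python
import io

import pandas as pd

# Stand-in for the real file: same layout as the raw .csv shown above.
SAMPLE_CSV = """Price,Adj Close,Close,High,Low,Open,Volume
Ticker,^BVSP,^BVSP,^BVSP,^BVSP,^BVSP,^BVSP
Date,,,,,,,
2014-01-02,50341.0,50341.0,51656.0,50246.0,51522.0,3476300
2014-01-03,50981.0,50981.0,50981.0,50269.0,50348.0,7360400
2014-01-06,50974.0,50974.0,51002.0,50451.0,50980.0,3727800
2014-01-07,50430.0,50430.0,51478.0,50429.0,50982.0,3339500
"""

df = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    header=0,          # first line supplies the column names
    skiprows=[1, 2],   # drop the 'Ticker' and 'Date' lines (0-based file lines)
    index_col=0,       # first column ('Price' in the header) becomes the index
    parse_dates=True,  # ask pandas to parse the index as dates
).rename_axis("Date")  # rename the index from 'Price' to 'Date'

print(df)
```

This is technically a single expression, but it still relies on the rename at the end, which is the part I was hoping to avoid.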
What I want the df to look like is:
Date | Adj Close | Close | High | Low | Open | Volume |
---|---|---|---|---|---|---|
2014-01-02 | 50341.0 | 50341.0 | 51656.0 | 50246.0 | 51522.0 | 3476300 |
2014-01-03 | 50981.0 | 50981.0 | 50981.0 | 50269.0 | 50348.0 | 7360400 |
2014-01-06 | 50974.0 | 50974.0 | 51002.0 | 50451.0 | 50980.0 | 3727800 |
2014-01-07 | 50430.0 | 50430.0 | 51478.0 | 50429.0 | 50982.0 | 3339500 |
I'm using this code, which works, but it seems clumsy. Is there a simpler way?

    import pandas as pd

    idx_df = pd.read_csv(
        f'{data_folder}/INDEX_{idx_code}.csv',
        header=None,
        skiprows=3,  # data starts on row 4
        names=['Date', 'Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume'],
        index_col='Date'
    )
    # Added to try to get rid of the error
    idx_df.index = pd.to_datetime(idx_df.index, errors='coerce')
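
One tightening I can think of, as a sketch only: pass `parse_dates` to `read_csv` so the separate `pd.to_datetime` line goes away. Again, `SAMPLE_CSV` and `io.StringIO` are stand-ins for my real path, and I'm assuming every value in the first column is a valid date. As far as I understand, `parse_dates` does not coerce bad values to `NaT` the way `errors='coerce'` does, so this is not an exact replacement if the file contains junk dates.

```python
import io

import pandas as pd

# Stand-in for f'{data_folder}/INDEX_{idx_code}.csv'
SAMPLE_CSV = """Price,Adj Close,Close,High,Low,Open,Volume
Ticker,^BVSP,^BVSP,^BVSP,^BVSP,^BVSP,^BVSP
Date,,,,,,,
2014-01-02,50341.0,50341.0,51656.0,50246.0,51522.0,3476300
2014-01-03,50981.0,50981.0,50981.0,50269.0,50348.0,7360400
2014-01-06,50974.0,50974.0,51002.0,50451.0,50980.0,3727800
2014-01-07,50430.0,50430.0,51478.0,50429.0,50982.0,3339500
"""

idx_df = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    header=None,           # ignore the header rows in the file entirely
    skiprows=3,            # data starts on row 4
    names=['Date', 'Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume'],
    index_col='Date',
    parse_dates=['Date'],  # parse the index column while reading
)

print(idx_df.index)
```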