I have a CSV file (aal_21_02_2018) in the following format:
,Open,High,Low,Close,Volume
2018-02-21 08:01:00,1744.2,1746.0,1738.6,1738.6,34727
2018-02-21 08:02:00,1738.8,1743.0,1738.8,1740.0,6483
2018-02-21 08:03:00,1739.6,1739.6,1737.8,1738.2,6622
I want to load this file into a DataFrame. When I run the following:
df = read_csv('aal_21_02_2018', index_col='datetime')
I get this error:
ValueError: Index datetime invalid
How can I parse this CSV file into a DataFrame correctly?
Answer:
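Your first column is unnamed (the header line starts with a comma), so there is no column called datetime. Pass the column's ordinal position instead: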
df = read_csv('aal_21_02_2018', index_col=0)
Example:
In[4]:
df = pd.read_csv(io.StringIO(t), index_col=0)
df
Out[4]:
Open High Low Close Volume
2018-02-21 08:01:00 1744.2 1746.0 1738.6 1738.6 34727
2018-02-21 08:02:00 1738.8 1743.0 1738.8 1740.0 6483
2018-02-21 08:03:00 1739.6 1739.6 1737.8 1738.2 6622
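The session above reads from a string t that the answer never defines. A self-contained sketch, assuming t holds the three sample rows from the question verbatim, also shows why the name lookup fails: pandas gives the empty header field a placeholder name.

```python
import io

import pandas as pd

# Assumption: t holds the three sample rows from the question verbatim.
# The header line starts with a comma, i.e. the first field is empty.
t = """,Open,High,Low,Close,Volume
2018-02-21 08:01:00,1744.2,1746.0,1738.6,1738.6,34727
2018-02-21 08:02:00,1738.8,1743.0,1738.8,1740.0,6483
2018-02-21 08:03:00,1739.6,1739.6,1737.8,1738.2,6622
"""

# Without index_col, the empty header field gets a placeholder name,
# so no column called "datetime" exists.
raw = pd.read_csv(io.StringIO(t))
print(list(raw.columns))  # ['Unnamed: 0', 'Open', 'High', 'Low', 'Close', 'Volume']

# Selecting the index column by position avoids relying on its name.
df = pd.read_csv(io.StringIO(t), index_col=0)
print(df.index.name)  # None
```

For the real file, replace io.StringIO(t) with the path 'aal_21_02_2018'.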
If you want a DatetimeIndex, you can also pass parse_dates=[0]:
In[7]:
df = pd.read_csv(io.StringIO(t), index_col=0, parse_dates=[0])
df
Out[7]:
Open High Low Close Volume
2018-02-21 08:01:00 1744.2 1746.0 1738.6 1738.6 34727
2018-02-21 08:02:00 1738.8 1743.0 1738.8 1740.0 6483
2018-02-21 08:03:00 1739.6 1739.6 1737.8 1738.2 6622
We can see that the index is now a DatetimeIndex:
In[8]:
df.index
Out[8]:
DatetimeIndex(['2018-02-21 08:01:00', '2018-02-21 08:02:00',
'2018-02-21 08:03:00'],
dtype='datetime64[ns]', freq=None)
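Once the index is a DatetimeIndex, you can give it the name the original code expected and slice rows by time. A minimal sketch, again assuming t holds the sample rows; naming the index via rename_axis is an optional extra step, not something the answer requires:

```python
import io

import pandas as pd

# Assumption: t holds the three sample rows from the question.
t = """,Open,High,Low,Close,Volume
2018-02-21 08:01:00,1744.2,1746.0,1738.6,1738.6,34727
2018-02-21 08:02:00,1738.8,1743.0,1738.8,1740.0,6483
2018-02-21 08:03:00,1739.6,1739.6,1737.8,1738.2,6622
"""

# Parse the first column as datetimes and use it as the index.
df = pd.read_csv(io.StringIO(t), index_col=0, parse_dates=[0])

# Optionally give the previously unnamed index a name.
df = df.rename_axis("datetime")

# Label slicing on a DatetimeIndex accepts date strings (both ends inclusive).
window = df.loc["2018-02-21 08:02":"2018-02-21 08:03"]
print(window["Close"].tolist())  # [1740.0, 1738.2]
```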
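As for how the file ended up like this: to_csv writes the index as the first column, and when the index has no name, that column's header is left empty. If you pass index_label='datetime', it writes a named index header instead: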
In[10]:
df.to_csv(index_label='datetime')
Out[10]: 'datetime,Open,High,Low,Close,Volume
2018-02-21 08:01:00,1744.2,1746.0,1738.6,1738.6,34727
2018-02-21 08:02:00,1738.8,1743.0,1738.8,1740.0,6483
2018-02-21 08:03:00,1739.6,1739.6,1737.8,1738.2,6622'
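Passing index_label is one option; naming the index itself should give the same header, because to_csv uses the index name when no label is passed. A small sketch comparing the two, assuming the same sample rows in t:

```python
import io

import pandas as pd

# Assumption: t holds the three sample rows from the question.
t = """,Open,High,Low,Close,Volume
2018-02-21 08:01:00,1744.2,1746.0,1738.6,1738.6,34727
2018-02-21 08:02:00,1738.8,1743.0,1738.8,1740.0,6483
2018-02-21 08:03:00,1739.6,1739.6,1737.8,1738.2,6622
"""
df = pd.read_csv(io.StringIO(t), index_col=0, parse_dates=[0])

# Unnamed index: the first header field is empty, as in the question's file.
unnamed = df.to_csv()

# Option 1: pass the header label for the index explicitly.
with_label = df.to_csv(index_label="datetime")

# Option 2: name the index; to_csv then writes that name as the header.
named = df.rename_axis("datetime").to_csv()

print(unnamed.splitlines()[0])     # ,Open,High,Low,Close,Volume
print(with_label.splitlines()[0])  # datetime,Open,High,Low,Close,Volume
```

splitlines() is used so the check does not depend on the platform's line terminator.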
Then your original code works:
In[12]:
pd.read_csv(io.StringIO(df.to_csv(index_label='datetime')), index_col='datetime')
Out[12]:
Open High Low Close Volume
datetime
2018-02-21 08:01:00 1744.2 1746.0 1738.6 1738.6 34727
2018-02-21 08:02:00 1738.8 1743.0 1738.8 1740.0 6483
2018-02-21 08:03:00 1739.6 1739.6 1737.8 1738.2 6622
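If the existing file cannot be regenerated, another option is to replace its header while reading: header=0 tells read_csv to discard the file's header row in favor of the names you pass, so the index column can then be selected by name. A sketch, assuming t holds the sample rows and that the column order matches the question:

```python
import io

import pandas as pd

# Assumption: t holds the three sample rows from the question.
t = """,Open,High,Low,Close,Volume
2018-02-21 08:01:00,1744.2,1746.0,1738.6,1738.6,34727
2018-02-21 08:02:00,1738.8,1743.0,1738.8,1740.0,6483
2018-02-21 08:03:00,1739.6,1739.6,1737.8,1738.2,6622
"""

# Replace the file's header (header=0) with explicit names so the first
# column can be referred to as "datetime".
columns = ["datetime", "Open", "High", "Low", "Close", "Volume"]
df = pd.read_csv(
    io.StringIO(t),
    header=0,
    names=columns,
    index_col="datetime",
    parse_dates=["datetime"],
)
print(df.index.name)  # datetime
```

For the file from the question, pass 'aal_21_02_2018' in place of io.StringIO(t).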