I am practicing importing stock market data from Google Finance into a Pandas DataFrame:
import pandas as pd
from pandas import Series
path = 'http://www.google.com/finance/historical?cid=542029859096076&startdate=Sep+22%2C+2001&enddate=Sep+20%2C+2016&num=30&ei=3HvhV4n3D8XGmAGp4q74Ag&output=csv'
df = pd.read_csv(path)
So far so good, and df displays the complete data set I need.
However, when I select a specific column, for example
df['Date']
Python shows the following error:
Traceback (most recent call last):
File "<ipython-input-31-cb486dd31fbc>", line 1, in <module>
df['Date']
File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/core/frame.py", line 1997, in __getitem__
return self._getitem_column(key)
File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/core/frame.py", line 2004, in _getitem_column
return self._get_item_cache(key)
File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/core/generic.py", line 1350, in _get_item_cache
values = self._data.get(item)
File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/core/internals.py", line 3290, in get
loc = self.items.get_loc(item)
File "/Users/Username/anaconda/lib/python3.5/site-packages/pandas/indexes/base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'Date'
On the other hand, other columns such as df['High'] work fine. Is there any way I can fix this?
Answer 0 (score: 5)
This CSV file starts with a BOM (Byte Order Mark) signature, so try reading it this way:
df = pd.read_csv(path, encoding='utf-8-sig')
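To illustrate why the 'utf-8-sig' codec helps, here is a minimal sketch that uses made-up in-memory CSV bytes instead of the real download. The `raw` sample and its values are assumptions for illustration only. It shows that decoding with plain 'utf-8' keeps the BOM as a leading U+FEFF character, while 'utf-8-sig' strips it:

```python
import codecs
import io

import pandas as pd

# Hypothetical sample: a tiny CSV with a UTF-8 BOM prepended, standing in
# for the downloaded file (the values are invented).
raw = codecs.BOM_UTF8 + b"Date,Open,High\n20-Sep-16,10.0,11.0\n"

# Plain 'utf-8' keeps the BOM as a U+FEFF character at the start of the text.
plain_text = raw.decode("utf-8")
print(repr(plain_text[:5]))

# 'utf-8-sig' removes the BOM while decoding.
clean_text = raw.decode("utf-8-sig")
print(repr(clean_text[:5]))

# Reading with encoding='utf-8-sig' gives a clean first column name.
df = pd.read_csv(io.BytesIO(raw), encoding="utf-8-sig")
print(df.columns.tolist())
```

Because the sample lives in memory, this can be run without network access.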
How to easily identify this problem (thanks to @jezrael's hint):
In [11]: print(df.columns.tolist())
['\ufeffDate', 'Open', 'High', 'Low', 'Close', 'Volume']
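Note the first column name: it begins with the BOM character '\ufeff', so the key 'Date' does not match it.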
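If the DataFrame has already been loaded with the BOM in a column name, one possible workaround is to strip the character from the names afterwards. The sketch below builds a small hypothetical frame that reproduces the symptom (the data and the `bom_columns` helper name are assumptions, not part of the original question):

```python
import pandas as pd

# Hypothetical frame reproducing the symptom: the first column name
# carries a leading BOM character (U+FEFF).
df = pd.DataFrame({"\ufeffDate": ["20-Sep-16"], "High": [11.0]})

# Find column names that start with the BOM.
bom_columns = [c for c in df.columns if c.startswith("\ufeff")]
print(bom_columns)

# Strip the BOM from every column name; after that the lookup works.
df = df.rename(columns=lambda c: c.lstrip("\ufeff"))
print(df["Date"].tolist())
```

Re-reading the file with the right encoding is usually the cleaner fix; renaming is only a fallback when re-reading is inconvenient.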
Note: as @ayhan has pointed out, starting from version 0.19.0 Pandas will take care of it automatically:
Bug in pd.read_csv(), which caused BOM files to be incorrectly parsed by not ignoring the BOM (GH4793)