pandas pd.read_csv KeyError: 2

Time: 2018-04-02 12:49:48

Tags: python pandas

I am trying to read my car sales data and convert it into a numpy array, but it doesn't work. Here is an image of the data.

import numpy as np
import pandas as pd

for i in range(2,34):
    data = pd.read_csv('Book2.csv')[i].values
data.shape

print(data)

Error message:

Traceback (most recent call last):
  File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\base.py", line 2525, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:\Files\python\neutral_network\2.py", line 5, in <module>
    data = pd.read_csv('Book2.csv')[i].values
  File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\internals.py", line 3843, in get
    loc = self.items.get_loc(item)
  File "C:\Users\ThinkPad\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\base.py", line 2527, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 2

2 answers:

Answer 0 (score: 0)

The error you are getting is caused by the index i on line 5. A better way to convert the whole CSV file into a numpy ndarray is:

import pandas as pd
data = pd.read_csv('Book2.csv')
numpyMatrix = data.to_numpy()  # as_matrix() was removed in pandas 1.0

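You can also use data.values to convert the DataFrame to a numpy ndarray. Note that if the columns have mixed types (for example, a column of strings), the element type of the resulting array will be object.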
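For readers on current pandas, here is a minimal sketch of the same conversion, with io.StringIO standing in for Book2.csv (the sample rows are hypothetical, not the asker's data). It also shows when the element type becomes object: a numpy array has a single dtype, so one column of strings forces the whole array to object, while all-numeric columns give float64.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-ins for Book2.csv; the real file is not shown in the question.
numeric_csv = io.StringIO("a,b\n1,2.5\n3,4.0\n")
mixed_csv = io.StringIO("a,b\n1,x\n3,y\n")

numeric = pd.read_csv(numeric_csv)
mixed = pd.read_csv(mixed_csv)

# to_numpy() replaces the removed DataFrame.as_matrix().
numeric_arr = numeric.to_numpy()
mixed_arr = mixed.to_numpy()

print(numeric_arr.dtype)  # float64: int and float columns share a numeric dtype
print(mixed_arr.dtype)    # object: column b holds strings

# .values produces the same array as to_numpy() here.
assert np.array_equal(numeric_arr, numeric.values)
```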

Answer 1 (score: 0)

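As Prakash said, the problem is the index variable i on line 5. read_csv returns a pandas DataFrame, and indexing a DataFrame with [i] looks up a column whose label is i. Your columns are labelled with strings such as 'Date', so there is no column labelled 2, hence KeyError: 2.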
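A minimal sketch of why df[2] fails and what positional selection looks like, using a small hypothetical in-memory CSV instead of Book2.csv: [] on a DataFrame selects by column label, while iloc selects by position. Assuming the question meant columns 2 to 33 by position, reading the file once and taking iloc[:, 2:34] would also replace the loop over range(2, 34).

```python
import io

import pandas as pd

# Hypothetical sample shaped like the question's data (the real Book2.csv is not shown).
df = pd.read_csv(io.StringIO("Date,h6mi,h6shv\n1,17.30,18182\n2,18.00,15030\n"))

print(list(df.columns))  # ['Date', 'h6mi', 'h6shv']: the labels are strings

try:
    df[2]  # [] looks up a column *label*, and no column is labelled 2
except KeyError as exc:
    print("KeyError:", exc)

# Select by position instead: iloc[:, 2] is the third column.
third = df.iloc[:, 2].to_numpy()
print(third)  # [18182 15030]

# One read plus one positional slice instead of re-reading inside a loop.
# Like Python slicing, iloc tolerates a stop index past the last column.
block = df.iloc[:, 2:34].to_numpy()
```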

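There are two other fundamental problems. First, you re-read the file and reassign data on every pass through the loop, so even if the code worked as intended, you would end up with at most one column of data. Second, read_csv is not interpreting your data correctly. The problem is the comma inside the second field, which pandas initially treats as a delimiter, so you have to tell it to ignore commas inside quotes. I found that the following works (on a subset of your data):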

In [35]: data2=pd.read_csv("Book2.csv", skipinitialspace=True, quotechar='"')

In [36]: data2
Out[36]:
   Date     H6sv   h6mi  h6shv
0     1  26, 368  17.30  18182
1     2  24, 402  18.00  15030
2     3  24, 451  30.33  11312
3     4  26, 528  60.52   9730

Then drop the columns you don't want:

In [55]: data2.drop(columns="Date")
Out[55]:
      H6sv   h6mi  h6shv
0  26, 368  17.30  18182
1  24, 402  18.00  15030
2  24, 451  30.33  11312
3  26, 528  60.52   9730

Yes, it took me 55 tries to get what I wanted...
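Putting the parsing fix together into a self-contained sketch, with io.StringIO standing in for a hypothetical excerpt of Book2.csv in which each comma is followed by a space and the second field is quoted (the real file layout is an assumption). Since quotechar='"' is already read_csv's default, in this sample it is skipinitialspace=True that lets pandas see the opening quote and keep the inner comma as data. The sketch then strips the thousands separator from H6sv so the remaining columns convert to a numeric array, which was the question's original goal.

```python
import io

import pandas as pd

# Hypothetical rows shaped like the output above; the real Book2.csv is not shown.
# Each delimiter is followed by a space, and the second field holds a quoted comma.
raw = io.StringIO(
    'Date, H6sv, h6mi, h6shv\n'
    '1, "26,368", 17.30, 18182\n'
    '2, "24,402", 18.00, 15030\n'
)

# skipinitialspace=True drops the space after each comma, so the quote starts
# the field and "26,368" is read as a single value.
data2 = pd.read_csv(raw, skipinitialspace=True)

# drop() returns a new DataFrame; data2 itself is unchanged.
values = data2.drop(columns="Date")

# H6sv is still text ("26,368"), so remove the separator before converting.
values["H6sv"] = values["H6sv"].str.replace(",", "", regex=False).astype(int)

arr = values.to_numpy()
print(arr.dtype, arr.shape)  # float64 (2, 3)
```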