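I have two CSV files and need to compare, and operate on, the last column of each. I am opening both files with pandas, but when I open the second CSV file and try to access any column, it raises an error.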
import pandas as pd1
import pandas as pd
# comma delimited is the default
df = pd.read_csv("results.csv", header = 0)
spamColumnValues=df['isSpam'].values
df1=pd1.read_csv("compare.csv",header=0)
spamCompareValues=df1['isSpam'].values
I get this error:
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 1964, in __getitem__
return self._getitem_column(key)
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 1971, in _getitem_column
return self._get_item_cache(key)
File "/Library/Python/2.7/site-packages/pandas/core/generic.py", line 1645, in _get_item_cache
values = self._data.get(item)
File "/Library/Python/2.7/site-packages/pandas/core/internals.py", line 3590, in get
loc = self.items.get_loc(item)
File "/Library/Python/2.7/site-packages/pandas/core/indexes/base.py", line 2444, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)
File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)
File "pandas/_libs/hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)
File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)
KeyError: 'isSpam'
Can anyone point out my mistake, or is this not possible with pandas?
The two CSV files can be found here:
https://drive.google.com/file/d/0B3XlF206d5UrUENtZlcwd0pVLW8/view?usp=sharing
https://drive.google.com/file/d/0B3XlF206d5UrbGdJRFM5TURmejQ/view?usp=sharing
Answer 0 (score: 3)
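The problem is that compare.csv has no column named "isSpam": the file has no header row. You need to pass header=None to pd.read_csv(), otherwise the first observation is consumed as the header: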
df1=pd1.read_csv("compare.csv",header=None)
And since the columns appear to be the same in both files, you can reuse the names from the first DataFrame:
df1.columns = df.columns
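To see the whole fix end to end, here is a minimal sketch. The inline CSV text and its id/text columns are made-up stand-ins for results.csv and compare.csv (the real files are not reproduced here), and it assumes both files have the same columns in the same order:

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two files: the first has a header row,
# the second contains data rows only.
results_csv = io.StringIO("id,text,isSpam\n1,hello,0\n2,win cash,1\n3,meeting,0\n")
compare_csv = io.StringIO("1,hello,0\n2,win cash,0\n3,meeting,0\n")

df = pd.read_csv(results_csv, header=0)

# header=None keeps the first row as data; columns are labelled 0, 1, 2, ...
df1 = pd.read_csv(compare_csv, header=None)

# Reuse the names from the first frame (assumes same column count and order).
df1.columns = df.columns

# Element-wise comparison of the last column of both frames.
matches = df["isSpam"].values == df1["isSpam"].values
n_mismatch = int((~matches).sum())
print(matches, n_mismatch)
```

If you are unsure whether a file has a header row, printing df1.columns right after reading it shows what pandas took as the column names.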