How to fix the "KeyError: Index" error in Jupyter Notebook

Asked: 2019-01-18 10:11:36

Tags: python pandas dataframe jupyter-notebook

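I am building a neural network model in Jupyter Notebook and have imported the necessary libraries. I have two datasets, which I merged into one. After the merge, running the code below raises a KeyError: Index([]) error. Could you help me solve this?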

Code:

merge_vector = ["school","sex","age","address",
                "famsize","Pstatus","Medu","Fedu",
                "Mjob","Fjob","reason","nursery","internet"]

duplicated_mask = merged_df.duplicated(keep=False, subset=merge_vector)

Error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-40-4f1a3ab8858b> in <module>()
----> 1 duplicated_mask = merged_df.duplicated(keep=False, subset=merge_vector)

E:\Anaconda2\envs\tensorflow\lib\site-packages\pandas\core\frame.py in duplicated(self, subset, keep)
   4379         diff = Index(subset).difference(self.columns)
   4380         if not diff.empty:
-> 4381             raise KeyError(diff)
   4382 
   4383         vals = (col.values for name, col in self.iteritems()

KeyError: Index(['Fedu', 'Fjob', 'Medu', 'Mjob', 'Pstatus', 'address', 'age', 'famsize',
       'internet', 'nursery', 'reason', 'school', 'sex'],
      dtype='object')

Libraries imported for the NN model:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from math import floor, ceil
from pylab import rcParams

%matplotlib inline

1 answer:

Answer 0 (score: 2)

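You need to take the intersection of the DataFrame's column names and merge_vector, because some of the columns in merge_vector do not exist in the DataFrame. The KeyError lists exactly the names from subset that were not found in merged_df.columns: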

merge_vector = ["school","sex","age","address",
                "famsize","Pstatus","Medu","Fedu",
                "Mjob","Fjob","reason","nursery","internet"]

merged_df = pd.DataFrame({'internet':[4,5,5],
                          'school':[7,8,8],
                          'new':[1,2,3]})
print (merged_df)
   internet  school  new
0         4       7    1
1         5       8    2
2         5       8    3

existed_cols = merged_df.columns.intersection(merge_vector)
print (existed_cols)
Index(['internet', 'school'], dtype='object')

duplicated_mask = merged_df.duplicated(keep=False, subset=existed_cols)
print (duplicated_mask)
0    False
1     True
2     True
dtype: bool
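
Before falling back to the intersection, it can help to see which names are actually missing, since in the traceback above every entry of merge_vector appears in the KeyError. The following is only a sketch under assumptions: it reuses the toy merged_df from this answer rather than the real data, and the names missing_cols and the empty-subset guard are illustrative additions, not part of the original code.

```python
import pandas as pd

merge_vector = ["school", "sex", "age", "address",
                "famsize", "Pstatus", "Medu", "Fedu",
                "Mjob", "Fjob", "reason", "nursery", "internet"]

# Toy stand-in for the real merged_df (assumption: the real frame differs).
merged_df = pd.DataFrame({"internet": [4, 5, 5],
                          "school": [7, 8, 8],
                          "new": [1, 2, 3]})

# Names requested in merge_vector that are absent from the DataFrame.
# This mirrors the check pandas performs before raising the KeyError.
missing_cols = pd.Index(merge_vector).difference(merged_df.columns)
print("Missing columns:", list(missing_cols))
print("Actual columns: ", list(merged_df.columns))

# Names that are present and can safely be passed as subset.
existed_cols = merged_df.columns.intersection(merge_vector)

if existed_cols.empty:
    # Hypothetical guard: with no usable columns there is nothing to compare on,
    # so skip the call and inspect the merge step instead.
    duplicated_mask = None
    print("None of the merge_vector columns exist in merged_df.")
else:
    duplicated_mask = merged_df.duplicated(keep=False, subset=list(existed_cols))
    print(duplicated_mask)
```

If missing_cols contains every name, as it does in the question's traceback, the intersection would be empty, so it is probably worth comparing the printed column list against merge_vector (for example for renamed columns or stray whitespace) before relying on the mask.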