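I am building a neural network model. I am using Jupyter Notebook and have already imported the necessary libraries. I have two datasets, which I merged into one. After merging, running the code below raises a KeyError: Index([]) error. Can you help me solve this?
Code: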
merge_vector = ["school","sex","age","address",
"famsize","Pstatus","Medu","Fedu",
"Mjob","Fjob","reason","nursery","internet"]
duplicated_mask = merged_df.duplicated(keep=False, subset=merge_vector)
Error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-40-4f1a3ab8858b> in <module>()
----> 1 duplicated_mask = merged_df.duplicated(keep=False, subset=merge_vector)
E:\Anaconda2\envs\tensorflow\lib\site-packages\pandas\core\frame.py in duplicated(self, subset, keep)
4379 diff = Index(subset).difference(self.columns)
4380 if not diff.empty:
-> 4381 raise KeyError(diff)
4382
4383 vals = (col.values for name, col in self.iteritems()
KeyError: Index(['Fedu', 'Fjob', 'Medu', 'Mjob', 'Pstatus', 'address', 'age', 'famsize',
'internet', 'nursery', 'reason', 'school', 'sex'],
dtype='object')
Libraries imported for the NN model:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from math import floor, ceil
from pylab import rcParams
%matplotlib inline
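Answer 0 (score: 2)
You need the intersection of the DataFrame's column names and merge_vector, because some of the listed columns do not exist in the DataFrame (in your traceback, all 13 of them are reported as missing):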
merge_vector = ["school","sex","age","address",
"famsize","Pstatus","Medu","Fedu",
"Mjob","Fjob","reason","nursery","internet"]
merged_df = pd.DataFrame({'internet':[4,5,5],
'school':[7,8,8],
'new':[1,2,3]})
print (merged_df)
internet school new
0 4 7 1
1 5 8 2
2 5 8 3
existed_cols = merged_df.columns.intersection(merge_vector)
print (existed_cols)
Index(['internet', 'school'], dtype='object')
duplicated_mask = merged_df.duplicated(keep=False, subset=existed_cols)
print (duplicated_mask)
0 False
1 True
2 True
dtype: bool
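Taking the intersection silences the error, but it is worth checking why the columns are missing in the first place. One possible cause, which the question does not confirm, is that the merge renamed overlapping columns with the default `_x`/`_y` suffixes. The sketch below uses two small hypothetical frames (`df_a`, `df_b`, and the `G3` column are made up for illustration) to show how to list the missing names and how merging on the key columns keeps them intact:

```python
import pandas as pd

# Hypothetical stand-ins for the two datasets; the real ones are not shown in the question.
df_a = pd.DataFrame({"school": ["GP", "MS"], "sex": ["F", "M"], "G3": [10, 12]})
df_b = pd.DataFrame({"school": ["GP", "MS"], "sex": ["F", "M"], "G3": [11, 13]})

merge_vector = ["school", "sex"]

# Merging on the index leaves every shared column name overlapping,
# so pandas renames them with the default suffixes "_x" and "_y".
merged_df = df_a.merge(df_b, left_index=True, right_index=True)
print(merged_df.columns.tolist())

# The same check that DataFrame.duplicated performs internally:
# which requested names are absent from the columns?
missing = pd.Index(merge_vector).difference(merged_df.columns)
print(missing.tolist())

# Merging on the key columns keeps their original names,
# so the subset lookup no longer fails.
merged_on_keys = df_a.merge(df_b, on=merge_vector)
duplicated_mask = merged_on_keys.duplicated(keep=False, subset=merge_vector)
print(duplicated_mask)
```

If `missing` turns out to be non-empty for your real data, printing `merged_df.columns` should show what the columns are actually called, and you can then decide whether to fix the merge or adjust merge_vector.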