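I have a list of column names, and I want to one-hot encode the values in those columns, i.e. encode the categorical variables in my dataset. I tried several approaches, but I get an error: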
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# edited_training_set holds the path to my .csv file
edited_training_set = 'edited_dataset/test_set.csv'
trainig_set_ed = pd.read_csv(edited_training_set)
column_header = ['cat_var_1','cat_var_2','cat_var_3','cat_var_4','cat_var_5','cat_var_6',
'cat_var_7','cat_var_8','cat_var_9','cat_var_10','cat_var_11','cat_var_12','cat_var_13',
'cat_var_14','cat_var_15','cat_var_16','cat_var_17','cat_var_18']
# One LabelEncoder per categorical column
clfs = {c:LabelEncoder() for c in column_header}
for col,clf in clfs.items():
    trainig_set_ed[col] = clfs[col].fit_transform(trainig_set_ed[col])
trainig_set_ed.to_csv('edited_dataset/train_set_encode.csv',sep='\t',encoding='utf-8')
The error it throws:
Traceback (most recent call last):
  File "preprocessing.py", line 83, in <module>
    trainig_set_ed[col] = clfs[col].fit_transform(trainig_set_ed[col])
  File "/root/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "/root/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "/root/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "/root/.local/lib/python2.7/site-packages/pandas/core/internals.py", line 3838, in get
    loc = self.items.get_loc(item)
  File "/root/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2524, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'cat_var_6'
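A `KeyError` raised by `trainig_set_ed[col]` means the DataFrame read from the CSV has no column whose label is exactly `col`. Because dictionaries in Python 2.7 do not preserve insertion order, the loop over `clfs.items()` may reach `'cat_var_6'` first even if other names are missing as well, so it can help to list every missing name before encoding. The sketch below is only an illustration: it uses a small made-up DataFrame in place of the real file, and the trailing space in one header is an assumed cause (a wrong `sep` passed to `read_csv` is another possibility).

```python
import pandas as pd

# Hypothetical stand-in for the CSV; 'cat_var_6 ' has an assumed trailing space
df = pd.DataFrame({'cat_var_5': ['x', 'y'], 'cat_var_6 ': ['p', 'q']})
column_header = ['cat_var_5', 'cat_var_6']

# Requested labels that are absent from the DataFrame; df[col] raises KeyError for these
missing = [c for c in column_header if c not in df.columns]
print(missing)  # ['cat_var_6']

# If stray whitespace in the header row is the cause, stripping it fixes the lookup
df.columns = df.columns.str.strip()
missing_after_strip = [c for c in column_header if c not in df.columns]
print(missing_after_strip)  # []
```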
Thanks!
Answer 0 (score: 3)
Demo:
Source DF:
In [93]: df
Out[93]:
     a    b    c
0  aaa  xxx  ddd
1  bbb  zzz  bbb
2  ccc  aaa  aaa
Solution:
In [94]: from sklearn.preprocessing import LabelEncoder
...:
...: cols = ['a','b','c']
...: clfs = {c:LabelEncoder() for c in cols}
...:
In [95]: for col, clf in clfs.items():
...:     df[col] = clf.fit_transform(df[col])
...:
In [96]: df
Out[96]:
   a  b  c
0  0  1  2
1  1  2  1
2  2  0  0
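`LabelEncoder` numbers the distinct values of a column in sorted order, which is why column `b` (`xxx`, `zzz`, `aaa`) becomes `1 2 0`. A minimal check using just the values of column `b`:

```python
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
codes = enc.fit_transform(['xxx', 'zzz', 'aaa'])

# classes_ holds the distinct values in sorted order; each value's code is its index there
print(enc.classes_.tolist())  # ['aaa', 'xxx', 'zzz']
print(codes.tolist())         # [1, 2, 0]
```

Because each fitted encoder is kept in `clfs`, the same mapping can be reapplied to other data with `clfs[col].transform(...)`; `transform` raises a `ValueError` for values that were not seen during fitting.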
Inverse transform:
In [97]: clfs['a'].inverse_transform(df['a'])
Out[97]: array(['aaa', 'bbb', 'ccc'], dtype=object)
In [98]: clfs['b'].inverse_transform(df['b'])
Out[98]: array(['xxx', 'zzz', 'aaa'], dtype=object)
In [99]: clfs['c'].inverse_transform(df['c'])
Out[99]: array(['ddd', 'bbb', 'aaa'], dtype=object)
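Note that `LabelEncoder` yields one integer column per variable, not one-hot columns, and the integer codes imply an ordering that the categories may not have. If one-hot columns are what is needed, `pandas.get_dummies` with its `columns` argument is one option (scikit-learn's `OneHotEncoder` is another). A sketch on the same demo DataFrame; `dtype=int` is chosen here so the indicators come out as 0/1 rather than booleans:

```python
import pandas as pd

df = pd.DataFrame({'a': ['aaa', 'bbb', 'ccc'],
                   'b': ['xxx', 'zzz', 'aaa'],
                   'c': ['ddd', 'bbb', 'aaa']})
cols = ['a', 'b', 'c']

# One 0/1 indicator column per (column, value) pair, named '<column>_<value>'
one_hot = pd.get_dummies(df, columns=cols, dtype=int)
print(list(one_hot.columns))
# ['a_aaa', 'a_bbb', 'a_ccc', 'b_aaa', 'b_xxx', 'b_zzz', 'c_aaa', 'c_bbb', 'c_ddd']
```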