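I have two DataFrames, all_df and df_good_sample. When I run one_hot_encoding on each of them separately, everything works fine. But when I combine the two DataFrames, I get an error.
My implementation of one_hot_encoding is: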
import pandas as pd

def one_hot_encoding(register_info, fea):
    flag = True
    fea_g_id = 1
    # flag is always True at this point, so only the first branch runs
    if flag:
        X_df = pd.get_dummies(register_info[fea])
        fea_group_ids = [fea_g_id for i in range(X_df.shape[1])]
        flag = False
        fea_g_id = fea_g_id + 1
    else:
        X_cur = pd.get_dummies(register_info[fea])
        fea_group_ids += [fea_g_id for i in range(X_cur.shape[1])]
        fea_g_id = fea_g_id + 1
        X_df = pd.concat([X_df, X_cur], axis=1)
    X = X_df.values
    return X, X_df
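When I apply it to all_df, I get the expected result: [image: one_hot_encoding result for all_df]
The same goes for df_good_sample.
But when I use the combination of the two, I get:
NotImplementedError: > 1 ndim Categorical are not supported at this time
Full error message: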
NotImplementedError Traceback (most recent call last)
<ipython-input-325-54e4d184cdb1> in <module>()
2 record_column_length = []
3 for i in range(0, len(category_feature)):
----> 4 category_df[i] = one_hot_encoding(all_df.append(df_good_sample).replace(0, np.nan), category_feature[i])[1]
5 record_column_length.append(len(category_df[i].columns))
6 concat_group = pd.concat(category_df, ignore_index=True, axis=1)
<ipython-input-312-82e782b3856b> in one_hot_encoding(register_info, fea)
16 # print fea
17 if flag:
---> 18 X_df = pd.get_dummies(register_info[fea])
19 fea_group_ids = [fea_g_id for i in range(X_df.shape[1])]
20 flag = False
/home/ubuntu/app/anaconda2/lib/python2.7/site-packages/pandas/core/reshape/reshape.pyc in get_dummies(data, prefix, prefix_sep, dummy_na, columns, sparse, drop_first)
1211 else:
1212 result = _get_dummies_1d(data, prefix, prefix_sep, dummy_na,
-> 1213 sparse=sparse, drop_first=drop_first)
1214 return result
1215
/home/ubuntu/app/anaconda2/lib/python2.7/site-packages/pandas/core/reshape/reshape.pyc in _get_dummies_1d(data, prefix, prefix_sep, dummy_na, sparse, drop_first)
1218 sparse=False, drop_first=False):
1219 # Series avoids inconsistent NaN handling
-> 1220 codes, levels = _factorize_from_iterable(Series(data))
1221
1222 def get_empty_Frame(data, sparse):
/home/ubuntu/app/anaconda2/lib/python2.7/site-packages/pandas/core/categorical.pyc in _factorize_from_iterable(values)
2142 codes = values.codes
2143 else:
-> 2144 cat = Categorical(values, ordered=True)
2145 categories = cat.categories
2146 codes = cat.codes
/home/ubuntu/app/anaconda2/lib/python2.7/site-packages/pandas/core/categorical.pyc in __init__(self, values, categories, ordered, fastpath)
294
295 # FIXME
--> 296 raise NotImplementedError("> 1 ndim Categorical are not "
297 "supported at this time")
298
NotImplementedError: > 1 ndim Categorical are not supported at this time
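Since the message complains about more than one dimension, one quick check is to look at what the column selection actually returns on the combined frame before it reaches get_dummies. The following is only a minimal sketch on toy data: `describe_selection`, `toy_a` and `toy_b` are hypothetical names, and `pd.concat` stands in for `DataFrame.append`, which was removed in pandas 2.0. It does not by itself identify the cause in the case above.

```python
import pandas as pd


def describe_selection(frame, fea):
    """Hypothetical helper: report the type and number of dimensions of frame[fea]."""
    selected = frame[fea]
    return type(selected).__name__, selected.ndim


# Toy stand-ins for all_df and df_good_sample (assumed data, not from the question)
toy_a = pd.DataFrame({'color': ['red', 'blue']})
toy_b = pd.DataFrame({'color': ['green', 'red']})

# Combine the two frames row-wise, as the failing call does with append
combined = pd.concat([toy_a, toy_b])

# A one-dimensional Series is what get_dummies expects for a single column
print(describe_selection(combined, 'color'))
```

If the same check on the real combined frame reports anything other than a one-dimensional Series, the selection itself is worth inspecting first.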
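I hope someone can help me solve this problem!
Answer 0 (score: 0)
I got this error when trying to get dummies on columns that had Unicode characters in their names or in the actual record values. I replaced the column names and their values, and that solved the problem: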
import pandas as pd

# Replace the column names with 'col1', 'col2' and so forth
colnum = 1
for colname in list(df):
    df.rename(columns={colname: 'col' + str(colnum)}, inplace=True)
    colnum += 1

# Replace the column values with 'val0', 'val1' and so forth
for colname in list(df):
    f_values = df[colname].unique().tolist()
    mapping = dict(zip(f_values, ['val' + str(i) for i in range(len(f_values))]))
    df.replace({colname: mapping}, inplace=True)

# Now running get_dummies will work
df = pd.get_dummies(df)
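The same workaround can be packaged as a function that leaves the original frame untouched. This is just a sketch under a few assumptions: `anonymize_labels` is a hypothetical helper name (not a pandas function), the toy data is made up, and the columns contain no missing values.

```python
import pandas as pd


def anonymize_labels(df):
    """Hypothetical helper: return a copy of df with ASCII-only names and values.

    Columns become 'col1', 'col2', ... and the distinct values of each
    column become 'val0', 'val1', ... in order of first appearance.
    Assumes the columns contain no missing values.
    """
    out = df.copy()
    out.columns = ['col' + str(i) for i in range(1, len(out.columns) + 1)]
    for colname in out.columns:
        values = out[colname].unique().tolist()
        mapping = {value: 'val' + str(i) for i, value in enumerate(values)}
        out[colname] = out[colname].map(mapping)
    return out


# Toy frame with non-ASCII column names and values (assumed data)
toy = pd.DataFrame({u'城市': [u'北京', u'上海', u'北京'],
                    u'类别': [u'甲', u'乙', u'乙']})

encoded = pd.get_dummies(anonymize_labels(toy))
print(sorted(encoded.columns))
```

Note that the relabeling discards the original names, so the mapping would need to be kept separately if the dummy columns have to be traced back to the real categories.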