NotImplementedError: > 1 ndim Categorical are not supported at this time

Asked: 2018-02-19 16:33:14

Tags: python pandas

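I have two DataFrames, all_df and df_good_sample. When I run one_hot_encoding on each of them separately, everything works. But when I combine the two DataFrames and run it on the result, I get an error.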

My implementation of one_hot_encoding is:

def one_hot_encoding(register_info, fea):
    flag = True
    fea_g_id = 1
    if flag:
        X_df = pd.get_dummies(register_info[fea])
        fea_group_ids = [fea_g_id for i in range(X_df.shape[1])]
        flag = False
        fea_g_id = fea_g_id + 1
    else:
        X_cur = pd.get_dummies(register_info[fea])
        fea_group_ids += [fea_g_id for i in range(X_cur.shape[1])]
        fea_g_id = fea_g_id + 1
        X_df = pd.concat([X_df,X_cur],axis=1)
    X = X_df.values
    return X, X_df
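
For readers following along: the function above appears intended to one-hot encode features and track which dummy columns belong to which feature. A minimal sketch of that idea for several features at once is shown here. The helper name `one_hot_encode_features` and the demo data are hypothetical and not part of the original question.

```python
import pandas as pd


def one_hot_encode_features(frame, features):
    """One-hot encode each column named in `features`.

    Hypothetical helper (not the asker's code). Returns the dense array,
    the dummy DataFrame, and one group id per dummy column.
    """
    encoded_parts = []
    group_ids = []
    for group_id, fea in enumerate(features, start=1):
        # Prefix each dummy column with its source feature name.
        dummies = pd.get_dummies(frame[fea], prefix=fea)
        encoded_parts.append(dummies)
        # Every dummy column from this feature shares the same group id.
        group_ids.extend([group_id] * dummies.shape[1])
    encoded = pd.concat(encoded_parts, axis=1)
    return encoded.values, encoded, group_ids


# Made-up demo data, for illustration only.
demo = pd.DataFrame({"city": ["a", "b", "a"], "plan": ["x", "x", "y"]})
X, X_df, fea_group_ids = one_hot_encode_features(demo, ["city", "plan"])
```

Looping with `enumerate` replaces the manual `flag` and `fea_g_id` bookkeeping, so there is no separate first-iteration branch.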

When I apply it to all_df, I get the expected result (image: one_hot_encoding result for all_df).

The same works for df_good_sample.

But when I use the two combined, I get:

NotImplementedError: > 1 ndim Categorical are not supported at this time

Full error message:

    NotImplementedError  Traceback (most recent call last)
    <ipython-input-325-54e4d184cdb1> in <module>()
          2 record_column_length = []
          3 for i in range(0, len(category_feature)):
    ----> 4     category_df[i] = one_hot_encoding(all_df.append(df_good_sample).replace(0, np.nan), category_feature[i])[1]
          5     record_column_length.append(len(category_df[i].columns))
          6 concat_group = pd.concat(category_df, ignore_index=True, axis=1)

    <ipython-input-312-82e782b3856b> in one_hot_encoding(register_info, fea)
         16 #     print fea
         17     if flag:
    ---> 18         X_df = pd.get_dummies(register_info[fea])
         19         fea_group_ids = [fea_g_id for i in range(X_df.shape[1])]
         20         flag = False

    /home/ubuntu/app/anaconda2/lib/python2.7/site-packages/pandas/core/reshape/reshape.pyc in get_dummies(data, prefix, prefix_sep, dummy_na, columns, sparse, drop_first)
       1211     else:
       1212         result = _get_dummies_1d(data, prefix, prefix_sep, dummy_na,
    -> 1213                                  sparse=sparse, drop_first=drop_first)
       1214     return result
       1215 

    /home/ubuntu/app/anaconda2/lib/python2.7/site-packages/pandas/core/reshape/reshape.pyc in _get_dummies_1d(data, prefix, prefix_sep, dummy_na, sparse, drop_first)
       1218                     sparse=False, drop_first=False):
       1219     # Series avoids inconsistent NaN handling
    -> 1220     codes, levels = _factorize_from_iterable(Series(data))
       1221 
       1222     def get_empty_Frame(data, sparse):

    /home/ubuntu/app/anaconda2/lib/python2.7/site-packages/pandas/core/categorical.pyc in _factorize_from_iterable(values)
       2142         codes = values.codes
       2143     else:
    -> 2144         cat = Categorical(values, ordered=True)
       2145         categories = cat.categories
       2146         codes = cat.codes

    /home/ubuntu/app/anaconda2/lib/python2.7/site-packages/pandas/core/categorical.pyc in __init__(self, values, categories, ordered, fastpath)
        294 
        295                 # FIXME
    --> 296                 raise NotImplementedError("> 1 ndim Categorical are not "
        297                                           "supported at this time")
        298 

    NotImplementedError: > 1 ndim Categorical are not supported at this time
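
A note on reading this traceback, offered as a hypothesis rather than a confirmed diagnosis: the raise sits under a `# FIXME`, so the message may not describe the real failure. One possibility is that the combined column mixes value types (for example text and numbers, or byte strings and Unicode strings under Python 2.7) and pandas fails while sorting the categories. The sketch below shows one way to check for that and normalize the column before calling `pd.get_dummies`. The helpers `value_types` and `to_text` and the demo frames are hypothetical. `pd.concat` is used in place of `DataFrame.append`, which newer pandas versions no longer provide.

```python
import numpy as np
import pandas as pd


def value_types(series):
    """Return the set of Python type names among the non-null values."""
    return {type(v).__name__ for v in series.dropna()}


def to_text(series):
    """Cast non-null values to text, leaving missing values untouched."""
    return series.map(lambda v: v if pd.isna(v) else str(v))


# Made-up stand-ins for the asker's frames, for illustration only.
all_df = pd.DataFrame({"city": ["a", "b", 0]})
df_good_sample = pd.DataFrame({"city": [1, "caf\u00e9"]})

combined = pd.concat([all_df, df_good_sample], ignore_index=True).replace(0, np.nan)

types_before = value_types(combined["city"])   # mixed: text and integers
cleaned = to_text(combined["city"])
types_after = value_types(cleaned)             # text only
dummies = pd.get_dummies(cleaned)
```

If `types_before` contains more than one type for a feature, that column is a reasonable first suspect.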

I hope someone can help me solve this problem!

1 Answer:

Answer 0 (score: 0)

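I got this error when trying to get dummy variables on columns that had Unicode characters either in the column names or in the actual record values. I replaced the column names and their values, and that solved the problem: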

import pandas as pd
# Replace the column names with 'col1', 'col2' and so forth
colnum = 1
for colname in list(df):
    df.rename(columns={colname: 'col' + str(colnum)}, inplace=True)
    colnum += 1

# Replace the column values with 'val0', 'val1' and so forth
for colname in list(df):
    f_values = df[colname].unique().tolist()
    mapping = dict(zip(f_values, ['val' + str(i) for i in range(len(f_values))]))
    df.replace({colname: mapping}, inplace=True)

#now running get_dummies will work
df = pd.get_dummies(df)
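
The same renaming idea can be written a little more compactly. The following is only a sketch under assumptions: the helper name `anonymize_for_dummies` and the sample data are hypothetical, and unlike the loop above it leaves missing values as missing instead of mapping them to a label.

```python
import pandas as pd


def anonymize_for_dummies(df):
    """Return a copy with ASCII-only column names and value labels.

    Hypothetical helper: columns become 'col1', 'col2', ... and each
    distinct value becomes 'val0', 'val1', ... in order of appearance.
    """
    out = df.copy()
    out.columns = ["col%d" % i for i in range(1, out.shape[1] + 1)]
    for name in out.columns:
        # factorize assigns integer codes by first appearance; -1 marks missing.
        codes, _ = pd.factorize(out[name])
        out[name] = ["val%d" % c if c >= 0 else None for c in codes]
    return out


# Made-up data with non-ASCII names and values, for illustration only.
sample = pd.DataFrame({
    "ville": ["Z\u00fcrich", "Gen\u00e8ve", "Z\u00fcrich"],
    "\u00e9tat": ["oui", "non", "oui"],
})
anonymized = anonymize_for_dummies(sample)
encoded = pd.get_dummies(anonymized)
```

Keeping a copy of the original frame makes it possible to map the generated `colN` and `valN` labels back to the real names afterwards.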