Question

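I have a data frame and ran get_dummies on its categorical columns, so each category of each categorical column became a new column named "columnName_cellValue". I then built a model on the result and saved it.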

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Load data
df = pd.read_csv('df.csv', sep=',', decimal='.', header=0)
# Encode the target column
df['class'] = LabelEncoder().fit_transform(df['class'])
# Collect the names of the categorical (object dtype) columns
cat_columns = df.dtypes[df.dtypes == "object"].index
# Run get_dummies on them
df = pd.get_dummies(df, columns=cat_columns, drop_first=True)

I then built a model (say, a random forest) and pickled it.

Later

I load the model and receive test data consisting of a single record. Each categorical column in that record holds just one of its categories, and I need to call predict on it.

How should I map the columns of the test data onto the columns the model expects? At this point I no longer have the training data, only the model and the test data.

Example: the training data has a column "COLOR" with the values red, green and blue. When we take dummies, we get 3 columns: COLOR_red, COLOR_green and COLOR_blue.

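Now, if the test data has a "COLOR" column with the value "red", I need to create a COLOR_red column in test_data set to 1, with the other two columns set to 0. How should I do this for multiple columns that each have multiple categories?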
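The mapping described above can be sketched with DataFrame.reindex, assuming the list of encoded training columns was saved together with the pickled model. The frames and the names train, test and model_columns below are hypothetical and only illustrate the idea. Note that with drop_first=True, as in the training code, COLOR yields two dummy columns rather than three, because the first category (blue) is dropped.

```python
import pandas as pd

# Hypothetical training frame, only to show where the column list comes from.
train = pd.DataFrame({"COLOR": ["red", "green", "blue"], "SIZE": [1, 2, 3]})
train_encoded = pd.get_dummies(train, columns=["COLOR"], drop_first=True, dtype=int)

# Assumption: this list is stored next to the pickled model at training time.
model_columns = list(train_encoded.columns)

# Later: a single test record, with no access to the training data.
test = pd.DataFrame({"COLOR": ["red"], "SIZE": [5]})

# Do not use drop_first here: with one record it would drop the only category.
test_encoded = pd.get_dummies(test, columns=["COLOR"], dtype=int)

# Align to the training layout: dummy columns missing from the test record are
# filled with 0, columns the model never saw are dropped, and the column order
# matches what the model was fitted on.
test_aligned = test_encoded.reindex(columns=model_columns, fill_value=0)
print(test_aligned)
```

The same reindex call works unchanged for any number of categorical columns, since it only depends on the saved column list.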

I tried using OneHotEncoder:

onehotencoder = OneHotEncoder(categorical_features=cat_columns[0],sparse=False)
df = onehotencoder.fit_transform(df)

I get the following error:

C:\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py:392: DeprecationWarning: The 'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.
  "use the ColumnTransformer instead.", DeprecationWarning)
Traceback (most recent call last):

  File "<ipython-input-23-b7547a4fe6b8>", line 1, in <module>
    QBE_clean = onehotencoder.fit_transform(df)

  File "C:\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py", line 511, in fit_transform
    self._handle_deprecations(X)

  File "C:\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py", line 396, in _handle_deprecations
    sel[np.asarray(self.categorical_features)] = True

IndexError: arrays used as indices must be of integer (or boolean) type

How do I carry the get_dummies encoding over from the training data to the test data?
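For reference, the IndexError in the traceback arises because categorical_features expects integer indices or a boolean mask, while cat_columns[0] is a column name string. The deprecation warning points to ColumnTransformer, which accepts column names directly. The following is a minimal sketch of that route, assuming a scikit-learn version that provides sklearn.compose.ColumnTransformer (0.20 or later); the data and variable names are hypothetical. The fitted encoder is pickled so that the test record can be transformed later without the training data.

```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training frame standing in for the real data.
train = pd.DataFrame({"COLOR": ["red", "green", "blue"], "SIZE": [1, 2, 3]})
cat_columns = ["COLOR"]

# One-hot encode the categorical columns and pass the rest through unchanged.
# handle_unknown="ignore" encodes categories unseen at fit time as all zeros.
# sparse_threshold=0 makes the transformer always return a dense array.
encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), cat_columns)],
    remainder="passthrough",
    sparse_threshold=0,
)
X_train = encoder.fit_transform(train)

# Persist the fitted encoder (here in memory; a file works the same way).
blob = pickle.dumps(encoder)

# Later: load the encoder and transform a single test record.
loaded = pickle.loads(blob)
test = pd.DataFrame({"COLOR": ["red"], "SIZE": [5]})
X_test = loaded.transform(test)
print(X_test)
```

Because the encoder remembers the categories seen during fit, the test record always produces the same number and order of columns as the training data. The model would need to be trained on X_train from this encoder rather than on the get_dummies output.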

0 answers: