Question

I have a categorical variable (A, B, C) in my DataFrame. I then encoded it (made it numeric) so that it could be passed to a neural network.

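However, my final visualization shows the encoded values of the categorical variable, and I'm having trouble mapping them back to their original values.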

I first encoded the categorical variables (dtype = object) as numbers with this code:

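from sklearn import preprocessing

encoders = {}
for x in df.columns:
    if df[x].dtypes == 'object':
        le = preprocessing.LabelEncoder()
        df[x] = le.fit_transform(df[x].astype(str))
        encoders[x] = le  # keep the fitted encoder so the mapping can be reversed later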

corr = df.corr()
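A minimal sketch of what the stored encoders actually hold, assuming a toy DataFrame with a hypothetical object column `category` (not from the original post): each fitted `LabelEncoder` keeps the original labels in its `classes_` attribute, sorted, and a label's position in that array is its numeric code.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical toy data; dtype=object mirrors the question's object column.
df = pd.DataFrame({
    "category": pd.Series(["B", "A", "C", "A"], dtype=object),
    "value": [1.0, 2.0, 3.0, 4.0],
})

encoders = {}
for x in df.columns:
    if df[x].dtype == "object":
        le = preprocessing.LabelEncoder()
        df[x] = le.fit_transform(df[x].astype(str))
        encoders[x] = le

# classes_ is sorted, so code i corresponds to classes_[i].
mapping = {x: {i: str(label) for i, label in enumerate(le.classes_)}
           for x, le in encoders.items()}
print(df["category"].tolist())  # [1, 0, 2, 0]
print(mapping)                  # {'category': {0: 'A', 1: 'B', 2: 'C'}}
```

Only the object column gets an encoder; the float column `value` is left as it is.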

Then, before the final visualization, I decode them with this code:

for x, le in encoders.items():
    df[x] = le.inverse_transform(df[x])

    # Visualization: plotting categorical variables (A,B,C) in scatterplot using Seaborn.
    sns.lmplot(x="A", y="B", data=df, fit_reg=False, hue='C',legend=False)
    display()

...but the visualization still shows the encoded values instead of the categorical values (see the screenshot below). No mapping takes place. Why?
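One thing worth checking in the snippet above: `sns.lmplot(...)` and `display()` are indented inside the `for` loop, so a plot is drawn once per encoded column, and the first one is drawn after only the first column has been decoded. A minimal sketch, assuming hypothetical toy columns named `A`, `B` and `C` to match the `lmplot` call, that finishes decoding every column before anything would be plotted:

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical toy data with three object columns, named after the lmplot call.
df = pd.DataFrame({
    "A": pd.Series(["x", "y", "x", "z"], dtype=object),
    "B": pd.Series(["low", "high", "high", "low"], dtype=object),
    "C": pd.Series(["red", "blue", "red", "blue"], dtype=object),
})
original = df.copy()

encoders = {}
for x in df.columns:
    if df[x].dtype == "object":
        le = preprocessing.LabelEncoder()
        df[x] = le.fit_transform(df[x].astype(str))
        encoders[x] = le

# ... the encoded df would be used for modelling here ...

# Decode every column first; nothing is plotted inside this loop.
for x, le in encoders.items():
    df[x] = le.inverse_transform(df[x])

# Only after the loop has finished do all columns hold their original labels again.
decoded_ok = all(df[c].tolist() == original[c].tolist() for c in df.columns)
print(decoded_ok)

# The plot would go here, dedented out of the loop, e.g.
# sns.lmplot(x="A", y="B", data=df, fit_reg=False, hue="C", legend=False)
```

This is a sketch of one possible cause, not a confirmed diagnosis of the screenshot.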

Answer 1

You have to keep the original LabelEncoder objects, because the mapping is stored in them. So something like:

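from sklearn import preprocessing

encoders = {}
for x in df.columns:
    if df[x].dtypes == 'object':
        le = preprocessing.LabelEncoder()
        df[x] = le.fit_transform(df[x].astype(str))
        encoders[x] = le  # the fitted encoder holds the label <-> code mapping

# Reuse the same fitted encoders to map the codes back to the original labels
for x, le in encoders.items():
    df[x] = le.inverse_transform(df[x])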
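To see why the original encoders have to be kept, here is a small sketch with hypothetical toy labels comparing the stored, fitted encoder with a fresh `LabelEncoder`: the fresh one has never seen the labels, so it has no mapping to invert and scikit-learn raises `NotFittedError`.

```python
from sklearn import preprocessing
from sklearn.exceptions import NotFittedError

labels = ["B", "A", "C", "A"]  # hypothetical original values

stored = preprocessing.LabelEncoder()
codes = stored.fit_transform(labels)  # codes follow the sorted classes A, B, C

# The stored, fitted encoder can map the codes back.
restored = [str(v) for v in stored.inverse_transform(codes)]
print(restored)  # ['B', 'A', 'C', 'A']

# A brand-new encoder has no classes_, so it cannot invert anything.
try:
    preprocessing.LabelEncoder().inverse_transform(codes)
    fresh_failed = False
except NotFittedError:
    fresh_failed = True
print(fresh_failed)  # True
```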

Even better: instead of overwriting the labels with the encoded values, create a new column in the DataFrame.
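A minimal sketch of that suggestion, assuming a hypothetical `_code` suffix for the new columns: the neural network is fed the numeric columns, while the original labels stay untouched for plotting, so no inverse mapping is needed at all.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical toy data; "C" is an object column, "value" is already numeric.
df = pd.DataFrame({
    "C": pd.Series(["red", "blue", "red"], dtype=object),
    "value": [0.5, 1.5, 2.5],
})

encoders = {}
# Snapshot the object columns first, because new columns are added inside the loop.
object_cols = [x for x in df.columns if df[x].dtype == "object"]
for x in object_cols:
    le = preprocessing.LabelEncoder()
    # Hypothetical naming scheme: write the codes to a new "<name>_code" column.
    df[f"{x}_code"] = le.fit_transform(df[x].astype(str))
    encoders[x] = le

# Numeric columns for the model; the original labels stay in df for plotting.
model_input = df[[f"{x}_code" for x in object_cols] + ["value"]]
print(df["C"].tolist())                # ['red', 'blue', 'red']
print(model_input["C_code"].tolist())  # [1, 0, 1]
```

Plotting then uses `hue="C"` directly, and the encoders are only needed if new data has to be encoded the same way.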
