Inverse-transforming predicted results

Time: 2018-06-20 18:51:04

Tags: python machine-learning scikit-learn sklearn-pandas inverse-transform

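I have a training data CSV with three columns (two for the data and a third for the target), and I have successfully predicted the target column for the test CSV. The problem is that I need to inverse-transform the results back to strings for further analysis. The code and the error are below.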

from sklearn import datasets
from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import train_test_split
from sklearn.preprocessing import LabelEncoder

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections import defaultdict

df_train = pd.read_csv('/Users/justinchristensen/Documents/Python_Education/SKLearn/Path_Training_Data.csv')
df_test = pd.read_csv('/Users/justinchristensen/Documents/Python_Education/SKLearn/Path_Test_Data.csv')

#Separate columns in training data set
x_train = df_train.iloc[:,:-1]
y_train = df_train.iloc[:,-1:]

#Separate columns in test data set
x_test = df_test.iloc[:,:-1]

#Initiate classifier
clf = svm.SVC(gamma=0.001, C=100)
le = LabelEncoder()

#Transform strings into integers
x_train_encoded = x_train.apply(LabelEncoder().fit_transform)
y_train_encoded = y_train.apply(LabelEncoder().fit_transform)
x_test_encoded = x_test.apply(LabelEncoder().fit_transform)

#Fit the model into the classifier
clf.fit(x_train_encoded,y_train_encoded)

#Predict test values
y_pred = clf.predict(x_test_encoded)

Error

NotFittedError
Traceback (most recent call last)
<ipython-input-38-09840b0071d5> in <module>()
      1 
----> 2 y_pred_inverse = le.inverse_transform(y_pred)

~/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/label.py in inverse_transform(self, y)
    146         y : numpy array of shape [n_samples]
    147         """
--> 148         check_is_fitted(self, 'classes_')
    149 
    150         diff = np.setdiff1d(y, np.arange(len(self.classes_)))

~/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
    766 
    767     if not all_or_any([hasattr(estimator, attr) for attr in attributes]):
--> 768         raise NotFittedError(msg % {'name': type(estimator).__name__})
    769 
    770 

NotFittedError: This LabelEncoder instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

1 answer:

Answer 0 (score: 1)

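To get the original strings back, you need to use the same LabelEncoder object that you used to transform the target. Every call to LabelEncoder() instantiates a new object, so in the question `y_train.apply(LabelEncoder().fit_transform)` fits a throwaway encoder while `le` itself is never fitted. Use the same object for both steps.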
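The difference between a fitted and an unfitted encoder can be seen in a small sketch. The label strings here are made up for illustration and are not taken from the question's CSV files:

```python
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import LabelEncoder

# Hypothetical target strings, standing in for the question's target column.
labels = ["cat", "dog", "cat", "bird"]

# One encoder object is fitted and then reused for the inverse step.
le = LabelEncoder()
encoded = le.fit_transform(labels)        # integer codes, classes sorted alphabetically
decoded = le.inverse_transform(encoded)   # back to the original strings

# A fresh encoder has never seen the labels, so it cannot invert anything.
fresh = LabelEncoder()
try:
    fresh.inverse_transform(encoded)
    fresh_failed = False
except NotFittedError:
    fresh_failed = True

print(list(decoded), fresh_failed)
```

The fitted `le` round-trips the labels, while `fresh` raises the same NotFittedError shown in the traceback.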

Change the following lines:

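# le is already an instance, so pass its bound method instead of calling le()
y_train_encoded = y_train.apply(le.fit_transform)
# only applies if a y_test with true labels exists; reuse the fitted mapping rather than refitting
y_test_encoded = y_test.apply(le.transform)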

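Then use the same object to invert the transformation, for example `le.inverse_transform(y_pred)`. You can also check the first example in the documentation for reference.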
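Put together, the flow might look like the following sketch. The two small DataFrames and their column names are hypothetical stand-ins for the question's CSV files. Keeping one encoder per feature column in a `defaultdict` is an assumption added here so that the test features get the same integer mapping as the training features; the answer itself only addresses the target encoder.

```python
from collections import defaultdict

import pandas as pd
from sklearn import svm
from sklearn.preprocessing import LabelEncoder

# Hypothetical data in place of Path_Training_Data.csv / Path_Test_Data.csv.
df_train = pd.DataFrame({
    "step_1": ["home", "home", "search", "search"],
    "step_2": ["cart", "product", "cart", "product"],
    "outcome": ["buy", "browse", "buy", "browse"],
})
df_test = pd.DataFrame({
    "step_1": ["search", "home"],
    "step_2": ["product", "cart"],
})

x_train = df_train.iloc[:, :-1]
y_train = df_train.iloc[:, -1]   # a Series, so a single encoder can fit it directly
x_test = df_test                 # assumed here to hold feature columns only

# One encoder per feature column: fitted on the training data, reused on the test data.
feature_encoders = defaultdict(LabelEncoder)
x_train_encoded = x_train.apply(lambda col: feature_encoders[col.name].fit_transform(col))
x_test_encoded = x_test.apply(lambda col: feature_encoders[col.name].transform(col))

# The target encoder is kept in a variable so it can invert the predictions later.
le = LabelEncoder()
y_train_encoded = le.fit_transform(y_train)

clf = svm.SVC(gamma=0.001, C=100)
clf.fit(x_train_encoded, y_train_encoded)

y_pred = clf.predict(x_test_encoded)
y_pred_inverse = le.inverse_transform(y_pred)   # integer predictions back to strings
print(y_pred_inverse)
```

Note that `transform` raises a ValueError if the test data contains a value the encoder did not see during fitting, so this sketch assumes the test features only use values present in the training data.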