How to fix "Unknown label type: 'unknown'" when training a model with fit

Asked: 2019-07-14 22:35:18

Tags: python model jupyter-notebook training-data

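I'm new to Python and am trying to train a decision tree classifier, but I get an error every time. I have tried several approaches and none of them worked. I believe the problem lies in the data type or representation of the data, which I have changed several times, but the error persists.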

My problem is that the target is a string. I tried MultiLabelBinarizer and preprocessing.LabelEncoder(), but neither worked.

# imports assumed by the snippet (not shown in the original post)
import pandas as pd
from scipy.io import arff
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# load the ARFF data into a pandas DataFrame
data = arff.loadarff('Run1/Tr.arff')
df = pd.DataFrame(data[0])

# set the data (feature columns) and the target
data = df.loc[:, 'ATT1':'ATT576']
target = df['Class']

# split the data into training and testing sets
data_train, data_test, target_train, target_test = train_test_split(data, target, test_size=0.30, random_state=0)

# create the AdaBoost classifier object
abc = AdaBoostClassifier(n_estimators=50, learning_rate=1)

# train the AdaBoost classifier (I got the error here)
model = abc.fit(data_train, target_train)

This is the error message:

ValueError                                Traceback (most recent call last)
<ipython-input-6-2294376ac000> in <module>
      1 # Train Adaboost Classifer
----> 2 model = abc.fit(data_train,target_train)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/ensemble/weight_boosting.py in fit(self, X, y, sample_weight)
    425 
    426         # Fit
--> 427         return super().fit(X, y, sample_weight)
    428 
    429     def _validate_estimator(self):

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/ensemble/weight_boosting.py in fit(self, X, y, sample_weight)
    148                 X, y,
    149                 sample_weight,
--> 150                 random_state)
    151 
    152             # Early termination

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/ensemble/weight_boosting.py in _boost(self, iboost, X, y, sample_weight, random_state)
    484         """
    485         if self.algorithm == 'SAMME.R':
--> 486             return self._boost_real(iboost, X, y, sample_weight, random_state)
    487 
    488         else:  # elif self.algorithm == "SAMME":

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/ensemble/weight_boosting.py in _boost_real(self, iboost, X, y, sample_weight, random_state)
    494         estimator = self._make_estimator(random_state=random_state)
    495 
--> 496         estimator.fit(X, y, sample_weight=sample_weight)
    497 
    498         y_predict_proba = estimator.predict_proba(X)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    814             sample_weight=sample_weight,
    815             check_input=check_input,
--> 816             X_idx_sorted=X_idx_sorted)
    817         return self
    818 

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    152 
    153         if is_classification:
--> 154             check_classification_targets(y)
    155             y = np.copy(y)
    156 

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
    167     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    168                       'multilabel-indicator', 'multilabel-sequences']:
--> 169         raise ValueError("Unknown label type: %r" % y_type)
    170 
    171 

ValueError: Unknown label type: 'unknown'

1 Answer:

Answer 0 (score: 0)

You get this error because the dtype of the target is object. You can confirm this with

    df.info()

If all elements of the label column have the same data type, that is fine. But if some elements are str and others are int, this error is raised.
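One way to check this is to count the Python types stored in the label column and ask scikit-learn how it classifies the target. The snippet below is only a sketch: the `target` Series is made-up data standing in for `df['Class']`, and the result may depend on your scikit-learn version.

```python
import pandas as pd
from sklearn.utils.multiclass import type_of_target

# Hypothetical label column mixing int and str values (not the asker's data)
target = pd.Series([1, "a", 2, "b", 1, "a"], dtype=object)

# dtype of the column as pandas sees it
print(target.dtype)

# How many elements of each Python type the column holds
print(target.map(type).value_counts())

# How scikit-learn classifies the target before and after casting to str
before = type_of_target(target.to_numpy())
after = type_of_target(target.astype(str).to_numpy())
print(before, after)
```

If the first call reports 'unknown' and the second reports a regular classification type, the mixed element types are the likely cause.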

To get rid of the error, add one more line that casts the target to str:

    target = df['Class']
    target = target.astype(str)
    data_train, data_test, target_train, target_test = train_test_split(data, target, test_size=0.30, random_state=0)
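A related case worth checking: `scipy.io.arff.loadarff` typically returns nominal attributes as bytes (for example `b'pos'`), which also gives an object column. In that situation, casting with `astype(str)` tends to produce labels that look like `"b'pos'"`, so decoding the bytes may give cleaner class names. The sketch below assumes the `Class` column holds bytes. The small DataFrame, its values and the column range `ATT1:ATT2` are made up for illustration and are not the asker's data.

```python
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier

# Hypothetical stand-in for the frame built from arff.loadarff
df = pd.DataFrame({
    "ATT1": [0.1, 0.9, 0.2, 0.8, 0.15, 0.85],
    "ATT2": [1.0, 0.0, 0.9, 0.1, 0.95, 0.05],
    "Class": [b"neg", b"pos", b"neg", b"pos", b"neg", b"pos"],
})

# Decode bytes labels into ordinary str labels (assumes UTF-8 encoded bytes)
target = df["Class"].str.decode("utf-8")
data = df.loc[:, "ATT1":"ATT2"]

# Fit on the decoded labels; n_estimators is kept small for the toy data
abc = AdaBoostClassifier(n_estimators=10, random_state=0)
model = abc.fit(data, target)
print(model.classes_)
```

If the column already contains str values, `.str.decode` is not the right tool, and the `astype(str)` cast above is the simpler option.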