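I am new to Python and I am trying to train a decision tree classifier, but I hit an error every time. I have tried several approaches and none of them worked. I believe the problem lies in the data type or representation, which I have changed many times, but the error persists.
My problem is that the target is a string. I tried MultiLabelBinarizer and preprocessing.LabelEncoder(), but neither worked.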
from scipy.io import arff
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

# Load the ARFF data into a pandas DataFrame
data = arff.loadarff('Run1/Tr.arff')
df = pd.DataFrame(data[0])
# Select the feature columns and the target column
data = df.loc[:, 'ATT1':'ATT576']
target = df['Class']
# Split the data into training and test sets
data_train, data_test, target_train, target_test = train_test_split(data, target, test_size=0.30, random_state=0)
# Create the AdaBoost classifier object
abc = AdaBoostClassifier(n_estimators=50, learning_rate=1)
# Train the AdaBoost classifier (the error is raised here)
model = abc.fit(data_train, target_train)
-
ValueError Traceback (most recent call last)
<ipython-input-6-2294376ac000> in <module>
1 # Train Adaboost Classifer
----> 2 model = abc.fit(data_train,target_train)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/ensemble/weight_boosting.py in fit(self, X, y, sample_weight)
425
426 # Fit
--> 427 return super().fit(X, y, sample_weight)
428
429 def _validate_estimator(self):
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/ensemble/weight_boosting.py in fit(self, X, y, sample_weight)
148 X, y,
149 sample_weight,
--> 150 random_state)
151
152 # Early termination
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/ensemble/weight_boosting.py in _boost(self, iboost, X, y, sample_weight, random_state)
484 """
485 if self.algorithm == 'SAMME.R':
--> 486 return self._boost_real(iboost, X, y, sample_weight, random_state)
487
488 else: # elif self.algorithm == "SAMME":
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/ensemble/weight_boosting.py in _boost_real(self, iboost, X, y, sample_weight, random_state)
494 estimator = self._make_estimator(random_state=random_state)
495
--> 496 estimator.fit(X, y, sample_weight=sample_weight)
497
498 y_predict_proba = estimator.predict_proba(X)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
814 sample_weight=sample_weight,
815 check_input=check_input,
--> 816 X_idx_sorted=X_idx_sorted)
817 return self
818
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
152
153 if is_classification:
--> 154 check_classification_targets(y)
155 y = np.copy(y)
156
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
167 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
168 'multilabel-indicator', 'multilabel-sequences']:
--> 169 raise ValueError("Unknown label type: %r" % y_type)
170
171
ValueError: Unknown label type: 'unknown'
Answer 0 (score: 0)
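You get this error because the dtype of the target is object, which you can confirm with
df.info()
If every element of the label column has the same data type, everything is fine, but if some elements are str and others are int, this error is raised.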
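To see which Python types an object column actually holds, one option is to count the element types. The snippet below is a minimal sketch on made-up data: the `target` values are purely illustrative and only stand in for the `Class` column from the question.

```python
import pandas as pd

# Hypothetical label column mixing str and int values (illustrative data only).
target = pd.Series(["cat", 1, "dog", 2], name="Class")

# A column holding arbitrary Python objects is reported with dtype "object".
print(target.dtype)

# Count how many elements of each Python type the column contains.
type_counts = target.map(lambda value: type(value).__name__).value_counts()
print(type_counts)
```

If more than one type shows up in the counts, the column is not uniformly typed and likely needs converting before it is passed to a classifier.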
To get rid of the error, add one more line that casts the target to str:
target = df['Class']
target = target.astype(str)
data_train, data_test, target_train, target_test = train_test_split(data, target, test_size=0.30, random_state=0)