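I get the error "Can't handle mix of multiclass and continuous-multioutput" when I try to get the accuracy of my model. I've been trying to figure out what's wrong, but I can't, and the error confuses me.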
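import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Assumes `train` and `test` DataFrames are already loaded, with the
# Dates column parsed as datetime (e.g. read_csv(..., parse_dates=['Dates']))

# TRAINING data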
#Convert crime labels to numbers
df_crime = preprocessing.LabelEncoder()
crime = df_crime.fit_transform(train.Category)
#Get binarized weekdays, districts, and hours using dummy variables
days = pd.get_dummies(train.DayOfWeek)
district = pd.get_dummies(train.PdDistrict)
hour = train.Dates.dt.hour
hour = pd.get_dummies(hour)
#Build new array
train_data = pd.concat([hour, days, district], axis=1)
train_data['crime'] = crime
#train_data.head()
#Repeat for test data
days = pd.get_dummies(test.DayOfWeek)
district = pd.get_dummies(test.PdDistrict)
hour = test.Dates.dt.hour
hour = pd.get_dummies(hour)
test_data = pd.concat([hour, days, district], axis=1)
features = ['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday',
'Wednesday', 'BAYVIEW', 'CENTRAL', 'INGLESIDE', 'MISSION',
'NORTHERN', 'PARK', 'RICHMOND', 'SOUTHERN', 'TARAVAL', 'TENDERLOIN']
training, testing = train_test_split(train_data, train_size=.60)
#bernoulliNB
# fit on the training split, then predict on the held-out split
model_B = BernoulliNB()
model_B.fit(training[features], training['crime'])
predicted2 = np.array(model_B.predict_proba(testing[features]))
log_loss(testing['crime'], predicted2)
score_b = accuracy_score(testing['crime'], predicted2)
print(score_b)
ValueError Traceback (most recent call last)
<ipython-input-27-7d9db3ef89cc> in <module>()
----> 1 score_b = accuracy_score(testing['crime'], predicted2)
2
3 print(score_b)
C:\Users\Michael\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
170
171 # Compute accuracy for each possible representation
--> 172 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
173 if y_type.startswith('multilabel'):
174 differing_labels = count_nonzero(y_true - y_pred, axis=1)
C:\Users\Michael\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in _check_targets(y_true, y_pred)
80 if len(y_type) > 1:
81 raise ValueError("Can't handle mix of {0} and {1}"
---> 82 "".format(type_true, type_pred))
83
84 # We can't have more than one value on y_type => The set is no more needed
ValueError: Can't handle mix of multiclass and continuous-multioutput
Answer (score: 2)
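predicted2 is an array of class probabilities (the result of .predict_proba(X)), while accuracy_score expects only the top class for each sample, i.e. one predicted label per row (the result of .predict(X)). So this should work: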
predicted3 = model_B.predict(testing[features])
accuracy_score(testing['crime'], predicted3)
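The error message comes from scikit-learn's target-type check: accuracy_score infers a target type for each argument and refuses to compare two different types. A small sketch on made-up data (the arrays below are hypothetical, not the crime dataset) shows the two types involved, using sklearn.utils.multiclass.type_of_target:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.utils.multiclass import type_of_target

# Hypothetical true labels: 4 samples, 3 classes
y_true = np.array([0, 2, 1, 2])

# Hypothetical predict_proba-style output: one row per sample, one column per class
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
])

true_type = type_of_target(y_true)   # integer labels, more than 2 classes
proba_type = type_of_target(proba)   # 2-D array of non-integer floats
print(true_type, proba_type)

# Passing the probability matrix reproduces the error from the question
try:
    accuracy_score(y_true, proba)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Passing one label per sample works; the column index equals the label
# here only because the classes are exactly 0, 1, 2
labels = proba.argmax(axis=1)
score = accuracy_score(y_true, labels)
print(score)
```

Here true_type is 'multiclass' and proba_type is 'continuous-multioutput', the same pair named in the error.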
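But calling both predict and predict_proba is not a good idea: it is inefficient, and if prediction is non-deterministic for some reason, the two scores could come from mismatched predictions. It is better to derive the labels from the probabilities you already have: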
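# predict_proba columns follow model_B.classes_, so map the argmax column index back to a label
accuracy_score(testing['crime'], model_B.classes_[predicted2.argmax(axis=1)])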
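One caveat with the argmax shortcut, offered as a hedged sketch on synthetic data: argmax returns a column index, and the columns of predict_proba follow model.classes_, i.e. the labels seen during fitting. With LabelEncoder output the index and the label coincide only when every class appears in the training split, so indexing classes_ with the argmax is the safer mapping. The features and the non-contiguous labels 0, 3, 7 below are hypothetical:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary features (like the dummy columns) and non-contiguous labels
X = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
])
y = np.array([0, 0, 3, 3, 7, 7])

model = BernoulliNB()
model.fit(X, y)

proba = model.predict_proba(X)   # shape (6, 3); columns follow model.classes_
print(model.classes_)

raw_idx = proba.argmax(axis=1)   # column indices 0..2, not labels
mapped = model.classes_[raw_idx] # translate column indices back to labels

# The mapped labels agree with predict(); the raw indices do not
same_as_predict = np.array_equal(mapped, model.predict(X))
raw_acc = accuracy_score(y, raw_idx)   # misleading: compares indices with labels
mapped_acc = accuracy_score(y, mapped)
print(raw_acc, mapped_acc)
```

On this toy data every row is predicted correctly, yet the unmapped indices score only 1/3 because labels 3 and 7 come back as column indices 1 and 2.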