Python error: Can't handle mix of multiclass and continuous-multioutput

Asked: 2016-11-19 10:53:30

Tags: python numpy scikit-learn multilabel-classification

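I get the error "Can't handle mix of multiclass and continuous-multioutput" when I try to compute the accuracy of my model. I have been trying to figure out what the problem is, but I can't, and the error message confuses me.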

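# Imports assumed by this snippet (not shown in the original post)
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# TRAINING data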
#Convert crime labels to numbers
df_crime = preprocessing.LabelEncoder()
crime = df_crime.fit_transform(train.Category)
#Get binarized weekdays, districts, and hours using dummy variables
days = pd.get_dummies(train.DayOfWeek)
district = pd.get_dummies(train.PdDistrict)
hour = train.Dates.dt.hour
hour = pd.get_dummies(hour)
#Build new array
train_data = pd.concat([hour, days, district], axis=1)
train_data['crime']=crime
#train_data.head()

#Repeat for test data
days = pd.get_dummies(test.DayOfWeek)
district = pd.get_dummies(test.PdDistrict)

hour = test.Dates.dt.hour
hour = pd.get_dummies(hour) 

test_data = pd.concat([hour, days, district], axis=1)

features = ['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday',
 'Wednesday', 'BAYVIEW', 'CENTRAL', 'INGLESIDE', 'MISSION',
 'NORTHERN', 'PARK', 'RICHMOND', 'SOUTHERN', 'TARAVAL', 'TENDERLOIN']

training, testing = train_test_split(train_data, train_size=.60) 

#bernoulliNB
# predicting only on the training data
model_B = BernoulliNB()
model_B.fit(training[features], training['crime'])
predicted2 = np.array(model_B.predict_proba(testing[features]))
log_loss(testing['crime'], predicted2)

score_b = accuracy_score(testing['crime'], predicted2)
print(score_b)

ValueError                                Traceback (most recent call last)
<ipython-input-27-7d9db3ef89cc> in <module>()
----> 1 score_b = accuracy_score(testing['crime'], predicted2)
      2 
      3 print(score_b)

C:\Users\Michael\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
    170 
    171     # Compute accuracy for each possible representation
--> 172     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    173     if y_type.startswith('multilabel'):
    174         differing_labels = count_nonzero(y_true - y_pred, axis=1)

C:\Users\Michael\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in _check_targets(y_true, y_pred)
     80     if len(y_type) > 1:
     81         raise ValueError("Can't handle mix of {0} and {1}"
---> 82                          "".format(type_true, type_pred))
     83 
     84     # We can't have more than one value on y_type => The set is no more needed

ValueError: Can't handle mix of multiclass and continuous-multioutput

1 Answer:

Answer 0 (score: 2)

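predicted2 is an array of class probabilities (the result of .predict_proba(X)), with one row per sample and one column per class. accuracy_score only accepts the top class for each sample (the result of .predict(X)). This means the following should work: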

predicted3 = model_B.predict(testing[features])
accuracy_score(testing['crime'], predicted3)
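
To see why the two results are treated differently, here is a minimal sketch on synthetic data. The arrays X and y are made-up stand-ins for the one-hot features and encoded crime labels, not the original dataset. It uses type_of_target from sklearn.utils.multiclass, which reports the same target types that appear in the error message.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in data (assumption): 200 samples, 6 binary features, 4 classes.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(200, 6))
y = np.arange(200) % 4

model = BernoulliNB().fit(X, y)

proba = model.predict_proba(X)  # 2-D float array: (n_samples, n_classes)
labels = model.predict(X)       # 1-D array of class labels: (n_samples,)

print(proba.shape, type_of_target(proba))
print(labels.shape, type_of_target(y))

# Passing probabilities to accuracy_score reproduces the error.
try:
    accuracy_score(y, proba)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Passing predicted labels works.
score = accuracy_score(y, labels)
print(score)
```

The true labels are detected as multiclass and the probability matrix as continuous-multioutput, which is exactly the mix named in the ValueError.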

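However, calling both predict and predict_proba is not a good idea: it is inefficient, and if prediction is non-deterministic for some reason, the two results can disagree and give you mismatched scores. So it would be better to do something like this: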

accuracy_score(testing['crime'], predicted2.argmax(axis=1))
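
One caveat about the argmax approach: it returns column indices, and column j of predict_proba corresponds to model.classes_[j]. With LabelEncoder output (0 to n-1) the indices equal the labels only if every class appears in the training split. A minimal sketch of the more defensive mapping follows, again on made-up data, with labels deliberately chosen so that indices and labels differ.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in data (assumption); labels are 10/20/30/40, not 0..3.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(200, 6))
y = np.array([10, 20, 30, 40])[np.arange(200) % 4]

model = BernoulliNB().fit(X, y)
proba = model.predict_proba(X)

# Raw argmax gives column indices (0..3), which never match these labels.
index_score = accuracy_score(y, proba.argmax(axis=1))

# Map column indices back to class labels through classes_.
top_class = model.classes_[proba.argmax(axis=1)]
label_score = accuracy_score(y, top_class)

print(index_score, label_score)
```

Indexing classes_ this way keeps the single predict_proba call while making the result independent of how the labels happen to be encoded.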