ValueError: Classification metrics can't handle a mix of unknown and binary targets

Date: 2019-12-10 14:28:24

Tags: python pandas machine-learning scikit-learn

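I'm fairly sure my random forest model is working. When I compare its predictions with the actual classes in the test set, they match very well. The first part of the code encodes the categorical data: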

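from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import confusion_matrix

# Encode the colour labels as numbers (the Series keeps its object dtype)
Y_train[Y_train == 'Blue'] = 0.0
Y_train[Y_train == 'Green'] = 1.0
Y_test[Y_test == 'Blue'] = 0.0
Y_test[Y_test == 'Green'] = 1.0

rf = RandomForestRegressor(n_estimators=50)
rf.fit(X_train, Y_train)
predictions = rf.predict(X_test)

# Round the regressor's continuous outputs to 0.0 / 1.0
for i in range(len(predictions)):
    predictions[i] = predictions[i].round()

print(predictions)
print(Y_test)

print(confusion_matrix(Y_test, predictions))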

When I run this code, predictions and Y_test print without problems:

[1. 1. 1. 0. 1. 0. 0. 1. 1. 1. 1. 0. 0. 0. 1. 0. 1. 0. 1. 0. 0. 1. 0. 1.
 1. 0. 1. 1. 1. 0. 1. 0. 1. 1. 0. 0. 0. 0. 1. 1. 0. 1. 0. 1. 1. 0. 1. 0.
 0. 0. 0. 0. 1. 1. 0. 1. 1. 1. 1. 1. 1. 0. 0. 1. 0. 0. 1. 0. 1. 1. 1. 0.
 0. 1. 0. 1. 1. 1. 1. 0. 0. 0. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 1. 1. 0. 1.
 0. 0. 0. 0.]
615    1
821    1
874    1
403    0
956    1
      ..
932    1
449    0
339    0
191    0
361    0
Name: Colour, Length: 100, dtype: object

As you can see, they match perfectly, so the model works. The problem is the last part: when I try to use scikit-learn's confusion_matrix() function, I get this error:

Traceback (most recent call last):
  File "G:\Work\Colours.py", line 101, in <module>
    Main()
  File "G:\Work\Colours.py", line 34, in Main
    RandForest(X_train, Y_train, X_test, Y_test)
  File "G:\Work\Colours.py", line 97, in RandForest
    print(confusion_matrix(Y_test, predictions))
  File "C:\Users\Me\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\metrics\classification.py", line 253, in confusion_matrix
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "C:\Users\Me\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\metrics\classification.py", line 81, in _check_targets
    "and {1} targets".format(type_true, type_pred))
ValueError: Classification metrics can't handle a mix of unknown and binary targets
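The message comes from scikit-learn's target-type check: as the traceback shows, confusion_matrix calls _check_targets, which runs `sklearn.utils.multiclass.type_of_target` on both inputs. That function reports `'unknown'` for an object-dtype array whose first element is not a string, and the `dtype: object` line in the printed Y_test suggests this is what happened, because assigning 0.0/1.0 through a boolean mask does not change the Series' dtype. A minimal sketch reproducing it, using a small hypothetical Series rather than the asker's data:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import type_of_target

# Hypothetical labels, created as object dtype like the 'Colour' column above
y_test = pd.Series(['Blue', 'Green', 'Green', 'Blue'], dtype=object)

# Same masked assignments as in the question: the values become floats,
# but the Series keeps its object dtype
y_test[y_test == 'Blue'] = 0.0
y_test[y_test == 'Green'] = 1.0

# Rounded regressor output: a plain float64 array
predictions = np.array([0.0, 1.0, 1.0, 0.0])

print(y_test.dtype)                 # object
print(type_of_target(y_test))       # unknown (object dtype, non-string values)
print(type_of_target(predictions))  # binary

try:
    confusion_matrix(y_test, predictions)
except ValueError as err:
    error_message = str(err)
    print(error_message)
```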

What do I need to do to the two datasets so that confusion_matrix() stops raising this error?

EDIT - predictions and Y_test have the same shape, (100,)
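Since the shapes already agree, the mismatch is in the data type rather than the dimensions. A minimal sketch of one way to make the inputs compatible, assuming Y_test holds the 0.0/1.0 values produced by the encoding above: cast both arrays to an integer dtype before calling confusion_matrix. The values below are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical encoded labels stored as object dtype, as in the question
y_test = pd.Series([1.0, 1.0, 0.0, 0.0], dtype=object)
# Hypothetical rounded predictions from the regressor
predictions = np.array([1.0, 0.0, 0.0, 0.0])

# Cast both to integers so scikit-learn sees two binary targets
y_true = y_test.astype(int)
y_pred = predictions.astype(int)

# Rows are true labels (0, 1), columns are predicted labels (0, 1)
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 0]
#  [1 1]]
```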

2 answers:

Answer 0 (score: 0)

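You have to compare matrices with the same dimensions. For example, if predictions is a matrix with 1 column and 850 rows, Y_test must also be a matrix with 1 column and 850 rows.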

print(confusion_matrix(Y_test[1], predictions))
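For comparison, scikit-learn reports a genuine length mismatch with a different error: inputs of different lengths fail the sample-count check before the target types are examined. A minimal sketch with made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels with different lengths (3 true values, 2 predictions)
y_true = [0, 1, 1]
y_pred = [0, 1]

try:
    confusion_matrix(y_true, y_pred)
except ValueError as err:
    # The message mentions inconsistent numbers of samples,
    # not a mix of target types
    length_error = str(err)
    print(length_error)
```

Because the question's arrays both have shape (100,), this check passes, and the "unknown and binary" error comes from the target-type check instead.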

Answer 1 (score: 0)

I managed to fix this by encoding the categorical data like this:

for i in range(len(Y_train)):
    if Y_train.iloc[i] == 'Blue':
        Y_train.iloc[i] = 0.0
    else:
        Y_train.iloc[i] = 1.0

for i in range(len(Y_test)):
    if Y_test.iloc[i] == 'Blue':
        Y_test.iloc[i] = 0.0
    else:
        Y_test.iloc[i] = 1.0
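The loops above can also be written as a single vectorized comparison. A minimal sketch, assuming (as the loops do) that every label other than 'Blue' should become 1: `(y != 'Blue').astype(int)` yields an integer Series with the original index, which type_of_target classifies as binary. The labels are hypothetical.

```python
import pandas as pd
from sklearn.utils.multiclass import type_of_target

# Hypothetical colour labels
y = pd.Series(['Blue', 'Green', 'Green', 'Blue'], name='Colour')

# 'Blue' -> 0, anything else -> 1, matching the if/else in the loops
y_encoded = (y != 'Blue').astype(int)

print(y_encoded.tolist())         # [0, 1, 1, 0]
print(type_of_target(y_encoded))  # binary
```

Unlike element-wise `.iloc` assignment, this produces a new Series with a numeric dtype instead of an object Series holding floats.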

If anyone can tell me why this fixes the problem, that would be helpful.

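EDIT - I've found the real cause of my problem: I was using a regression model instead of a classification model. Silly mistake. These problems can be avoided by using RandomForestClassifier() instead of RandomForestRegressor().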
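A minimal sketch of that fix on a toy, made-up dataset: RandomForestClassifier can be fitted directly on the string labels, so neither the manual encoding nor the rounding loop is needed, and predict() returns class labels that can go straight into confusion_matrix. The feature values and `random_state` are illustrative assumptions; the `labels` argument only fixes the row/column order.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Toy, made-up data: one feature that separates the two colours
X_train = np.array([[0], [1], [2], [10], [11], [12]])
Y_train = np.array(['Blue', 'Blue', 'Blue', 'Green', 'Green', 'Green'])
X_test = np.array([[1], [11]])
Y_test = np.array(['Blue', 'Green'])

rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_train, Y_train)

# predict() returns class labels ('Blue' / 'Green'), not continuous values
predictions = rf.predict(X_test)

# Rows are true labels, columns are predicted labels, in the order given
cm = confusion_matrix(Y_test, predictions, labels=['Blue', 'Green'])
print(predictions)
print(cm)
```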