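I'm fairly sure my random forest model is working. When I compare the predictions with the actual classes in the test set, they match very well. First, I encode the categorical data: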
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import confusion_matrix

# Encode the colour labels as numbers
Y_train[Y_train == 'Blue'] = 0.0
Y_train[Y_train == 'Green'] = 1.0
Y_test[Y_test == 'Blue'] = 0.0
Y_test[Y_test == 'Green'] = 1.0
rf = RandomForestRegressor(n_estimators=50)
rf.fit(X_train, Y_train)
predictions = rf.predict(X_test)
# Round the regressor's continuous outputs to 0 or 1
for i in range(len(predictions)):
    predictions[i] = predictions[i].round()
print(predictions)
print(Y_test)
print(confusion_matrix(Y_test, predictions))
When I run this code, predictions and Y_test print successfully:
[1. 1. 1. 0. 1. 0. 0. 1. 1. 1. 1. 0. 0. 0. 1. 0. 1. 0. 1. 0. 0. 1. 0. 1.
1. 0. 1. 1. 1. 0. 1. 0. 1. 1. 0. 0. 0. 0. 1. 1. 0. 1. 0. 1. 1. 0. 1. 0.
0. 0. 0. 0. 1. 1. 0. 1. 1. 1. 1. 1. 1. 0. 0. 1. 0. 0. 1. 0. 1. 1. 1. 0.
0. 1. 0. 1. 1. 1. 1. 0. 0. 0. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 1. 1. 0. 1.
0. 0. 0. 0.]
615 1
821 1
874 1
403 0
956 1
..
932 1
449 0
339 0
191 0
361 0
Name: Colour, Length: 100, dtype: object
As you can see, they match perfectly, so the model works. The problem is the last part: when I try to use scikit-learn's confusion_matrix() function, I get this error:
Traceback (most recent call last):
  File "G:\Work\Colours.py", line 101, in <module>
    Main()
  File "G:\Work\Colours.py", line 34, in Main
    RandForest(X_train, Y_train, X_test, Y_test)
  File "G:\Work\Colours.py", line 97, in RandForest
    print(confusion_matrix(Y_test, predictions))
  File "C:\Users\Me\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\metrics\classification.py", line 253, in confusion_matrix
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "C:\Users\Me\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\metrics\classification.py", line 81, in _check_targets
    "and {1} targets".format(type_true, type_pred))
ValueError: Classification metrics can't handle a mix of unknown and binary targets
How should I process the two datasets so that confusion_matrix() no longer raises this error?
Edit: predictions and Y_test both have the same shape, (100,).
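Since the shapes already agree, it may help to look at the label type scikit-learn infers for each array, because that is what the error message names. A minimal sketch, assuming `type_of_target` from `sklearn.utils.multiclass`; `y_obj` and `preds` are made-up stand-ins for `Y_test` and `predictions`, not the real data:

```python
import numpy as np
import pandas as pd
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for Y_test: numeric values stored in an object-dtype Series
y_obj = pd.Series([0.0, 1.0, 1.0, 0.0], dtype=object, name="Colour")
# Hypothetical stand-in for the rounded predictions: a plain float array
preds = np.array([0.0, 1.0, 0.0, 0.0])

type_true = type_of_target(y_obj.to_numpy())
type_pred = type_of_target(preds)
# Casting the labels to an integer dtype changes how they are classified
type_cast = type_of_target(y_obj.astype(int).to_numpy())

print(type_true, type_pred, type_cast)
```

If the first value printed is `unknown`, casting the labels to an integer dtype before calling confusion_matrix() is one way to have both arrays treated as binary.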
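Answer 0 (score: 0)
You have to compare arrays with the same dimensions. For example, if predictions is an array with 1 column and 850 rows, then Y_test must also be an array with 1 column and 850 rows.
print(confusion_matrix(Y_test[1], predictions))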
Answer 1 (score: 0)
I managed to solve this by encoding the categorical data like this:
for i in range(len(Y_train)):
    if Y_train.iloc[i] == 'Blue':
        Y_train.iloc[i] = 0.0
    else:
        Y_train.iloc[i] = 1.0
for i in range(len(Y_test)):
    if Y_test.iloc[i] == 'Blue':
        Y_test.iloc[i] = 0.0
    else:
        Y_test.iloc[i] = 1.0
If anyone can tell me why this fixes the problem, that would be helpful.
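The same encoding can also be written without loops. This is only a sketch on made-up data: it maps the labels through a dictionary and casts the result to an integer dtype, so the encoded Series is numeric rather than object. The names `encoded` and `cm` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

# Made-up labels standing in for Y_test
y_test = pd.Series(["Blue", "Green", "Green", "Blue"], name="Colour")

# Map each label to an integer and make the dtype explicitly numeric
encoded = y_test.map({"Blue": 0, "Green": 1}).astype(int)

# Made-up rounded predictions, cast to int to match the labels
predictions = np.array([0.0, 1.0, 0.0, 0.0]).astype(int)

cm = confusion_matrix(encoded, predictions)
print(cm)
```

Unlike the `else` branch in the loops, a label missing from the dictionary becomes NaN here and the cast then fails, which surfaces unexpected categories instead of silently encoding them as 1.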
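Edit: I've found the real cause of the problem. I was using a regression model instead of a classification model. Silly mistake. Using RandomForestClassifier() instead of RandomForestRegressor() avoids these issues.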
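For reference, a minimal sketch of the classifier version. The data is synthetic, and the feature matrix, split sizes and `random_state` are assumptions for illustration. With a classifier the string labels can be passed in directly, so neither manual encoding nor rounding is needed:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Synthetic data: the label depends on the sign of the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] > 0, "Green", "Blue")

# Simple hold-out split: first 150 rows for training, last 50 for testing
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# predict() returns class labels, not continuous values
predictions = clf.predict(X_test)

# Fix the row/column order explicitly: Blue first, then Green
cm = confusion_matrix(y_test, predictions, labels=["Blue", "Green"])
print(cm)
```

Passing `labels=` keeps the matrix layout stable even if one class happens to be absent from the test split.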