Question

I have a question about metrics in Python. I get the following error: "ValueError: Can't handle mix of multiclass and continuous".

My code looks like this (full details of the arrays involved are attached below):

X_train,X_test,y_train,y_test = cross_validation.train_test_split(data, target, test_size=0.3, random_state=42)
clf = RFC()
clf = clf.fit(X_train,y_train)
y_predict = clf.predict_proba(X_test)[:,1]
print f1_score(y_test,y_predict)

>>> X_train.shape
(7000, 576)
>>> X_test.shape
(3000, 576)
>>> y_train.shape
(7000,)
>>> y_test.shape
(3000,)
>>> X_train.dtype
dtype('float64')
>>> X_test.dtype
dtype('float64')
>>> y_train.dtype
dtype('float64')
>>> y_test.dtype
dtype('float64')
>>> y_predict.shape
(3000,)
>>> y_predict.dtype
dtype('float64')

I think one of the arguments is wrong, but at first glance everything looks fine, and I can't work out where the problem is.

Answer 1

This is where the problem is:

y_predict = clf.predict_proba(X_test)[:,1]
print f1_score(y_test,y_predict)

F1 is defined on class labels, not on probabilities, so use predict instead of predict_proba.
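For illustration, here is a minimal sketch of the difference. It uses synthetic binary data from make_classification, since the asker's data is not shown, and the newer sklearn.model_selection import in place of the old cross_validation module. The variable names are hypothetical, and the 0.5 threshold is just one possible choice.

```python
# Minimal sketch on synthetic binary data (an assumption; not the asker's data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.utils.multiclass import type_of_target

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# predict() returns hard class labels, which is what f1_score expects.
y_pred = clf.predict(X_test)
score_from_labels = f1_score(y_test, y_pred)

# predict_proba() returns continuous scores for each class.
y_proba = clf.predict_proba(X_test)[:, 1]

# scikit-learn classifies the two arrays differently, hence the error.
true_kind = type_of_target(y_test)    # label-like target
proba_kind = type_of_target(y_proba)  # continuous target

# Passing the raw probabilities to f1_score raises ValueError.
try:
    f1_score(y_test, y_proba)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# If probabilities are needed, threshold them into labels first
# (0.5 is an illustrative cutoff, not a recommendation).
y_pred_thresholded = (y_proba >= 0.5).astype(int)
score_from_threshold = f1_score(y_test, y_pred_thresholded)

print(true_kind, proba_kind, score_from_labels, score_from_threshold)
```

One more thing to check: the error message in the question says "multiclass", which suggests y_test holds more than two distinct values. If so, recent scikit-learn versions also need an explicit average argument, for example f1_score(y_test, y_pred, average='macro'), because the default average='binary' only applies to two-class targets.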
