I have a multi-class text classification problem with 2000 different labels. I'm classifying it with an LSTM on top of GloVe embeddings.
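import numpy as np
from sklearn.preprocessing import LabelEncoder
from keras import backend as K
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# Encode the string labels as integers 0..num_classes-1
le = LabelEncoder()
le.fit(y)
train_y = le.transform(y_train)
test_y = le.transform(y_test)

np.random.seed(seed)
K.clear_session()
model = Sequential()
model.add(Embedding(max_features, embed_dim, input_length=X_train.shape[1],
                    weights=[embedding_matrix]))  # add trainable=False to freeze the GloVe weights
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(num_classes, activation='softmax'))
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
model.summary()  # summary() prints the table itself and returns None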
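A small sketch of what the LabelEncoder step produces, using made-up labels (the label strings below are hypothetical stand-ins for the real 2000 labels). LabelEncoder sorts the distinct labels and maps them to the integers 0..K-1, which is the format sparse_categorical_crossentropy expects. num_classes is not defined in the snippet above; presumably it should equal len(le.classes_).

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy labels standing in for the real y
y = ["sports", "politics", "tech", "sports"]

le = LabelEncoder()
le.fit(y)

# Classes are stored in sorted order; their positions are the integer codes
print(le.classes_.tolist())        # ['politics', 'sports', 'tech']
encoded = le.transform(y)
print(encoded.tolist())            # [1, 0, 2, 1]

# The Dense softmax layer needs one unit per class
num_classes = len(le.classes_)
print(num_classes)                 # 3

# inverse_transform maps predicted class indices back to the original labels
print(le.inverse_transform([2, 0]).tolist())  # ['tech', 'politics']
```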
My evaluation metric is the F1 score, which I compute with the following callback:
import numpy as np
from keras.callbacks import Callback
from sklearn.metrics import f1_score, recall_score, precision_score

class Metrics(Callback):
    def on_train_begin(self, logs=None):
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []

    def on_epoch_end(self, epoch, logs=None):
        # Predicted class probabilities for the validation set, rounded to 0/1
        val_predict = np.asarray(self.model.predict(self.validation_data[0])).round()
        val_targ = self.validation_data[1]
        _val_f1 = f1_score(val_targ, val_predict)
        _val_recall = recall_score(val_targ, val_predict)
        _val_precision = precision_score(val_targ, val_predict)
        self.val_f1s.append(_val_f1)
        self.val_recalls.append(_val_recall)
        self.val_precisions.append(_val_precision)
        print("val_f1: %f - val_precision: %f - val_recall: %f" % (_val_f1, _val_precision, _val_recall))

metrics = Metrics()
model.fit(X_train, train_y, validation_data=(X_test, test_y), epochs=10, batch_size=64, callbacks=[metrics])
After the first epoch I get the following error:
ValueError: Classification metrics can't handle a mix of multiclass and continuous-multioutput targets
Can you tell me where my code goes wrong?
Answer 0 (score: 0)
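By default (average='binary'), F1 score, recall and precision are computed for binary classification. To use f1_score, recall_score and precision_score in a multi-class/multi-label problem, you need to pass the average parameter to each of them.
Try this: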
_val_f1 = f1_score(val_targ, val_predict, average='weighted')
_val_recall = recall_score(val_targ, val_predict, average='weighted')
_val_precision = precision_score(val_targ, val_predict, average='weighted')
More information on the average parameter:
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html
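One caveat: the target-type check that raises this ValueError runs before any averaging, so average= on its own may not make the error go away. With a softmax output layer, model.predict returns a 2D (n_samples, num_classes) probability matrix, which scikit-learn classifies as 'continuous-multioutput', while the LabelEncoder targets are 'multiclass'. The sketch below uses hypothetical toy arrays to show that check and one common fix that this answer does not mention: take the argmax over the class axis before scoring (in the callback that would mean np.argmax(self.model.predict(...), axis=1)).

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.utils.multiclass import type_of_target

# Hypothetical toy data: integer labels as produced by LabelEncoder,
# and softmax probabilities of shape (n_samples, num_classes) as returned by model.predict
y_true = np.array([0, 2, 1, 2])
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
])

print(type_of_target(y_true))  # multiclass
print(type_of_target(probs))   # continuous-multioutput

# Passing the raw probability matrix fails even with average='weighted'
try:
    f1_score(y_true, probs, average='weighted')
    raw_failed = False
except ValueError:
    raw_failed = True

# Convert probabilities to predicted class indices, then score
y_pred = np.argmax(probs, axis=1)  # [0, 2, 1, 1]
f1 = f1_score(y_true, y_pred, average='weighted')
precision = precision_score(y_true, y_pred, average='weighted')
recall = recall_score(y_true, y_pred, average='weighted')
print(f1, precision, recall)
```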
Answer 1 (score: 0)
Your problem is caused by the continuous values in val_predict on this line:
_val_f1 = f1_score(val_targ, val_predict)
You should round val_predict before computing f1_score.
Example solution:
_val_f1 = f1_score(val_targ,np.round(val_predict))
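Rounding works as described when the network has a single sigmoid output (binary classification). With the question's softmax layer over num_classes outputs, though, rounding leaves a 2D 0/1 matrix that scikit-learn reads as 'multilabel-indicator', which still clashes with the 'multiclass' integer labels (note that the question's callback already calls .round()). A sketch with hypothetical toy arrays contrasting the two cases:

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.utils.multiclass import type_of_target

# Binary case (hypothetical toy data): one sigmoid output per sample
y_true_bin = np.array([0, 1, 1, 0])
p_bin = np.array([[0.2], [0.8], [0.4], [0.6]])
y_pred_bin = np.round(p_bin).ravel()      # [0., 1., 0., 1.]
print(type_of_target(y_pred_bin))         # binary
print(f1_score(y_true_bin, y_pred_bin))   # 0.5

# Multi-class case (hypothetical toy data): softmax output, one column per class
y_true_mc = np.array([0, 2, 1, 2])
p_mc = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
])
rounded = np.round(p_mc)                  # still a 2D 0/1 matrix
print(type_of_target(rounded))            # multilabel-indicator

# Mixing 'multiclass' targets with a 'multilabel-indicator' prediction raises ValueError
try:
    f1_score(y_true_mc, rounded, average='weighted')
    mc_failed = False
except ValueError:
    mc_failed = True
print(mc_failed)                          # True
```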
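A side note: if you want to change the rounding threshold (0.5 by default), shift the values before rounding. Subtracting d raises the effective threshold to 0.5 + d, and adding d lowers it to 0.5 - d (abs() only turns the resulting -0. values into 0.):
>>> import numpy as np
>>> a = np.arange(0, 1, 0.1)
>>> print(a, abs(np.round(a - 0.1)), sep='\n')
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
[0. 0. 0. 0. 0. 0. 1. 1. 1. 1.]
>>> print(a, abs(np.round(a + 0.3)), sep='\n')
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
[0. 0. 0. 1. 1. 1. 1. 1. 1. 1.]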
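An alternative to shifting before np.round, sketched with a hypothetical binarize helper (not from the answer): compare against the threshold directly. This makes the cutoff explicit and sidesteps np.round's round-half-to-even behavior, which is why the value that lands exactly on 0.5 in the second example (0.2 + 0.3) rounds to 0 rather than 1.

```python
import numpy as np

def binarize(probs, threshold=0.5):
    """Hypothetical helper: 1 where probs >= threshold, else 0."""
    return (np.asarray(probs) >= threshold).astype(int)

a = np.arange(0, 1, 0.1)
print(binarize(a, 0.6))   # [0 0 0 0 0 0 1 1 1 1]
print(binarize(a, 0.2))   # [0 0 1 1 1 1 1 1 1 1]

# np.round rounds exact halves to the nearest even number
print(np.round(0.5), np.round(1.5), np.round(2.5))   # 0.0 2.0 2.0
```

With a direct comparison, 0.2 itself counts as positive at threshold 0.2, whereas the shifted-rounding version above maps it to 0.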
Hope it helps!