I tried using the code that Keras provided before it was removed. Here is the code:
def precision(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def recall(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def fbeta_score(y_true, y_pred, beta=1):
    if beta < 0:
        raise ValueError('The lowest choosable beta is zero (only precision).')
    # If there are no true positives, fix the F score at 0 like sklearn.
    if K.sum(K.round(K.clip(y_true, 0, 1))) == 0:
        return 0
    p = precision(y_true, y_pred)
    r = recall(y_true, y_pred)
    bb = beta ** 2
    fbeta_score = (1 + bb) * (p * r) / (bb * p + r + K.epsilon())
    return fbeta_score

def fmeasure(y_true, y_pred):
    return fbeta_score(y_true, y_pred, beta=1)
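From what I can see (I'm an amateur at this), it looks like they use the correct formulas. But when I tried to use it as a metric during training, I got exactly equal outputs for val_accuracy, val_precision, val_recall and val_fmeasure. I believe that could happen even if the formulas are correct, but it seems unlikely. Is there any explanation for this? Thanks.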
Answer 0 (score: 47)
The f1, precision, and recall metrics were removed as of Keras 2.0. The solution is to use a custom metric function:
from keras import backend as K

def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.

        Only computes a batch-wise average of recall.

        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.

        Only computes a batch-wise average of precision.

        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall+K.epsilon()))

model.compile(loss='binary_crossentropy',
              optimizer="adam",
              metrics=[f1])
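The return line of this function,
    return 2*((precision*recall)/(precision+recall+K.epsilon()))
was modified by adding the constant epsilon to the denominator in order to avoid division by 0, so NaN is never computed.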
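As a rough illustration of what the epsilon term does, here is a minimal plain-Python sketch. The function names and the `EPSILON` constant are made up for this example, and 1e-7 is assumed to match the usual default of `K.epsilon()`. Note that plain Python raises `ZeroDivisionError` on 0/0, whereas the same division on backend tensors produces NaN.

```python
EPSILON = 1e-7  # assumption: the usual default value of K.epsilon()

def f1_plain(precision, recall):
    # Unprotected formula: fails when precision + recall == 0.
    return 2 * (precision * recall) / (precision + recall)

def f1_with_epsilon(precision, recall, eps=EPSILON):
    # Same formula with a tiny constant in the denominator.
    return 2 * (precision * recall) / (precision + recall + eps)

# A batch with no true positives gives precision == recall == 0.
safe_value = f1_with_epsilon(0.0, 0.0)

try:
    f1_plain(0.0, 0.0)
    plain_failed = False
except ZeroDivisionError:
    plain_failed = True

# For ordinary inputs the epsilon barely changes the result.
typical_value = f1_with_epsilon(0.5, 0.5)
```

The price of the guard is a very slight downward bias, which is negligible next to the batch-averaging effect discussed in the other answers.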
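Answer 1 (score: 8)
Using a Keras metric function is not the right way to calculate F1, AUC, or anything similar.
The reason is that the metric function is called at every batch step during validation, and Keras then averages the per-batch results. That average is not the correct F1 score.
That is why the F1 score was removed from the metric functions in Keras. See here:
The right way is to use a custom callback function, like this: https://medium.com/@thongonary/how-to-compute-f1-score-for-each-epoch-in-keras-a1acd17715a2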
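To make the batch-averaging problem concrete, here is a minimal sketch in plain Python (no Keras involved). The labels and the two-batch split are made-up illustration data, and `f1_from_labels` is a hypothetical helper. It compares the mean of the per-batch F1 scores with the F1 score computed over all samples at once.

```python
def f1_from_labels(y_true, y_pred):
    """F1 for binary 0/1 labels; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    if tp == 0:
        return 0.0
    precision = tp / sum(y_pred)
    recall = tp / sum(y_true)
    return 2 * precision * recall / (precision + recall)

# Made-up data: two batches of four samples, as (y_true, y_pred) pairs.
batches = [
    ([1, 1, 1, 1], [1, 1, 1, 0]),
    ([1, 0, 0, 0], [0, 1, 0, 0]),
]

# What a batch-wise metric effectively reports: the mean of per-batch scores.
batch_scores = [f1_from_labels(t, p) for t, p in batches]
batch_mean_f1 = sum(batch_scores) / len(batch_scores)  # about 0.429

# What we actually want: F1 over the whole validation set.
all_true = [label for t, _ in batches for label in t]
all_pred = [label for _, p in batches for label in p]
global_f1 = f1_from_labels(all_true, all_pred)  # about 0.667

print(batch_mean_f1, global_f1)
```

The two numbers differ because F1 is not a linear function of the counts, so averaging it across batches does not recover the value for the full data set.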
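Answer 2 (score: 2)
I would also suggest this workaround:
model.fit(nb_epoch=1, ...)
to take advantage of the precision/recall metrics that are output after every epoch. Something like this: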
for mini_batch in range(epochs):
    model_hist = model.fit(X_train, Y_train, batch_size=batch_size, epochs=1,
                           verbose=2, validation_data=(X_val, Y_val))
    precision = model_hist.history['val_precision'][0]
    recall = model_hist.history['val_recall'][0]
    f_score = (2.0 * precision * recall) / (precision + recall)
    print('F1-SCORE {}'.format(f_score))
Answer 3 (score: 0)
As @Pedia said in the comments above, on_epoch_end, as described in github.com/fchollet/keras/issues/5400, is the best approach.
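Answer 4 (score: 0)
Here is a streaming custom f1_score metric that I made using subclassing. It works for TensorFlow 2.0 beta, but I haven't tried it on other versions. It keeps track of the true positives, predicted positives, and all possible positives throughout the whole epoch, and then calculates the F1 score at the end of that epoch. I think the other answers only give the F1 score for each batch, which isn't the best metric when what we really want is the F1 score over all the data.
I got a raw, unedited copy of Aurélien Géron's new book, Hands-On Machine Learning with Scikit-Learn and TensorFlow 2.0, and I highly recommend it. That is how I learned to write this custom F1 metric using subclassing. It is the most comprehensive TensorFlow book I have seen. TensorFlow is seriously hard to learn, and this author lays down the coding groundwork for learning a lot.
FYI: in the metrics list, I had to include the parentheses in F1_score(), otherwise it would not work.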
pip install tensorflow==2.0.0-beta1
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow import keras
import numpy as np

def create_f1():
    def f1_function(y_true, y_pred):
        y_pred_binary = tf.where(y_pred>=0.5, 1., 0.)
        tp = tf.reduce_sum(y_true * y_pred_binary)
        predicted_positives = tf.reduce_sum(y_pred_binary)
        possible_positives = tf.reduce_sum(y_true)
        return tp, predicted_positives, possible_positives
    return f1_function

class F1_score(keras.metrics.Metric):
    def __init__(self, **kwargs):
        super().__init__(**kwargs) # handles base args (e.g., dtype)
        self.f1_function = create_f1()
        self.tp_count = self.add_weight("tp_count", initializer="zeros")
        self.all_predicted_positives = self.add_weight('all_predicted_positives', initializer='zeros')
        self.all_possible_positives = self.add_weight('all_possible_positives', initializer='zeros')

    def update_state(self, y_true, y_pred,sample_weight=None):
        tp, predicted_positives, possible_positives = self.f1_function(y_true, y_pred)
        self.tp_count.assign_add(tp)
        self.all_predicted_positives.assign_add(predicted_positives)
        self.all_possible_positives.assign_add(possible_positives)

    def result(self):
        precision = self.tp_count / self.all_predicted_positives
        recall = self.tp_count / self.all_possible_positives
        f1 = 2*(precision*recall)/(precision+recall)
        return f1

X = np.random.random(size=(1000, 10))
Y = np.random.randint(0, 2, size=(1000,))
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)

model = keras.models.Sequential([
    keras.layers.Dense(5, input_shape=[X.shape[1], ]),
    keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='SGD', metrics=[F1_score()])
history = model.fit(X_train, y_train, epochs=5, validation_data=(X_test, y_test))
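The accumulate-then-compute idea behind this metric can be sketched without TensorFlow. The `StreamingF1` class below is a hypothetical plain-Python stand-in, not the Keras API, and the probabilities are made-up data. It also returns 0.0 when there are no true positives, which is an assumption of this sketch; the `result()` method above performs the divisions unguarded.

```python
class StreamingF1:
    """Accumulates counts over batches; F1 is only computed on request."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.tp = 0
        self.predicted_positives = 0
        self.possible_positives = 0

    def update(self, y_true, y_prob):
        # Add one batch's counts to the running totals.
        for t, p in zip(y_true, y_prob):
            pred = 1 if p >= self.threshold else 0
            self.tp += t * pred
            self.predicted_positives += pred
            self.possible_positives += t

    def result(self):
        # Guard: without true positives both ratios are zero or undefined.
        if self.tp == 0:
            return 0.0
        precision = self.tp / self.predicted_positives
        recall = self.tp / self.possible_positives
        return 2 * precision * recall / (precision + recall)

metric = StreamingF1()
metric.update([1, 1, 1, 1], [0.9, 0.8, 0.7, 0.2])  # batch 1
metric.update([1, 0, 0, 0], [0.1, 0.6, 0.3, 0.4])  # batch 2
epoch_f1 = metric.result()
```

Because only the three counts are carried between batches, the final value equals the F1 score over all samples seen so far, regardless of how they were split into batches.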
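Answer 5 (score: 0)
As @Diesche mentioned, the main problem with implementing f1_score this way is that it is called at every batch step, which leads to results that are more confusing than anything else.
I struggled with this for a while, but eventually worked around it with a callback: at the end of an epoch, the callback runs predictions on the data (in this case I chose to apply it only to my validation data) with the new model parameters, and gives you coherent metrics evaluated over the whole epoch.
I'm using tensorflow-gpu (1.14.0) on python3.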
import numpy as np
from tensorflow.python.keras.models import Sequential, Model
from sklearn.metrics import f1_score, precision_score, recall_score
from tensorflow.keras.callbacks import Callback
from tensorflow.python.keras import optimizers

optimizer = optimizers.SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=['accuracy'])
model.summary()

class Metrics(Callback):
    def __init__(self, model, valid_data, true_outputs):
        super(Metrics, self).__init__()
        self.model = model
        self.valid_data = valid_data  # the validation data I'm getting metrics on
        self.true_outputs = true_outputs  # the ground truth of my validation data
        self.steps = len(self.valid_data)

    def on_epoch_end(self, epoch, logs=None):
        # generator() is my own helper (not shown) yielding the validation data
        gen = generator(self.valid_data)
        val_predict = np.asarray(self.model.predict(gen, batch_size=1, verbose=0, steps=self.steps))
        # from_proba_to_output transforms probabilities into a format
        # understandable by sklearn's f1_score function
        val_predict = from_proba_to_output(val_predict, 0.5)
        _val_f1 = f1_score(self.true_outputs, val_predict)
        _val_precision = precision_score(self.true_outputs, val_predict)
        _val_recall = recall_score(self.true_outputs, val_predict)
        print("val_f1: ", _val_f1, " val_precision: ", _val_precision, " _val_recall: ", _val_recall)
The function from_proba_to_output is as follows:
def from_proba_to_output(probabilities, threshold):
    outputs = np.copy(probabilities)
    for i in range(len(outputs)):
        if (float(outputs[i])) > threshold:
            outputs[i] = int(1)
        else:
            outputs[i] = int(0)
    return np.array(outputs)
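The same thresholding can probably be done without an explicit loop. The following is a sketch, with `from_proba_to_output_vectorized` as a hypothetical name; unlike the loop version, which keeps the float dtype of the copied array, it returns an integer array.

```python
import numpy as np

def from_proba_to_output_vectorized(probabilities, threshold):
    # Elementwise comparison yields booleans; cast them to 0/1 integers.
    probabilities = np.asarray(probabilities)
    return (probabilities > threshold).astype(int)

# Made-up predictions shaped like the (n_samples, 1) output of a sigmoid layer.
example = from_proba_to_output_vectorized([[0.2], [0.7], [0.5]], 0.5)
```

As in the loop version, the comparison is strict, so a probability exactly equal to the threshold maps to 0.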
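I then train my model by referencing this Metrics class in the callbacks argument of fit_generator. I did not detail the implementation of my train_generator and valid_generator, since these data generators are specific to the classification problem at hand and posting them would only add confusion.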
model.fit_generator(
    train_generator, epochs=nbr_epochs, verbose=1, validation_data=valid_generator,
    callbacks=[Metrics(model, valid_data, true_outputs)])