FastText precision and recall trade-off

Time: 2017-08-16 12:35:04

Tags: nlp word2vec text-analysis word-embedding fasttext

In FastText, I want to change the balance between precision and recall. Is that possible?

1 answer:

Answer 0: (score: 0)

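If you mean the Python fasttext implementation, I'm afraid there is no simple built-in way to do this. What you can do is take the returned probabilities, pass them to your AUC or ROC curve plotting methods, and choose the probability cutoff that gives the trade-off you want. Here is a code example that does this for a binary classifier only: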

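import re

# predict the top label and its probability for every test sentence
# (newlines are replaced with spaces so each sentence stays on one line)
labels, probabilities = fasttext_classifier.predict([re.sub('\n', ' ', sentence)
                                                     for sentence in test_sentences])

# convert the fasttext top-1 results to a binary score (probability of TRUE);
# indexing label[0] works whether each prediction is a list or a tuple
labels = [label[0] == '__label__TRUE' for label in labels]
probabilities = [probability[0] if label else (1 - probability[0])
                 for label, probability in zip(labels, probabilities)]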
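The conversion above can be checked without a trained model. This is a small sketch under stated assumptions: `probability_of_true` is a hypothetical helper, the mock labels and probabilities are invented to mimic top-1 output, and the `1 - p` step only holds when the model has exactly two labels whose probabilities sum to one (for example with a softmax loss).

```python
def probability_of_true(labels, probabilities, positive_label="__label__TRUE"):
    """Turn top-1 (label, probability) pairs into P(positive) for a two-class model.

    Assumes exactly two labels whose probabilities sum to one.
    """
    result = []
    for top_labels, top_probs in zip(labels, probabilities):
        p = float(top_probs[0])
        # If the winning label is the negative one, P(positive) is the remainder.
        result.append(p if top_labels[0] == positive_label else 1.0 - p)
    return result


# Invented values that mimic top-1 predictions for three sentences.
mock_labels = [["__label__TRUE"], ["__label__FALSE"], ["__label__TRUE"]]
mock_probabilities = [[0.9], [0.8], [0.55]]

p_true = probability_of_true(mock_labels, mock_probabilities)
print(p_true)
```

The second sentence was predicted FALSE with probability 0.8, so its P(TRUE) comes out at roughly 0.2.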

You are then free to build your metrics with the usual sklearn methods:

from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import f1_score
from matplotlib import pyplot

# testy holds the true binary labels of the test sentences, in the same order
auc = roc_auc_score(testy, probabilities)
print('ROC AUC=%.3f' % (auc))

# calculate roc curve
fpr, tpr, _ = roc_curve(testy, probabilities)

# plot the roc curve for the model
pyplot.plot(fpr, tpr, marker='.', label='ROC curve')
# axis labels
pyplot.xlabel('False Positive Rate (1 - specificity)')
pyplot.ylabel('True Positive Rate (sensitivity)')
# show the legend
pyplot.legend()
# show the plot
pyplot.show()

precision_values, recall_values, _ = precision_recall_curve(testy, probabilities)
f1 = f1_score(testy, labels)
# summarize scores
print('f1=%.3f auc=%.3f' % (f1, auc))
# plot the precision-recall curves
pyplot.plot(recall_values, precision_values, marker='.', label='Precision,Recall')
# axis labels
pyplot.xlabel('Recall')
pyplot.ylabel('Precision')
# show the legend
pyplot.legend()
# show the plot
pyplot.show()

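The command-line version of fasttext has a threshold parameter, so you could also do several runs with different thresholds, but that is needlessly time-consuming.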
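A cutoff can also be chosen directly from the P(TRUE) scores without rerunning the model. The following is a minimal, dependency-free sketch rather than part of fasttext or sklearn: `precision_recall_at` and `lowest_threshold_for_precision` are hypothetical helper names, and the toy `y_true` and `p_true` values are made up for illustration (in practice they would be `testy` and `probabilities`). It scans the candidate cutoffs in ascending order and returns the first one whose precision reaches a target. Because recall can only fall as the cutoff rises, that is also the qualifying cutoff with the highest recall.

```python
def precision_recall_at(y_true, p_true, threshold):
    """Precision and recall when every score >= threshold is predicted positive."""
    predicted = [p >= threshold for p in p_true]
    tp = sum(1 for y, yhat in zip(y_true, predicted) if y and yhat)
    fp = sum(1 for y, yhat in zip(y_true, predicted) if not y and yhat)
    fn = sum(1 for y, yhat in zip(y_true, predicted) if y and not yhat)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def lowest_threshold_for_precision(y_true, p_true, target_precision):
    """Return (threshold, precision, recall) for the smallest cutoff that
    reaches target_precision, or None if no cutoff does."""
    for threshold in sorted(set(p_true)):
        precision, recall = precision_recall_at(y_true, p_true, threshold)
        if precision >= target_precision:
            return threshold, precision, recall
    return None


# Toy data for illustration only.
y_true = [True, True, False, True, False, False]
p_true = [0.95, 0.80, 0.70, 0.60, 0.40, 0.10]

choice = lowest_threshold_for_precision(y_true, p_true, target_precision=0.9)
print(choice)

if choice is not None:
    threshold = choice[0]
    # Apply the chosen cutoff to get the final binary predictions.
    tuned_predictions = [p >= threshold for p in p_true]
    print(tuned_predictions)
```

Raising `target_precision` trades recall for precision, and lowering it does the opposite.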