Question

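I wrote my code based on this site and built several different multi-label classifiers.

I want to evaluate my models using per-class accuracy and per-class F1 score.

The problem is that I get identical accuracy and F1 values for every model.

I suspect I have done something wrong, and I would like to know under what circumstances this can happen.

The code is exactly the same as on the site, and I compute the F1 score like this: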

print('Logistic Test accuracy is {} '.format(accuracy_score(test[category], prediction)))
print('Logistic f1 measurement is {} '.format(f1_score(test[category], prediction, average='micro')))

Update 1

Here is the whole code:

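import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

df = pd.read_csv("finalupdatedothers.csv")
categories = ['ADR', 'WD', 'EF', 'INF', 'SSI', 'DI', 'others']

train, test = train_test_split(df, random_state=42, test_size=0.3, shuffle=True)
X_train = train.sentences
X_test = test.sentences

# Note: stop_words is not defined in this snippet.
NB_pipeline = Pipeline([('tfidf', TfidfVectorizer(stop_words=stop_words)),
                        ('clf', OneVsRestClassifier(MultinomialNB(fit_prior=True, class_prior=None))),])
# Fit and score one binary problem per category column.
for category in categories:
    print('processing {} '.format(category))
    NB_pipeline.fit(X_train, train[category])
    prediction = NB_pipeline.predict(X_test)
    print('NB test accuracy is {} '.format(accuracy_score(test[category], prediction)))
    print('NB f1 measurement is {} '.format(f1_score(test[category], prediction, average='micro')))
    print("\n")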

This is the output:

processing ADR 
NB test accuracy is 0.821963394343 
NB f1 measurement is 0.821963394343

This is what my data looks like:

,sentences,ADR,WD,EF,INF,SSI,DI,others
0,"extreme weight gain, short-term memory loss, hair loss.",1,0,0,0,0,0,0
1,I am detoxing from Lexapro now.,0,0,0,0,0,0,1
2,I slowly cut my dosage over several months and took vitamin supplements to help.,0,0,0,0,0,0,1
3,I am now 10 days completely off and OMG is it rough.,0,0,0,0,0,0,1
4,"I have flu-like symptoms, dizziness, major mood swings, lots of anxiety, tiredness.",0,1,0,0,0,0,0
5,I have no idea when this will end.,1,0,0,0,0,0,1

Why am I getting the same numbers?

Thanks.

Answer 1

By doing this:

for category in categories:
...
...

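you are actually turning the problem from multi-label into binary: each pass through the loop fits and scores a single 0/1 column. If you want to keep doing that, you don't need OneVsRestClassifier; you can use MultinomialNB directly. Otherwise, you can let OneVsRestClassifier handle all the labels in one go: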

# Send all labels at once.
NB_pipeline.fit(X_train, train[categories])
prediction = NB_pipeline.predict(X_test)
print('NB test accuracy is {} '.format(accuracy_score(test[categories], prediction)))
print('NB f1 measurement is {} '.format(f1_score(test[categories], prediction, average='micro')))

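It may emit some warnings about labels being present in all of the training data, but that is only because the sample data you posted is too small.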
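To illustrate what changes once all labels are scored together, here is a minimal sketch on made-up indicator matrices (the arrays below are invented for illustration, not taken from the question's data). With multi-label input, accuracy_score computes subset accuracy, meaning a row only counts as correct if every label in it matches, so it no longer has to coincide with the micro-averaged F1:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Made-up multi-label indicator matrices: rows are samples, columns are labels.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 1],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],   # one extra label predicted
                   [1, 0, 0],   # one label missed
                   [0, 0, 1]])

# Subset accuracy: fraction of rows predicted exactly right.
subset_acc = accuracy_score(y_true, y_pred)
# Micro F1: pools true/false positives and false negatives over all labels.
micro_f1 = f1_score(y_true, y_pred, average='micro')

print('subset accuracy: {}'.format(subset_acc))
print('micro F1: {}'.format(micro_f1))
```

Here two of the four rows match exactly, while four of the five predicted labels are right and four of the five true labels are found, so the two metrics come out different.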

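@user2906838, you are right about the scores. When average='micro', the results will be equal. This is mentioned in the documentation here:

Note that for "micro"-averaging in a multiclass setting with all labels included will produce equal precision, recall and F, ...

It is written there about the multiclass case, but I suspect the same holds for binary. See this similar question, where the user computed all the scores by hand: Multi-class Clasification (multiclassification): Micro-Average Accuracy, Precision, Recall and F Score All Equal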
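The reason can be seen by counting by hand. The sketch below is a plain-Python illustration, not scikit-learn's implementation; micro_scores is a hypothetical helper and the labels are made up. When every sample has exactly one true and one predicted label, each mistake is a false positive for the predicted class and a false negative for the true class, so the pooled FP and FN counts are equal and precision, recall, F1 and accuracy all collapse to the same number:

```python
def micro_scores(y_true, y_pred):
    """Micro-averaged precision, recall, F1 and plain accuracy (single-label data)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = fp = fn = 0
    # Pool the counts over every label, including the "0" class.
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label and t != label:
                fp += 1
            elif p != label and t == label:
                fn += 1
    precision = tp / float(tp + fp)
    recall = tp / float(tp + fn)
    # Guard against division by zero when nothing is predicted correctly.
    f1 = 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / float(len(y_true))
    return precision, recall, f1, accuracy


# Made-up binary labels for one category.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
precision, recall, f1, accuracy = micro_scores(y_true, y_pred)
print(precision, recall, f1, accuracy)
```

This assumes non-empty input with one label per sample, which is the situation inside the per-category loop in the question.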

Answer 2

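Well, it is probably because accuracy_score and f1_score are returning the same score. They are computed differently, but in this case the results happen to coincide. If you want to know more about how they are computed, you can find the answer here: How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn?

As for your current problem of getting identical scores, change the value of average from micro to weighted. That should actually change your score, as I pointed out in the comments.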
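As a rough check of how much the average argument matters, here is a small sketch on made-up, imbalanced binary labels (the data is invented for illustration). Only 'micro' is tied to accuracy; 'macro', 'weighted' and 'binary' generally give other values:

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up imbalanced binary labels: 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

acc = accuracy_score(y_true, y_pred)

# F1 under each averaging mode that applies to binary targets.
scores = {}
for avg in ('micro', 'macro', 'weighted', 'binary'):
    scores[avg] = f1_score(y_true, y_pred, average=avg)

print('accuracy: {}'.format(acc))
for avg in ('micro', 'macro', 'weighted', 'binary'):
    print('{} F1: {}'.format(avg, scores[avg]))
```

For a per-category evaluation like the one in the question, average='binary' (the default) reports the F1 of the positive class only, which is often the number people actually want for a rare label.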

Accuracy and F1 score are the same when doing multi-label classification
