Question

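I have a MultinomialNB classifier that has been instantiated and trained on vectorized fake/real news articles, and now I am trying to understand the meaning behind its coefficients.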

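from sklearn.naive_bayes import MultinomialNB

# Classifier (the fit call on the vectorized training data is omitted here)
nb_classifier = MultinomialNB()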

# Extracting the class labels: ('Fake' or 'Real')
class_labels = nb_classifier.classes_

# Extract the features_names from the vectorizer I used
feature_names = count_vectorizer.get_feature_names()

# Zip the feature_names together with the coefficient array and sort by weights
feat_with_weights = sorted(zip(nb_classifier.coef_[0], feature_names))

print(class_labels[0], feat_with_weights[-20:]) #Or, class_labels[1] = 'Real'

Result:

FAKE [(-6.2632792078858461, 'sanders'), (-6.2426599206831099, 'house'), (-6.1832365002123097, 'senate'), (-6.1641883052416144, 'time'), (-6.191285280585872, 'iraq'), (-5.9297875994027711, 'republican'), ...]

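I understand that a higher coefficient (-5.9) means the token is more predictive than one with -6.2, but I am not sure which direction the relationship points. Does it mean the token 'republican' is strongly associated with fake news, or with real news?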
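To make the question concrete, here is a minimal sketch in plain Python of what these values appear to be: smoothed log-probabilities log P(token | class), which are always negative. It is not scikit-learn's own code; the token counts are invented for illustration, and `alpha = 1.0` is assumed to match the default smoothing. Under that reading, a single class's values cannot show direction on their own, so the sketch compares the two classes token by token.

```python
import math

# Hypothetical toy token counts per class (made up for illustration).
token_counts = {
    "FAKE": {"republican": 2, "senate": 5, "iraq": 3},
    "REAL": {"republican": 9, "senate": 4, "iraq": 2},
}
vocab = sorted({t for counts in token_counts.values() for t in counts})
alpha = 1.0  # Laplace smoothing, assumed to match the MultinomialNB default


def feature_log_prob(counts):
    """Smoothed log P(token | class) for every token in the vocabulary."""
    total = sum(counts.get(t, 0) for t in vocab) + alpha * len(vocab)
    return {t: math.log((counts.get(t, 0) + alpha) / total) for t in vocab}


log_prob = {label: feature_log_prob(c) for label, c in token_counts.items()}

# A positive log-ratio means the token is relatively more likely under REAL,
# a negative one means it is relatively more likely under FAKE.
log_ratio = {t: log_prob["REAL"][t] - log_prob["FAKE"][t] for t in vocab}

for token, ratio in sorted(log_ratio.items(), key=lambda kv: kv[1]):
    print(f"{token:12s} {ratio:+.3f}")
```

With a fitted classifier, the analogous comparison would presumably use the rows of `nb_classifier.feature_log_prob_`, which are ordered like `nb_classifier.classes_`.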

The role of coefficients in MultinomialNB

0 answers: