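eli5: show_weights() with two labels

Date: 2018-08-02 17:48:59

Tags: scikit-learn nlp regression

I am trying out eli5 to understand how individual terms contribute to the prediction of certain classes.

You can run the following script: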

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.datasets import fetch_20newsgroups

#categories = ['alt.atheism', 'soc.religion.christian']
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics']

np.random.seed(1)
train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=7)
test = fetch_20newsgroups(subset='test', categories=categories, shuffle=True, random_state=7)

# Named `bow` so that the Pipeline and eli5.show_weights() calls below refer to a defined variable
bow = CountVectorizer(stop_words='english')
clf = LogisticRegression()
pipel = Pipeline([('bow', bow),
                 ('classifier', clf)])

pipel.fit(train.data, train.target)

import eli5
eli5.show_weights(clf, vec=bow, top=20)

Problem:

Unfortunately, when two labels are used, the output is limited to a single table:

categories = ['alt.atheism', 'soc.religion.christian']


With three labels, however, it outputs three tables:

categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics']


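Is this a bug in the software that drops y = 0 from the first output, or am I missing a statistical point? I would expect to see two tables in the first case.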


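1 answer:

Answer 0 (score: 1)

This has nothing to do with eli5; it comes from how scikit-learn (here, LogisticRegression()) treats two classes. With only two classes the problem becomes a binary one, so the fitted classifier exposes only a single set of learned attributes everywhere (one row, not one per class).

Look at the attributes of LogisticRegression: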
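To see why a single set of weights is enough in the binary case, here is a minimal sketch in plain Python. The weights, bias and input are made-up values, not taken from the model in the question. It only illustrates that P(y=0) = 1 - sigmoid(z) = sigmoid(-z), so the "missing" y = 0 table would simply be the same weights with the sign flipped:

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_probs(weights, bias, x):
    """Return (P(y=0), P(y=1)) for a binary logistic model with ONE weight vector."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    p1 = sigmoid(z)
    return 1.0 - p1, p1

# Hypothetical values, for illustration only.
weights = [1.5, -2.0, 0.3]
bias = 0.1
x = [1.0, 0.0, 2.0]

p0, p1 = binary_probs(weights, bias, x)

# Negating the weights and the bias yields the probability of class 0 directly.
neg_weights = [-w for w in weights]
z_neg = -bias + sum(w * xi for w, xi in zip(neg_weights, x))
p0_from_negated = sigmoid(z_neg)

print(p0, p0_from_negated)  # the two values agree up to floating-point error
```

So nothing is lost in the two-label output: a feature that counts strongly towards y = 1 counts equally strongly against y = 0.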

  

coef_ : array, shape (1, n_features) or (n_classes, n_features)

Coefficient of the features in the decision function.
coef_ is of shape (1, n_features) when the given problem is binary.
     

intercept_ : array, shape (1,) or (n_classes,)

Intercept (a.k.a. bias) added to the decision function.

If fit_intercept is set to False, the intercept is set to zero.
intercept_ is of shape (1,) when the problem is binary.

In the binary case coef_ has shape (1, n_features), and eli5.show_weights() uses this coef_, which is why only one table is displayed.

Hope this is clear.
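You can check the shapes yourself without downloading 20newsgroups. The following is a small sketch on a made-up six-document corpus (the documents, labels and the helper name fitted_shapes are my own, purely for illustration); it fits the same CountVectorizer + LogisticRegression combination once with two classes and once with three:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny made-up corpus (hypothetical data, for illustration only).
docs = [
    "god faith church", "faith prayer church",    # class 0
    "atheism reason doubt", "reason doubt argument",  # class 1
    "pixel image render", "render image graphics",    # class 2
]
labels = [0, 0, 1, 1, 2, 2]

def fitted_shapes(texts, targets):
    """Fit bag-of-words + LogisticRegression; return (coef_.shape, intercept_.shape)."""
    X = CountVectorizer().fit_transform(texts)
    clf = LogisticRegression().fit(X, targets)
    return clf.coef_.shape, clf.intercept_.shape

# Two classes: only the first four documents.
binary_shapes = fitted_shapes(docs[:4], labels[:4])
# Three classes: the whole corpus.
multi_shapes = fitted_shapes(docs, labels)

print("binary:    ", binary_shapes)  # a single row of coefficients
print("multiclass:", multi_shapes)   # one row of coefficients per class
```

In the binary case the single row of coef_ belongs to the second entry of clf.classes_ (the "positive" class); per the scikit-learn documentation, -coef_ corresponds to the other class.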