Identifying the classes of an sklearn model

Time: 2015-05-12 10:26:28

Tags: python scikit-learn svm

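The documentation on SVMs implies that there is an attribute called classes_, which supposedly reveals how the model represents classes internally.

I would like to get that information so I can interpret the output of functions like predict_proba, which produce class probabilities for a number of samples. Hopefully, knowing that (with some illustrative values):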

model.classes_ 
>>> [1, 2, 4]

means I can assume that in:

model.predict_proba([[1.2312, 0.23512, 6.01234], [3.7655, 8.2353, 0.86323]]) 
>>> [[0.032, 0.143, 0.825], [0.325, 0.143, 0.532]]

the probabilities are listed in the same order as the classes, i.e. for the first set of features I can assume:

probability of class 1: 0.032
probability of class 2: 0.143
probability of class 4: 0.825

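But calling classes_ on the SVM results in an error. Is there a good way to get this information? I can't imagine it is no longer accessible once the model has been trained.

Edit: I build my model more or less like this: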

from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn.pipeline import Pipeline, FeatureUnion


pipeline = Pipeline([
   ('features', FeatureUnion(transformer_list=[ ... ])),
   ('svm', SVC(probability=True))
])
parameters = { ... }
grid_search = GridSearchCV(
    pipeline,
    parameters
)

grid_search.fit(get_data(), get_labels())
clf = [elem for elem in grid_search.estimator.steps if elem[0] == 'svm'][0][1]

print(clf)
>> SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=True, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
print(clf.classes_)
>> Traceback (most recent call last):
  File "path/to/script.py", line 284, in <module>
  File "path/to/script.py", line 181, in re_train
    print(clf.classes_)
AttributeError: 'SVC' object has no attribute 'classes_'

3 answers:

Answer 0 (score: 3)

The grid_search.estimator that you are looking at is the unfitted pipeline. The classes_ attribute only exists after fitting, as the classifier needs to have seen y.

What you want is the estimator that was trained using the best parameter settings, which is grid_search.best_estimator_.

The following will work:

grid_search.best_estimator_

[and classes_ does exactly what you think it does].
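To make the difference concrete, here is a minimal sketch rather than the asker's actual setup. It assumes a recent scikit-learn, where GridSearchCV is imported from sklearn.model_selection (the question uses the older sklearn.grid_search). The toy data, the StandardScaler standing in for the question's FeatureUnion, and the parameter grid are all made up for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV  # assumption: modern import path
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical toy data with the labels 1, 2 and 4 from the question
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 3)) for c in (0.0, 3.0, 6.0)])
y = np.repeat([1, 2, 4], 20)

pipeline = Pipeline([
    ("scale", StandardScaler()),      # stand-in for the question's FeatureUnion
    ("svm", SVC(probability=True)),
])
grid_search = GridSearchCV(pipeline, {"svm__C": [0.1, 1.0]}, cv=3)
grid_search.fit(X, y)

# grid_search.estimator is the template that was passed in; fit() works on
# clones of it, so its SVC step never sees y and has no classes_.
unfitted_svc = grid_search.estimator.named_steps["svm"]
print(hasattr(unfitted_svc, "classes_"))

# best_estimator_ is the pipeline refitted with the best parameters.
fitted_svc = grid_search.best_estimator_.named_steps["svm"]
print(fitted_svc.classes_)
```

Looking the step up through named_steps avoids the list comprehension over steps used in the question.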

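Answer 1 (score: 1)

sklearn does have a classes_ field, so the error probably means you are calling it on the wrong model. In the example below, the classes are there once we look at classes_ on a fitted classifier: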

>>> import numpy as np
>>> from sklearn.svm import SVC
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> clf = SVC(probability=True)
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=True, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
>>> print(clf.classes_)
[1 2]
>>> print(clf.predict([[-0.8, -1]]))
[1]
>>> print(clf.predict_proba([[-0.8, -1]]))
[[ 0.92419129  0.07580871]]
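As a rough illustration of how the columns line up with non-contiguous labels like those in the question, the sketch below pairs each column of predict_proba with the matching entry of classes_. The data comes from make_blobs and the relabelling to 1, 2 and 4 is an assumption made for the example only.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical data: three clusters relabelled as classes 1, 2 and 4
X, blob_ids = make_blobs(n_samples=90, centers=3, n_features=3, random_state=0)
y = np.array([1, 2, 4])[blob_ids]

clf = SVC(probability=True, random_state=0).fit(X, y)

proba = clf.predict_proba(X[:2])
for row in proba:
    # Column j of predict_proba corresponds to clf.classes_[j]
    by_class = dict(zip(clf.classes_.tolist(), row.tolist()))
    print(by_class)

# The column index of the largest probability indexes into classes_
most_probable = clf.classes_[proba.argmax(axis=1)]
print(most_probable)
```

One caveat for SVC specifically: the probabilities come from a separate calibration step, so the most probable class computed this way can occasionally differ from what predict returns.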

Answer 2 (score: 0)

I believe this should do the trick:

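import pandas as pd

# model is a fitted classifier and X holds the samples to score
arr = model.predict_proba(X)
list1 = arr.tolist()

cls = model.classes_
list2 = cls.tolist()

# Pair each class label with its probability for the first sample only
d = {'Category': list2, 'Probability': list1[0]}
df = pd.DataFrame(d)

print(df)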
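The snippet above tabulates only the first sample. If all samples are wanted, one possible variant is to use classes_ as the column labels, so that each row holds one sample's probabilities. This is only a sketch: the toy data and the fitted model are hypothetical stand-ins for whatever model and X are in practice.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

# Hypothetical toy data with labels 1, 2 and 4
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 2)) for c in (0.0, 3.0, 6.0)])
y = np.repeat([1, 2, 4], 20)
model = SVC(probability=True, random_state=0).fit(X, y)

# One row per sample, one column per class, labelled in classes_ order
df = pd.DataFrame(model.predict_proba(X[:5]), columns=model.classes_)
df.columns.name = "class"
print(df)
```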