SGDClassifier with class_weight='auto' fails on scikit-learn 0.15 but not 0.14

Posted: 2014-07-17 16:24:23

Tags: scikit-learn

When I train a scikit-learn v0.15 SGDClassifier with the options SGDClassifier(loss='log', class_weight=None, penalty='l2'), training completes without errors. However, when I train the same classifier on scikit-learn v0.15 with class_weight='auto', I get this error:

  return self.model.fit(X, y)
  File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/linear_model/stochastic_gradient.py", line 485, in fit
    sample_weight=sample_weight)
  File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/linear_model/stochastic_gradient.py", line 389, in _fit
    classes, sample_weight, coef_init, intercept_init)
  File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/linear_model/stochastic_gradient.py", line 336, in _partial_fit
    y_ind)
  File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/utils/class_weight.py", line 43, in compute_class_weight
    raise ValueError("classes should have valid labels that are in y")
ValueError: classes should have valid labels that are in y

What could be causing this?

For reference, here is the documentation for class_weight:

  

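class_weight: Preset for the class_weight fit parameter. Weights associated with classes. If not given, all classes are supposed to have weight one. The "auto" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies.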
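To make the "auto" description concrete, here is a minimal NumPy sketch of inverse-frequency class weights. `inverse_frequency_weights` is a hypothetical helper, not a scikit-learn function, and rescaling the weights so they average to 1 is an assumption; the library's own normalization may differ.

```python
import numpy as np

def inverse_frequency_weights(y):
    """Hypothetical helper: weight each class by 1 / (its count in y),
    then rescale so the weights average to 1 (assumed normalization)."""
    classes, counts = np.unique(y, return_counts=True)
    recip = 1.0 / counts              # rarer classes get larger weights
    weights = recip / recip.mean()    # assumed normalization step
    return dict(zip(classes.tolist(), weights.tolist()))

y = np.array(["spam", "ham", "ham", "ham"])
print(inverse_frequency_weights(y))
```

With three "ham" samples and one "spam" sample, "spam" ends up weighted three times as heavily as "ham".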

2 answers:

Answer 0 (score: 3)

I think this may be a bug in scikit-learn. As a workaround, try the following:

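from sklearn.preprocessing import LabelEncoder

# Map the original labels to consecutive integers 0..n_classes-1
le = LabelEncoder()
y_encoded = le.fit_transform(y)
# Train on the encoded labels instead of the raw ones
self.model.fit(X, y_encoded)
# Map the predictions back to the original label values
pred = le.inverse_transform(self.model.predict(X))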
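Why might encoding the labels help? The traceback shows `_partial_fit` passing `y_ind` into `compute_class_weight`. If, as that name suggests, `y_ind` holds label-encoded targets (0..n_classes-1) while `classes` still holds the original labels, the membership check can only pass when the original labels already happen to be 0..n_classes-1. The sketch below is a hypothetical NumPy-only reconstruction of that mismatch; `check_classes_in_y` is an illustrative name, not scikit-learn's actual code.

```python
import numpy as np

def check_classes_in_y(classes, y):
    # Hypothetical reconstruction of the validation that raised the error:
    # every entry of `classes` must appear among the labels present in `y`.
    if not np.all(np.isin(classes, np.unique(y))):
        raise ValueError("classes should have valid labels that are in y")

y = np.array([3, 7, 7, 3, 7])        # original labels, not 0..n-1
classes = np.unique(y)               # original classes: [3, 7]
y_ind = np.searchsorted(classes, y)  # encoded labels: [0, 1, 1, 0, 1]

# Encoded y combined with the original classes fails the check:
try:
    check_classes_in_y(classes, y_ind)
except ValueError as exc:
    print("error:", exc)

# If y was already encoded (as with LabelEncoder), the classes are 0..n-1,
# re-encoding is a no-op, and classes and encoded y agree:
enc_classes = np.unique(y_ind)
check_classes_in_y(enc_classes, np.searchsorted(enc_classes, y_ind))
print("ok")
```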

Answer 1 (score: 0)

I'm working on a fix for this here:

https://github.com/scikit-learn/scikit-learn/pull/3515

Please feel free to test it and report back whether it solves the problem for you.