Question

What is the difference between class_weight=None and class_weight='auto' in scikit-learn's SVM classifier?

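The documentation says:

Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The 'auto' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies.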

class sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma=0.0, coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, random_state=None)

But what is the advantage of using auto mode? I could not understand its implementation.

Answer 1

This happens in the class_weight.py file:

elif class_weight == 'auto':
    # Find the weight of each class as present in y.
    le = LabelEncoder()
    y_ind = le.fit_transform(y)
    if not all(np.in1d(classes, le.classes_)):
        raise ValueError("classes should have valid labels that are in y")

    # inversely proportional to the number of samples in the class
    recip_freq = 1. / bincount(y_ind)
    weight = recip_freq[le.transform(classes)] / np.mean(recip_freq)

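This means that each class you have (in classes) gets a weight equal to 1 divided by the number of times that class appears in your data (y), so classes that appear more frequently get lower weights. Each weight is then divided by the mean of all the inverse class frequencies, so the weights average to 1.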
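To make the arithmetic concrete, here is a minimal sketch of the same calculation in plain Python. It is not the library's code: the helper name auto_class_weights and the toy labels are made up for illustration, and it assumes every class of interest appears in y.

```python
from collections import Counter


def auto_class_weights(y):
    """Hypothetical helper mirroring the legacy 'auto' heuristic:
    inverse class frequency, divided by the mean inverse frequency."""
    counts = Counter(y)                 # samples per class
    classes = sorted(counts)
    recip_freq = [1.0 / counts[c] for c in classes]
    mean_recip = sum(recip_freq) / len(recip_freq)
    return {c: r / mean_recip for c, r in zip(classes, recip_freq)}


# Toy data (an assumption): 8 samples of class 0, 2 samples of class 1.
y = [0] * 8 + [1] * 2
weights = auto_class_weights(y)
print(weights)
```

With these labels the rare class 1 ends up with four times the weight of class 0, and the two weights average to 1.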

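The advantage is that you no longer have to worry about setting the class weights yourself: this should be good enough for most applications.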

If you look further up in the source code, for None the weight array is filled with ones, so every class carries the same weight.

Answer 2

This is a fairly old post, but for anyone who has just run into this: note that class_weight='auto' has been deprecated since version 0.17. Use class_weight='balanced' instead.

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

It is implemented as follows:

n_samples / (n_classes * np.bincount(y))
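As a quick check of what that formula produces, the sketch below evaluates it in plain Python on toy labels. The helper name balanced_class_weights and the data are assumptions for illustration only; Counter stands in for np.bincount so that the snippet has no dependencies.

```python
from collections import Counter


def balanced_class_weights(y):
    """Hypothetical helper evaluating n_samples / (n_classes * count)
    for each class found in y."""
    counts = Counter(y)                 # plays the role of np.bincount(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}


# Toy data (an assumption): 8 samples of class 0, 2 samples of class 1.
y = [0] * 8 + [1] * 2
balanced = balanced_class_weights(y)
print(balanced)
```

The weights are still inversely proportional to the class counts, so the rare class here is weighted four times as heavily as the common one. A perfectly balanced dataset would give every class a weight of 1.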

Cheers!
