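I am interested in how sklearn applies the class weights we supply. The documentation does not state explicitly where and how the class weights are applied. Reading the source code did not help either (it seems that sklearn.svm.liblinear is used for the optimization, and I can't read its source because it is a .pyd file...)
But my guess is that the weights act on the cost function: when class weights are specified, the cost of each class is multiplied by its class weight. For example, if I have two observations, one from class 0 (weight = 0.5) and one from class 1 (weight = 1), then the cost function would be:
Cost = 0.5 * log(... X_0, y_0 ...) + 1 * log(... X_1, y_1 ...) + penalty
Does anyone know whether this is correct?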
Answer 0 (score: 1)
Check the following lines in the source code:
le = LabelEncoder()
if isinstance(class_weight, dict) or multi_class == 'multinomial':
    class_weight_ = compute_class_weight(class_weight, classes, y)
    sample_weight *= class_weight_[le.fit_transform(y)]
Here is the source code for the compute_class_weight() function:
...
else:
    # user-defined dictionary
    weight = np.ones(classes.shape[0], dtype=np.float64, order='C')
    if not isinstance(class_weight, dict):
        raise ValueError("class_weight must be dict, 'balanced', or None,"
                         " got: %r" % class_weight)
    for c in class_weight:
        i = np.searchsorted(classes, c)
        if i >= len(classes) or classes[i] != c:
            raise ValueError("Class label {} not present.".format(c))
        else:
            weight[i] = class_weight[c]
...
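To make the dictionary branch concrete, here is a minimal pure-Python sketch of the same idea: each class gets a weight (1 by default), and those per-class weights are then expanded into one weight per sample. The helper name class_weights_to_sample_weights is hypothetical and is not part of sklearn; bisect_left stands in for np.searchsorted.

```python
from bisect import bisect_left


def class_weights_to_sample_weights(class_weight, y):
    """Expand a {label: weight} dict into one weight per sample.

    Hypothetical helper for illustration only, not an sklearn function.
    """
    classes = sorted(set(y))
    weight = [1.0] * len(classes)  # classes missing from the dict keep weight 1
    for label, w in class_weight.items():
        i = bisect_left(classes, label)  # plays the role of np.searchsorted
        if i >= len(classes) or classes[i] != label:
            raise ValueError("Class label {} not present.".format(label))
        weight[i] = w
    # Look up each sample's class weight, similar in spirit to
    # class_weight_[le.fit_transform(y)] in the snippet above.
    return [weight[bisect_left(classes, label)] for label in y]


y = [0, 1, 1, 0]
sample_weight = class_weights_to_sample_weights({0: 0.5}, y)
print(sample_weight)  # [0.5, 1.0, 1.0, 0.5]
```

Class 1 is not in the dictionary, so its samples keep the default weight of 1.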
In the code snippet above, class_weight is applied to sample_weight, which is used in internal functions such as _logistic_loss_and_grad and _logistic_loss:
# Logistic loss is the negative of the log of the logistic function.
out = -np.sum(sample_weight * log_logistic(yz)) + .5 * alpha * np.dot(w, w)
# NOTE: ----> ^^^^^^^^^^^^^
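As a rough check of the cost function guessed in the question, the sketch below evaluates a sample-weighted logistic loss of the same form as the quoted line, using only the standard library. The function names, the toy data, and the -1/+1 label encoding are assumptions made for illustration; the actual sklearn internals may differ between versions.

```python
import math


def log_logistic(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z > 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def weighted_logistic_loss(w, X, y, sample_weight, alpha):
    """Sample-weighted logistic loss plus an L2 penalty.

    Labels in y are assumed to be encoded as -1 or +1.
    """
    data_term = 0.0
    for x_i, y_i, s_i in zip(X, y, sample_weight):
        yz = y_i * sum(w_j * x_j for w_j, x_j in zip(w, x_i))
        data_term -= s_i * log_logistic(yz)
    return data_term + 0.5 * alpha * sum(w_j * w_j for w_j in w)


# Toy data: one observation from class 0 (encoded -1), one from class 1 (+1).
w = [0.3, -0.2]
X = [[1.0, 2.0], [0.5, -1.0]]
y = [-1, 1]
sample_weight = [0.5, 1.0]  # class weights 0.5 and 1, expanded per sample
alpha = 1.0

loss = weighted_logistic_loss(w, X, y, sample_weight, alpha)

# The same quantity written out term by term, as in the question:
# 0.5 * (loss of sample 0) + 1 * (loss of sample 1) + penalty
manual = (0.5 * -log_logistic(0.1)
          + 1.0 * -log_logistic(0.35)
          + 0.5 * alpha * (0.3 ** 2 + 0.2 ** 2))
print(abs(loss - manual) < 1e-9)  # True
```

Under these assumptions, the guess in the question holds: each observation's loss term is scaled by the weight of its class, and the penalty is added unweighted.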