Question

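My data has extreme class imbalance. About 99.99% of samples are negatives; the positives are (roughly) equally divided among three other classes. I think the models I'm training are just predicting the majority class basically all the time. For this reason, I'm trying to weight the classes.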

Model

model = Sequential()

#Layer 1
model.add(Conv1D({{choice([32, 64, 90, 128])}}, {{choice([3, 4, 5, 6, 8])}}, activation='relu', kernel_initializer=kernel_initializer, input_shape=X_train.shape[1:]))
model.add(BatchNormalization())

#Layer 2
model.add(Conv1D({{choice([32, 64, 90, 128])}}, {{choice([3, 4, 5, 6])}}, activation='relu', kernel_initializer=kernel_initializer))
model.add(Dropout({{uniform(0, 0.9)}}))

#Flatten
model.add(Flatten())

#Output
model.add(Dense(4, activation='softmax'))

(The {{...}} are for use with Hyperas.)

How I've tried to weight it

1. Using class_weight in model.fit()

model.fit(X_train, Y_train, batch_size=64, epochs=10, verbose=2, validation_data=(X_test, Y_test), class_weight={0: 9999, 1:9999, 2: 9999, 3:1})

2. Using class_weight in model.fit() with sklearn compute_class_weight()

model.fit(..., class_weight=class_weight.compute_class_weight("balanced", np.unique(Y_train), Y_train))

3. With a custom loss function

from keras import backend as K

def custom_loss(weights):
    #gist.github.com/wassname/ce364fddfc8a025bfab4348cf5de852d

    def loss(Y_true, Y_pred):
        # Renormalize so each row of predictions sums to 1, then clip to avoid log(0)
        Y_pred /= K.sum(Y_pred, axis=-1, keepdims=True)
        Y_pred = K.clip(Y_pred, K.epsilon(), 1 - K.epsilon())

        # Per-class weighted cross-entropy
        loss = Y_true * K.log(Y_pred) * weights
        loss = -K.sum(loss, -1)
        return loss

    return loss

extreme_weights = np.array([9999, 9999, 9999, 1])
model.compile(loss=custom_loss(extreme_weights),
        metrics=['accuracy'],
        optimizer={{choice(['rmsprop', 'adam', 'sgd','Adagrad','Adadelta'])}}
        )

#(then fit *without* class_weight)

Results

Poor. Accuracy for all classes is ~…, and unbalanced accuracy for all classes is ~…. But more meaningful metrics, like auPRC, tell a different story. The auPRC is close to .99 for the majority class, and nearly .5 for the other classes.

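Is this how Keras balances classes? Does it just make sure that accuracy is the same across them, or should the metrics be equal or comparable? Or am I specifying the weights wrong?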
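As a side note on the second approach: compute_class_weight() returns a NumPy array ordered like its classes argument, while Keras documents class_weight as a dictionary from class index to weight. The sketch below uses made-up labels to show one way to build that dictionary. It assumes Y_train is one-hot encoded (suggested by the 4-unit softmax output, but not stated above), in which case the labels would need converting to integer class ids first.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: class 3 is the majority, classes 0-2 are rare.
labels = np.array([3] * 9997 + [0, 1, 2])
Y_train = np.eye(4)[labels]  # assumed one-hot encoding, shape (10000, 4)

# Convert one-hot rows back to integer class ids before computing weights.
y_int = np.argmax(Y_train, axis=1)
classes = np.unique(y_int)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_int)

# Build the {class index: weight} mapping that model.fit(class_weight=...) expects.
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}
print(class_weight)
```

With the "balanced" setting each weight is n_samples / (n_classes * count of that class), so the rare classes end up weighted far more heavily than the majority class.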

Answer 1

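Keras uses the class weights during training, but the accuracy does not reflect that. Accuracy is calculated across all samples, irrespective of the weights between classes. This is because you are using the metric 'accuracy' in compile(). You can define a custom, more informative weighted accuracy and use that, or use the sklearn metrics (for example, f1_score(), whose averaging can be 'binary', 'weighted', etc.).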
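To illustrate why plain accuracy hides the problem, here is a small sketch with made-up labels (not the asker's data): a degenerate classifier that always predicts the majority class scores very high accuracy, while class-averaged metrics expose it.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Hypothetical labels: class 3 is the majority, classes 0-2 are rare.
y_true = np.array([3] * 9997 + [0, 1, 2])

# A degenerate "model" that always predicts the majority class.
y_pred = np.full_like(y_true, 3)

acc = accuracy_score(y_true, y_pred)               # fraction of all samples that are correct
bal_acc = balanced_accuracy_score(y_true, y_pred)  # mean of the per-class recalls
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(f"accuracy          = {acc:.4f}")
print(f"balanced accuracy = {bal_acc:.4f}")
print(f"macro F1          = {macro_f1:.4f}")
```

Accuracy is dominated by the majority class, whereas balanced accuracy and macro F1 give each class equal say, so the three rare classes (all missed here) pull them down sharply.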

Example:

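from sklearn.metrics import f1_score

# Note: f1_score expects arrays of class labels, not Keras tensors, so this
# wrapper may need a tensor-based implementation to run as a compile() metric.
def macro_f1(y_true, y_pred):
    return f1_score(y_true, y_pred, average='macro')


model.compile(loss=custom_loss(extreme_weights),
        metrics=['accuracy', macro_f1],
        optimizer={{choice(['rmsprop', 'adam', 'sgd','Adagrad','Adadelta'])}}
        )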
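Since f1_score() works on arrays of class labels, another option is to compute it on predictions after training rather than inside compile(). The following is only a sketch under assumptions: the helper name macro_f1_from_probs is hypothetical, the labels are assumed to be one-hot encoded, and y_prob is stand-in data where the output of model.predict(X_test) would normally go.

```python
import numpy as np
from sklearn.metrics import f1_score

def macro_f1_from_probs(y_true_onehot, y_prob):
    """Macro-averaged F1 from one-hot labels and predicted class probabilities."""
    y_true = np.argmax(y_true_onehot, axis=1)  # one-hot rows -> integer class ids
    y_pred = np.argmax(y_prob, axis=1)         # most probable class per sample
    return f1_score(y_true, y_pred, average="macro", zero_division=0)

# Stand-in data; in practice y_prob would come from model.predict(X_test).
Y_test_onehot = np.eye(4)[[3, 3, 3, 0, 1, 2]]
y_prob = np.array([
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.1, 0.1, 0.7],
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],  # a class-1 sample misclassified as class 3
    [0.1, 0.1, 0.7, 0.1],
])

score = macro_f1_from_probs(Y_test_onehot, y_prob)
print(score)
```

Macro averaging computes F1 per class and then takes the unweighted mean, so a class that is never predicted contributes a zero regardless of how few samples it has.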

Keras: class_weight actually trying to balance what?
