Question

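I want to use the class_weight parameter of Keras model.fit to handle imbalanced training data. From the documentation, I understand that we can pass a dictionary like this: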

class_weight = {0 : 1,
    1: 1,
    2: 5}

(In this example, class 2 gets a higher penalty in the loss function.)

The problem is that my network's output is one-hot encoded, i.e. class-0 = (1, 0, 0), class-1 = (0, 1, 0), and class-2 = (0, 0, 1).

How can we use class_weight with one-hot encoded output?

Looking at some of the code in Keras, it seems that _feed_output_names contains the list of output classes, but in my case model.output_names / model._feed_output_names returns ['dense_1'].

Related: How to set class weights for imbalanced classes in Keras?

Answer 1

Here is a shorter, faster solution. If your one-hot encoded y is an np.array:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_integers = np.argmax(y, axis=1)
# Recent scikit-learn versions require classes and y as keyword arguments
class_weights = compute_class_weight('balanced', classes=np.unique(y_integers), y=y_integers)
d_class_weights = dict(enumerate(class_weights))

Then d_class_weights can be passed to the class_weight argument of .fit.
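
To see what the 'balanced' mode produces, here is a minimal pure-Python sketch of the heuristic that scikit-learn documents for it: n_samples / (n_classes * count_of_class). The helper name balanced_class_weights and the toy labels are assumptions for illustration, not part of the answer above.

```python
from collections import Counter

def balanced_class_weights(y_integers):
    """Hypothetical helper: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(y_integers)
    n_samples = len(y_integers)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * counts[cls]) for cls in sorted(counts)}

# Toy integer labels (assumed data): class 2 is the rare one.
y_integers = [0, 0, 0, 0, 1, 1, 1, 2]
d_class_weights = balanced_class_weights(y_integers)
print(d_class_weights)
```

The rarest class receives the largest weight, which is the behaviour you want when passing the dictionary as class_weight.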

Answer 2

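A somewhat convoluted answer, but the best I have found so far. It assumes your data is one-hot encoded and multi-class, and it works only on the label DataFrame df_y: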

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.class_weight import compute_class_weight, compute_sample_weight

# Create a pd.Series holding the categorical class (column label) of each one-hot encoded row
y_classes = df_y.idxmax(axis=1, skipna=False)

# Instantiate the label encoder
le = LabelEncoder()

# Fit the label encoder to our label series
le.fit(list(y_classes))

# Create integer-based labels
y_integers = le.transform(list(y_classes))

# Create dict of label : integer representation
labels_and_integers = dict(zip(y_classes, y_integers))

# Recent scikit-learn versions require classes and y as keyword arguments
class_weights = compute_class_weight('balanced', classes=np.unique(y_integers), y=y_integers)
sample_weights = compute_sample_weight('balanced', y_integers)

# Map each integer class to its weight
class_weights_dict = dict(zip(le.transform(list(le.classes_)), class_weights))

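This gives you a sample_weights vector, computed to balance the imbalanced dataset, which can be passed to the Keras sample_weight argument, and a class_weights_dict which can be passed to the Keras class_weight argument of the .fit method. You do not want to use both; just choose one. I am currently using class_weight, because sample_weight is more complicated to get working.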

Answer 3

I think we can use sample_weights instead. In fact, inside Keras, class_weights are converted to sample_weights.

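sample_weight: optional array of the same length as x, containing weights to apply to the model's loss for each sample. In the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile().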

https://github.com/fchollet/keras/blob/d89afdfd82e6e27b850d910890f4a4059ddea331/keras/engine/training.py#L1392
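
As a rough illustration of that conversion, the sketch below turns a class_weight dictionary into a per-sample weight vector for one-hot labels with NumPy. It mirrors the idea only (argmax to recover the class index, then a lookup); it is not Keras's internal code, and the toy arrays are assumptions.

```python
import numpy as np

# Toy one-hot labels (assumed data): rows are samples, columns are classes.
y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 0, 1]])
class_weight = {0: 1.0, 1: 1.0, 2: 5.0}

# Recover each sample's class index from its one-hot row.
y_integers = np.argmax(y, axis=1)

# Look up the weight of each sample's class.
sample_weight = np.array([class_weight[int(c)] for c in y_integers])
print(sample_weight)
```

An array like sample_weight could then be passed to .fit through its sample_weight argument instead of class_weight.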

Answer 4

In _standardize_weights, Keras does:

if y.shape[1] > 1:
    y_classes = y.argmax(axis=1)

So basically, if you choose to use one-hot encoding, the classes are the column indices.

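You may also ask yourself how to map a column index back to the original classes of your data. If you use scikit-learn's LabelBinarizer class to perform the one-hot encoding, the column index follows the order of the unique labels computed by its .fit function. The documentation says:

Extract an ordered array of unique labels

Example: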

from sklearn.preprocessing import LabelBinarizer
y=[4,1,2,8]
l=LabelBinarizer()
y_transformed=l.fit_transform(y)
y_transformed
> array([[0, 0, 1, 0],
   [1, 0, 0, 0],
   [0, 1, 0, 0],
   [0, 0, 0, 1]])
l.classes_
> array([1, 2, 4, 8])

In conclusion, the keys of the class_weights dictionary should reflect the order in the encoder's classes_ attribute.
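
For instance, if weights are chosen per original label, they can be re-keyed by column index. This sketch assumes, as in the LabelBinarizer example above, that the encoder orders its classes in sorted order; weight_by_label and its values are hypothetical.

```python
# Original labels, as in the LabelBinarizer example.
y = [4, 1, 2, 8]

# Hypothetical weights chosen per original label.
weight_by_label = {1: 1.0, 2: 1.0, 4: 2.0, 8: 5.0}

# Sorted unique labels, matching the order of classes_ shown above.
classes = sorted(set(y))

# Key the dictionary by column index, which is what Keras sees after argmax.
class_weight = {index: weight_by_label[label] for index, label in enumerate(classes)}
print(class_weight)
```

Here label 8 sits in column 3, so its weight ends up under key 3.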

Keras: class weights (class_weight) for one-hot encoding
