How do I design an appropriate model in Keras for an imbalanced multiclass problem?

Posted: 2019-02-09 11:01:41

Tags: python machine-learning keras

I am working on an imbalanced multiclass (4-class) classification problem with sequential data. The training and test sets I prepared contain the same number of records per class:

  • 100, 100, 100, 100 -> training set
  • 20, 20, 20, 20 -> test set
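
The full dataset is imbalanced, so per-class weights for `model.fit(..., class_weight=...)` could eventually be derived from the label counts. A minimal sketch with scikit-learn, assuming hypothetical integer labels `y_labels` (on the balanced subset used here, every weight comes out as 1.0):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical integer labels 0..3; the balanced subset has 100 records per class.
y_labels = np.repeat([0, 1, 2, 3], 100)

# 'balanced' weights are n_samples / (n_classes * count_per_class);
# with equal counts each weight is exactly 1.0.
weights = compute_class_weight(class_weight='balanced',
                               classes=np.unique(y_labels),
                               y=y_labels)
class_weight = {i: float(w) for i, w in enumerate(weights)}
print(class_weight)  # {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
```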

I got the following validation results for my LSTM Keras model. They are terrible. The confusion matrix shows that every record is being classified as the fourth class.

****************************
| MODEL PERFORMANCE REPORT |
****************************
Average F1 score = 0.10.
Balanced accuracy score = 0.25.
Confusion matrix
[[ 0  0  0 20]
 [ 0  0  0 20]
 [ 0  0  0 20]
 [ 0  0  0 20]]
Other metrics
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        20
           1       0.00      0.00      0.00        20
           2       0.00      0.00      0.00        20
           3       0.25      1.00      0.40        20

   micro avg       0.25      0.25      0.25        80
   macro avg       0.06      0.25      0.10        80
weighted avg       0.06      0.25      0.10        80
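
For context, these numbers are exactly what a degenerate classifier that always predicts the last class produces; a quick check with scikit-learn reproduces the report:

```python
import numpy as np
from sklearn.metrics import f1_score, balanced_accuracy_score, confusion_matrix

# Ground truth: 20 samples per class; degenerate prediction: everything is class 3.
y_true = np.repeat([0, 1, 2, 3], 20)
y_pred = np.full(80, 3)

print(confusion_matrix(y_true, y_pred))  # all mass in the last column
# Only class 3 scores (precision 0.25, recall 1.0 -> F1 0.4); averaged
# over 4 classes the macro F1 is 0.1, matching the report.
print(f1_score(y_true, y_pred, average='macro', zero_division=0))  # 0.1
# Balanced accuracy is the mean per-class recall: (0+0+0+1)/4 = 0.25.
print(balanced_accuracy_score(y_true, y_pred))  # 0.25
```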

I don't want to jump into hyperparameter optimization, because the model itself seems to be fundamentally broken.

I would be very grateful if someone more experienced with LSTMs and deep learning could point out my mistake.

Here is my data (I am using a very small sample to experiment with a basic model; I will train on the full dataset later):

400 train sequences
80 test sequences
X_train shape: (400, 20, 17)
X_test shape: (80, 20, 17)
y_train shape: (400, 4)
y_test shape: (80, 4)
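
The `(400, 4)` label shape implies one-hot targets. A minimal numpy sketch of that encoding, assuming hypothetical integer class labels 0..3 (`keras.utils.to_categorical` produces the same result):

```python
import numpy as np

# Hypothetical integer labels 0..3 for the 400 training sequences.
y_int = np.repeat([0, 1, 2, 3], 100)

# One-hot encode by indexing the identity matrix; yields shape (400, 4),
# matching the y_train shape above.
y_train = np.eye(4)[y_int]
print(y_train.shape)  # (400, 4)
```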

Here are my model and fitting function:

import numpy as np
from sklearn import metrics
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Flatten, TimeDistributed
from keras.callbacks import Callback

hidden_neurons = 50
timestamps = 20
nb_features = 17  # must match X_train.shape[2] == 17 (was 18, inconsistent with the data above)
nb_classes = 4

model = Sequential()

model.add(LSTM(
                units=hidden_neurons,
                return_sequences=True, 
                input_shape=(timestamps,nb_features),
                dropout=0.2, 
                recurrent_dropout=0.2
              )
         )

model.add(TimeDistributed(Dense(1)))

model.add(Dropout(0.2))

model.add(Flatten())

model.add(Dense(units=nb_classes,
               activation='softmax'))

model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'],
              optimizer='adadelta')

history = model.fit(np.array(X_train), y_train,
                    validation_data=(np.array(X_test), y_test),
                    epochs=50,
                    batch_size=2,
                    callbacks=[model_metrics],  # Metrics instance; it is defined below but must exist before this call runs
                    shuffle=False,
                    verbose=1)


class Metrics(Callback):

    def on_train_begin(self, logs={}):
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []

    def on_epoch_end(self, epoch, logs={}):
        # argmax directly on the softmax outputs; calling .round() first would
        # zero out any row whose maximum probability is below 0.5
        val_predict = np.argmax(np.asarray(self.model.predict(self.validation_data[0])), axis=1)
        val_targ = np.argmax(self.validation_data[1], axis=1)
        _val_f1 = metrics.f1_score(val_targ, val_predict, average='weighted')
        _val_recall = metrics.recall_score(val_targ, val_predict, average='weighted')
        _val_precision = metrics.precision_score(val_targ, val_predict, average='weighted')
        self.val_f1s.append(_val_f1)
        self.val_recalls.append(_val_recall)
        self.val_precisions.append(_val_precision)
        print(" — val_f1: {:f} — val_precision: {:f} — val_recall {:f}".format(_val_f1, _val_precision, _val_recall))
        return

model_metrics = Metrics()
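
A subtle pitfall when converting softmax outputs to class indices: rounding before `argmax` zeroes out every row whose maximum probability is below 0.5, so `argmax` then falls back to index 0 regardless of the actual winner. A small numpy illustration:

```python
import numpy as np

# A softmax output where the winning class (index 1) has probability < 0.5.
p = np.array([0.20, 0.35, 0.25, 0.20])

print(np.argmax(p))          # 1 -- the intended prediction
print(np.argmax(p.round()))  # 0 -- rounding zeroed every entry first
```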


0 Answers

No answers yet.