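I am working on an imbalanced multi-class classification problem (4 classes) with sequential data. I prepared the training and test sets so that each class contains the same number of records.
I got the following validation results for my Keras LSTM model, and they are terrible. The confusion matrix shows that every record is classified as the fourth class (label 3).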
****************************
| MODEL PERFORMANCE REPORT |
****************************
Average F1 score = 0.10.
Balanced accuracy score = 0.25.
Confusion matrix
[[ 0 0 0 20]
[ 0 0 0 20]
[ 0 0 0 20]
[ 0 0 0 20]]
Other metrics
              precision    recall  f1-score   support
           0       0.00      0.00      0.00        20
           1       0.00      0.00      0.00        20
           2       0.00      0.00      0.00        20
           3       0.25      1.00      0.40        20
   micro avg       0.25      0.25      0.25        80
   macro avg       0.06      0.25      0.10        80
weighted avg       0.06      0.25      0.10        80
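The headline numbers in this report follow directly from the confusion matrix, which confirms that the report is internally consistent and that the problem lies in the predictions themselves. A minimal sketch, assuming only NumPy and the matrix printed above (the variable names are illustrative, not taken from the reporting code):

```python
import numpy as np

# Confusion matrix copied from the report: rows = true class, columns = predicted class.
cm = np.array([
    [0, 0, 0, 20],
    [0, 0, 0, 20],
    [0, 0, 0, 20],
    [0, 0, 0, 20],
])

true_pos = np.diag(cm).astype(float)
support = cm.sum(axis=1)      # records per true class
predicted = cm.sum(axis=0)    # records per predicted class

recall = true_pos / support
# Classes that are never predicted get precision 0 (avoids division by zero).
precision = np.divide(true_pos, predicted,
                      out=np.zeros_like(true_pos), where=predicted > 0)
denom = precision + recall
f1 = np.divide(2 * precision * recall, denom,
               out=np.zeros_like(true_pos), where=denom > 0)

balanced_accuracy = recall.mean()   # mean per-class recall
macro_f1 = f1.mean()                # unweighted mean of per-class F1
print(round(balanced_accuracy, 2), round(macro_f1, 2))
```

With four equally sized classes, a model that always predicts one class scores exactly 0.25 balanced accuracy and 0.10 macro F1, which is what the report shows.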
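I don't want to do hyperparameter optimization yet, because something seems to be fundamentally wrong with my model.
If anyone with more experience in LSTMs and deep learning could point out my mistake, I would be very grateful.
Here is my data (I am using a very small sample to experiment with a basic model, and will train on the whole dataset later):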
400 train sequences
80 test sequences
X_train shape: (400, 20, 17)
X_test shape: (80, 20, 17)
y_train shape: (400, 4)
y_test shape: (80, 4)
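The class balance and the feature dimension can be checked directly from arrays of these shapes. The sketch below uses stand-in arrays (`X_train_demo`, `y_train_demo` are hypothetical names filled with placeholder values, not the real data), assuming one-hot labels with 100 records per class as implied by 400 balanced training sequences:

```python
import numpy as np

nb_classes = 4
# Stand-in labels: 100 records per class, one-hot encoded like y_train (400, 4).
labels = np.repeat(np.arange(nb_classes), 100)
y_train_demo = np.eye(nb_classes)[labels]

# Stand-in features with the reported shape (400, 20, 17).
X_train_demo = np.zeros((400, 20, 17))

# Summing one-hot rows over the sample axis gives the number of records per class.
class_counts = y_train_demo.sum(axis=0).astype(int)
print("records per class:", class_counts)
print("timesteps, features:", X_train_demo.shape[1:])
```

The second value printed is what `input_shape` of the first layer has to match.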
Here are my model and the fit call:
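import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Flatten, TimeDistributed

hidden_neurons = 50
timestamps = 20
nb_features = 17  # must match X_train.shape[2]
nb_classes = 4    # must match y_train.shape[1]

model = Sequential()
# LSTM returns the full sequence: one hidden vector per timestep.
model.add(LSTM(
    units=hidden_neurons,
    return_sequences=True,
    input_shape=(timestamps, nb_features),
    dropout=0.2,
    recurrent_dropout=0.2,
))
# Project each timestep to a single value, then flatten to a vector of length 20.
model.add(TimeDistributed(Dense(1)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(units=nb_classes, activation='softmax'))

model.compile(loss="categorical_crossentropy",
              metrics=['accuracy'],
              optimizer='adadelta')

history = model.fit(np.array(X_train), y_train,
                    validation_data=(np.array(X_test), y_test),
                    epochs=50,
                    batch_size=2,
                    callbacks=[model_metrics],  # Metrics instance, defined below
                    shuffle=False,
                    verbose=1)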
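To make the architecture easier to reason about, the tensor shapes and trainable parameter counts can be traced by hand. This is a rough sketch, assuming 17 input features (as in the reported `X_train` shape) and the standard Keras LSTM parameterization with four gates and a bias. All variable names other than `hidden_neurons` and `nb_classes` are illustrative:

```python
timesteps, nb_features = 20, 17   # from X_train shape (400, 20, 17)
hidden_neurons, nb_classes = 50, 4

# LSTM with return_sequences=True: one 50-dim vector per timestep.
lstm_out = (timesteps, hidden_neurons)
# 4 gates, each with input weights, recurrent weights and a bias.
lstm_params = 4 * (hidden_neurons * (nb_features + hidden_neurons) + hidden_neurons)

# TimeDistributed(Dense(1)): the same 50 -> 1 projection at every timestep.
td_out = (timesteps, 1)
td_params = hidden_neurons * 1 + 1

# Dropout and Flatten have no weights; Flatten turns (20, 1) into (20,).
flat_out = (timesteps * 1,)

# Final softmax layer: 20 inputs -> 4 classes, plus one bias per class.
dense_params = flat_out[0] * nb_classes + nb_classes

total_params = lstm_params + td_params + dense_params
print(lstm_out, td_out, flat_out)
print(lstm_params, td_params, dense_params, total_params)
```

Under these assumptions the model has 13,735 trainable parameters for 400 training sequences, and the classifier only sees one scalar per timestep.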
import numpy as np
from keras.callbacks import Callback
from sklearn import metrics


class Metrics(Callback):
    """Computes weighted F1, recall and precision on the validation set after each epoch."""

    def on_train_begin(self, logs=None):
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []

    def on_epoch_end(self, epoch, logs=None):
        # Round the predicted probabilities, then take the index of the largest value.
        val_predict = np.argmax(np.asarray(self.model.predict(self.validation_data[0])).round(), axis=1)
        # Convert the one-hot targets back to class indices.
        val_targ = np.argmax(self.validation_data[1], axis=1)
        _val_f1 = metrics.f1_score(val_targ, val_predict, average='weighted')
        _val_recall = metrics.recall_score(val_targ, val_predict, average='weighted')
        _val_precision = metrics.precision_score(val_targ, val_predict, average='weighted')
        self.val_f1s.append(_val_f1)
        self.val_recalls.append(_val_recall)
        self.val_precisions.append(_val_precision)
        print(" - val_f1: {:f} - val_precision: {:f} - val_recall {:f}".format(_val_f1, _val_precision, _val_recall))


model_metrics = Metrics()
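One detail of the callback that may be worth checking is that it rounds the softmax outputs before taking `argmax`. The sketch below uses made-up probabilities (not outputs of the model above) to show how that differs from a plain `argmax` whenever no class reaches 0.5:

```python
import numpy as np

# Hypothetical softmax outputs for three validation records (rows sum to 1).
probs = np.array([
    [0.10, 0.10, 0.20, 0.60],  # confident: class 3
    [0.30, 0.40, 0.20, 0.10],  # unsure: class 1 is most likely
    [0.20, 0.20, 0.25, 0.35],  # unsure: class 3 is most likely
])

rounded_then_argmax = np.argmax(probs.round(), axis=1)  # what the callback computes
plain_argmax = np.argmax(probs, axis=1)                 # most probable class per row

print(rounded_then_argmax)
print(plain_argmax)
```

When every probability in a row rounds to 0, `argmax` falls back to index 0, so the per-epoch metrics printed by the callback may not match metrics computed from the most probable class.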