My neural network is a slightly modified version of the model proposed in this paper: https://arxiv.org/pdf/1606.01781.pdf
My goal is to classify text into 9 different classes. I am using 29 convolutional layers and cap the maximum length of any text at 256 characters.
The training data has 900k samples and the validation data has 35k samples. The data is very imbalanced, so I did some data augmentation to balance the training data (obviously without touching the validation data) and then used class weights during training.
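The class weighting mentioned above can be sketched roughly as follows. Note this is an assumption about the setup, not the poster's actual code: `y_train` and its class counts are made-up stand-ins for the real (imbalanced) labels, and the resulting dict is what Keras's `model.fit(..., class_weight=...)` expects.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical integer labels for the 9 classes; the counts below are
# made up to mimic an imbalanced distribution.
y_train = np.array([0] * 500 + [1] * 100 + [2] * 50 + [3] * 25 +
                   [4] * 10 + [5] * 10 + [6] * 5 + [7] * 3 + [8] * 2)

# 'balanced' assigns weight n_samples / (n_classes * count(class)).
weights = compute_class_weight(class_weight='balanced',
                               classes=np.arange(9),
                               y=y_train)
class_weight = dict(enumerate(weights))

# Rare classes get a proportionally larger weight, e.g. passed as
# model.fit(x, y, class_weight=class_weight, ...)
```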
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 256) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 256, 16) 1152
_________________________________________________________________
conv1d_1 (Conv1D) (None, 256, 64) 3136
_________________________________________________________________
sequential_1 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_2 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_3 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_4 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_5 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_6 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_7 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_8 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_9 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
sequential_10 (Sequential) (None, 256, 64) 25216
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 128, 64) 0
_________________________________________________________________
sequential_11 (Sequential) (None, 128, 128) 75008
_________________________________________________________________
sequential_12 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_13 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_14 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_15 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_16 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_17 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_18 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_19 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
sequential_20 (Sequential) (None, 128, 128) 99584
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 64, 128) 0
_________________________________________________________________
sequential_21 (Sequential) (None, 64, 256) 297472
_________________________________________________________________
sequential_22 (Sequential) (None, 64, 256) 395776
_________________________________________________________________
sequential_23 (Sequential) (None, 64, 256) 395776
_________________________________________________________________
sequential_24 (Sequential) (None, 64, 256) 395776
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 32, 256) 0
_________________________________________________________________
sequential_25 (Sequential) (None, 32, 512) 1184768
_________________________________________________________________
sequential_26 (Sequential) (None, 32, 512) 1577984
_________________________________________________________________
sequential_27 (Sequential) (None, 32, 512) 1577984
_________________________________________________________________
sequential_28 (Sequential) (None, 32, 512) 1577984
_________________________________________________________________
lambda_1 (Lambda) (None, 4096) 0
_________________________________________________________________
dense_1 (Dense) (None, 2048) 8390656
_________________________________________________________________
dense_2 (Dense) (None, 2048) 4196352
_________________________________________________________________
dense_3 (Dense) (None, 9) 18441
=================================================================
Total params: 21,236,681
Trainable params: 21,216,713
Non-trainable params: 19,968
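For reference, each `sequential_N` entry in the summary is presumably one convolutional block from the paper: two stages of Conv1D, batch normalization, and ReLU. A minimal Keras sketch of such a block (shapes assumed from the first group in the table, input (None, 256, 64) with 64 filters) reproduces the 25,216 parameters shown for sequential_1 through sequential_10:

```python
from tensorflow import keras
from tensorflow.keras import layers

# One VDCNN-style convolutional block: (Conv1D -> BatchNorm -> ReLU) x 2.
block = keras.Sequential([
    keras.Input(shape=(256, 64)),
    layers.Conv1D(64, 3, padding='same'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Conv1D(64, 3, padding='same'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
])

# Parameter count: 2 * (3*64*64 + 64) conv params
#                + 2 * (4*64) batch-norm params = 25,216
```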
The loss curves look strange to me: I can't spot the typical overfitting pattern in them, yet the gap between the training loss and the validation loss is still large. Also, the training loss at epoch #1 is already much lower than the validation loss at any epoch.
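A minimal sketch of how such curves can be plotted from the `History` object Keras returns; the `history_dict` values here are made-up placeholders for the real `model.fit(...).history`:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, write to file instead of a window
import matplotlib.pyplot as plt

# Placeholder for model.fit(...).history
history_dict = {'loss': [1.2, 0.8, 0.6, 0.5],
                'val_loss': [1.1, 0.95, 0.9, 0.88]}

epochs = range(1, len(history_dict['loss']) + 1)
plt.plot(epochs, history_dict['loss'], label='training loss')
plt.plot(epochs, history_dict['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.savefig('loss_curves.png')
```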
Is this something I should be worried about, and how can I improve my model?
Thanks!
Answer 0 (score: 1)
To narrow the gap between the training and validation error, I would suggest two things: