I'm trying to build an English-to-French translation model. I have a basic model that works reasonably well:
Average step time: 232.3s
Final loss: 0.4969
Model:
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, 15, 336) 67200
_________________________________________________________________
lstm_1 (LSTM) (None, 256) 607232
_________________________________________________________________
repeat_vector_1 (RepeatVecto (None, 21, 256) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 21, 256) 0
_________________________________________________________________
lstm_2 (LSTM) (None, 21, 256) 525312
_________________________________________________________________
dropout_2 (Dropout) (None, 21, 256) 0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 21, 336) 86352
=================================================================
Total params: 1,286,096
Trainable params: 1,286,096
Non-trainable params: 0
_________________________________________________________________
Python:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, RepeatVector, Dropout, Dense, TimeDistributed

model = Sequential()
# Encoder: embed the English input and compress it into a single state vector
model.add(Embedding(en_vocab_size, fr_vocab_size, input_length=en_max_len, mask_zero=True))
model.add(LSTM(256))
# Decoder: repeat that vector once per French output position
model.add(RepeatVector(fr_max_len))
model.add(Dropout(0.5))
model.add(LSTM(256, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(fr_vocab_size, activation='softmax')))
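For reference, this is roughly the training setup for both models (a minimal sketch; en_sequences/fr_sequences, the optimizer, batch size, and epoch count here are placeholders rather than my exact settings):
# Sparse targets are assumed to be integer-encoded French token IDs
# of shape (samples, fr_max_len).
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(en_sequences, fr_sequences[..., None],  # trailing axis for sparse targets
          batch_size=64, epochs=20, validation_split=0.1)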
Then I tried adding another LSTM layer and two 1D convolutional layers:
Average step time: 402s
Final loss: 1.0899
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, 15, 336) 67200
_________________________________________________________________
dropout_1 (Dropout) (None, 15, 336) 0
_________________________________________________________________
conv1d_1 (Conv1D) (None, 15, 32) 32288
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 7, 32) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 7, 32) 0
_________________________________________________________________
conv1d_2 (Conv1D) (None, 7, 16) 2064
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 3, 16) 0
_________________________________________________________________
dropout_3 (Dropout) (None, 3, 16) 0
_________________________________________________________________
lstm_1 (LSTM) (None, 128) 74240
_________________________________________________________________
repeat_vector_1 (RepeatVecto (None, 21, 128) 0
_________________________________________________________________
dropout_4 (Dropout) (None, 21, 128) 0
_________________________________________________________________
lstm_2 (LSTM) (None, 21, 512) 1312768
_________________________________________________________________
dropout_5 (Dropout) (None, 21, 512) 0
_________________________________________________________________
lstm_3 (LSTM) (None, 21, 128) 328192
_________________________________________________________________
dropout_6 (Dropout) (None, 21, 128) 0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 21, 336) 43344
=================================================================
Total params: 1,860,096
Trainable params: 1,860,096
Non-trainable params: 0
_________________________________________________________________
Python:
from keras.models import Sequential
from keras.layers import (Embedding, Conv1D, MaxPooling1D, LSTM,
                          RepeatVector, Dropout, Dense, TimeDistributed)

model = Sequential()
model.add(Embedding(en_vocab_size, fr_vocab_size, input_length=en_max_len))
model.add(Dropout(0.2))
# Convolutional feature extraction over the embedded English sequence
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))  # 15 time steps -> 7
model.add(Dropout(0.3))
model.add(Conv1D(filters=16, kernel_size=4, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))  # 7 time steps -> 3
model.add(Dropout(0.2))
# Encoder LSTM over the pooled sequence
model.add(LSTM(128))
model.add(RepeatVector(fr_max_len))
model.add(Dropout(0.2))
# Decoder; unit sizes match the summary above (512, then 128)
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(128, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(fr_vocab_size, activation='softmax')))
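One thing I noticed while checking shapes: the two pooling layers shrink the encoder input from 15 time steps to 7 and then 3 before the LSTM ever sees it, which the summary above confirms and which a quick loop also shows:
# Sanity check of how the time axis shrinks through the conv/pool stack
for layer in model.layers:
    print(layer.name, layer.output_shape)
# embedding_1     (None, 15, 336)
# ...
# max_pooling1d_2 (None, 3, 16)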
As you can see, the second model not only took much longer to train, it also ended with a higher loss and substantially lower accuracy. Why is that? My hypothesis is that I implemented the convolutional layers incorrectly. What is the best way to add convolutional layers to a recurrent (LSTM) network?
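In case it matters, one variant I've been considering (just an untested sketch) is to keep a convolution as a feature extractor but drop the pooling, so the full 15-step sequence still reaches the encoder LSTM:
# Untested variant: Conv1D without pooling, so the sequence length
# stays at en_max_len (15) going into the encoder LSTM.
model = Sequential()
model.add(Embedding(en_vocab_size, fr_vocab_size, input_length=en_max_len))
model.add(Conv1D(filters=64, kernel_size=3, padding='same', activation='relu'))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(RepeatVector(fr_max_len))
model.add(LSTM(256, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(fr_vocab_size, activation='softmax')))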