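I'm trying to convert a word-level LSTM RNN text generator from Keras to TensorFlow.js. However, the TensorFlow.js model trains much more slowly and produces garbage predictions. I've gone over every detail carefully (or at least I think I have), but I keep coming up short.
Let me explain.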
First, here is my loss plot over the first 100 epochs using Keras (with the TensorFlow backend):
...and here is the loss plot for the same dataset over the first 100 epochs using TensorFlow.js:
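So the loss does decrease, but much more slowly, and with a learning rate of about 0.02 it gets stuck at a loss of about 4.4 (the Keras model does not do this; its loss keeps decreasing).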
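One way to interpret a plateau around 4.4 is to compare it with two reference values. A uniform guess over the 297-word vocabulary has a cross-entropy of ln(297) ≈ 5.69. A model that ignores its input and always outputs the training-set word frequencies has a cross-entropy equal to the entropy of those frequencies, and its argmax is always the most frequent word. The sketch below computes that second value; the function name `unigramEntropy` and its input (an array of target word indices, e.g. the position of the 1 in each row of y) are hypothetical. If the result lands close to 4.4, that would suggest the TensorFlow.js network is only learning word frequencies.

```javascript
// Hypothetical diagnostic: cross-entropy (in nats) of a model that ignores its
// input and always predicts the empirical target word frequencies.
// If training plateaus near this value, the network has likely learned only
// the word frequencies, and its argmax will always be the most frequent word.
function unigramEntropy(targetIndices) {
  // Count how often each target word index occurs.
  const counts = new Map();
  for (const idx of targetIndices) {
    counts.set(idx, (counts.get(idx) || 0) + 1);
  }
  const n = targetIndices.length;
  let entropy = 0;
  for (const c of counts.values()) {
    const p = c / n;
    entropy -= p * Math.log(p); // natural log, as in categorical cross-entropy
  }
  return entropy;
}

// Toy example: two distinct words with equal frequency gives ln(2).
console.log(unigramEntropy([0, 1, 0, 1]).toFixed(4)); // 0.6931
```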
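Second, the outputs of the two models are very different. The Keras model produces output that is typical for such a small dataset, something like this (after 10 iterations):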
i i i my am and the the i what of and the and the the and the and i and the the and and the and i and i and i i i i and the and i i i i the the and i the the and i the and mine i and the and and and the burn the my and and i i i i and i burn the it the and it and i i and i the i i and i i the my i the my and and i i i the i
After 100 iterations:
path that winds 'pon the path i find, and claim as mine to ride the waves of unrest made to make me shine as a testament to why the ways of of of will will by shit by dismiss by by dismiss dismiss worship worship worship blood blood blood serpent's serpent's serpent's serpent's like like the the the the of of of of of of to to to we have we decide decide order, order, order, of they seizing rites of of of claim tomorrow tomorrow around around tomorrow at sacrifice of.' sacrifice around around like like the the the of the to to to to to to waves made made tomorrow upon rites of few no few know underworld, the no no no tomorrow around around around around around
The TensorFlow.js model, on the other hand, only ever predicts the same word (no matter how many iterations, and no matter how low the loss gets):
the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the
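I checked the prediction vectors, and they are all very similar (but not exactly identical), so I'm fairly sure the model itself is simply broken.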
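Nearly identical prediction vectors combined with greedy decoding would produce exactly this kind of repeated output: if the generation loop takes the argmax of each softmax vector, near-identical vectors keep returning the same index. Keras's lstm_text_generation example instead samples from the distribution with a temperature. A minimal sketch of both strategies follows; `argmax`, `sampleWithTemperature`, and the injectable `rand` parameter are hypothetical helpers, not TensorFlow.js APIs. Sampling will not repair a model that has stopped using its input, but it helps separate a decoding issue from a training issue.

```javascript
// Greedy decoding: index of the largest probability.
function argmax(probs) {
  let best = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[best]) best = i;
  }
  return best;
}

// Temperature sampling, mirroring the idea of Keras's lstm_text_generation
// example: rescale log-probabilities by 1/temperature, renormalize, draw once.
// `rand` is injectable (hypothetical) so the draw can be made deterministic.
function sampleWithTemperature(probs, temperature = 1.0, rand = Math.random) {
  const logits = probs.map(p => Math.log(Math.max(p, 1e-12)) / temperature);
  const maxLogit = Math.max(...logits); // subtract max for numerical stability
  const weights = logits.map(l => Math.exp(l - maxLogit));
  const total = weights.reduce((a, b) => a + b, 0);
  // Walk the cumulative weights until the random threshold is crossed.
  let r = rand() * total;
  for (let i = 0; i < weights.length; i++) {
    r -= weights[i];
    if (r < 0) return i;
  }
  return weights.length - 1; // guard against floating-point round-off
}

const probs = [0.5, 0.3, 0.2];
console.log(argmax(probs)); // 0
console.log(sampleWithTemperature(probs, 1.0, () => 0.6)); // 1
```

Lower temperatures push sampling toward the argmax; higher temperatures flatten the distribution.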
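Now, I know what you might be thinking: there must be something wrong with the data preparation. But I assure you that the training and target data are identical between the two programs. I have gone over everything in the pre-training code with a fine-tooth comb (i.e. lots of print statements), to the point where I feel like I might be hallucinating. The X and y inputs are exactly the same in size, content, and data type in both programs. If you don't believe me, feel free to try the full source code here.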
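For reference, the layout both programs must agree on for inputShape [maxlen, words.length] is one one-hot vector per word in each window, plus one one-hot target (the next word) per window. The sketch below builds that layout in plain JavaScript under stated assumptions: `tokens` is an array of word indices, windows slide by a hypothetical `step` parameter, and `buildOneHot` is an illustrative name, not code from either program. In TensorFlow.js the resulting nested arrays could be wrapped with tf.tensor3d(X) and tf.tensor2d(y). Decoding a few rows with `indexOf(1)` in each program is a quick way to confirm the inputs match beyond their shapes.

```javascript
// Hypothetical word-level one-hot encoding for next-word prediction.
// X has shape [numWindows, maxlen, vocabSize]; y has shape [numWindows, vocabSize].
function buildOneHot(tokens, maxlen, vocabSize, step = 1) {
  const X = [];
  const y = [];
  for (let start = 0; start + maxlen < tokens.length; start += step) {
    // One window of `maxlen` one-hot word vectors.
    const window = [];
    for (let t = 0; t < maxlen; t++) {
      const row = new Array(vocabSize).fill(0);
      row[tokens[start + t]] = 1;
      window.push(row);
    }
    X.push(window);
    // Target: the word that immediately follows the window.
    const target = new Array(vocabSize).fill(0);
    target[tokens[start + maxlen]] = 1;
    y.push(target);
  }
  return { X, y };
}

const { X, y } = buildOneHot([0, 1, 2, 1, 0], 3, 4);
console.log(X.length, X[0].length, X[0][0].length); // 2 3 4
console.log(y.map(r => r.indexOf(1))); // [ 1, 0 ]
```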
So, with all that out of the way, here is what I believe is the relevant code for the two models. The original Keras version:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation
from keras.optimizers import RMSprop

model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(maxlen, len(words))))
model.add(Dropout(0.2))
model.add(LSTM(128, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(len(words)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(learning_rate=0.002))
model.fit(X, y, batch_size=32, epochs=10)
And here is the TensorFlow.js version:
import * as tf from '@tensorflow/tfjs';

const model = tf.sequential();
model.add(tf.layers.lstm({
  units: 128,
  returnSequences: true,
  inputShape: [maxlen, words.length]
}));
// tf.layers.dropout takes a config object; a bare number leaves `rate` undefined.
model.add(tf.layers.dropout({rate: 0.2}));
model.add(tf.layers.lstm({
  units: 128,
  returnSequences: false
}));
model.add(tf.layers.dropout({rate: 0.2}));
model.add(tf.layers.dense({units: words.length, activation: 'softmax'}));
model.compile({loss: 'categoricalCrossentropy', optimizer: tf.train.rmsprop(0.002)});
// `await` must run inside an async function (or a module with top-level await).
await model.fit(x_tensor, y_tensor, {
  epochs: 100,
  batchSize: 32
});
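As far as I can tell, the two are exactly the same. For reference, maxlen = 30 and words.length = 297 for the dataset I provided (a small dataset, just for demonstration purposes).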
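Another quick structural check is to call model.summary() in both programs and compare the trainable-parameter counts. Assuming the standard LSTM parameterization (input kernel, recurrent kernel, and one bias for each of the four gates), which is the default for both Keras LSTM and tf.layers.lstm, the expected counts for words.length = 297 can be computed by hand. maxlen does not affect them, and dropout layers have no parameters. `lstmParams` and `denseParams` are illustrative helpers.

```javascript
// Parameters of a standard LSTM layer: 4 gates x (kernel + recurrent kernel + bias).
function lstmParams(inputDim, units) {
  return 4 * (units * (inputDim + units) + units);
}

// Parameters of a dense layer: weight matrix + bias vector.
function denseParams(inputDim, units) {
  return inputDim * units + units;
}

const vocab = 297; // words.length for the demo dataset
const units = 128;
const counts = [
  lstmParams(vocab, units),  // first LSTM (input is one-hot words)
  lstmParams(units, units),  // second LSTM
  denseParams(units, vocab), // softmax output layer
];
const total = counts.reduce((a, b) => a + b, 0);
console.log(counts, total); // [ 218112, 131584, 38313 ] 388009
```

If either summary reports different numbers, the two architectures are not actually equivalent.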
Any ideas what the problem could be?