I am using keras.backend.argmax() in a Lambda layer. The model compiles fine, but fit() raises this error:
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
My model:
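from keras import backend as K
from keras.layers import Input, Dense, Embedding, LSTM, Dropout, Lambda
from keras.models import Model

# train_data, train_label and vocabulary are defined elsewhere.
latent_dim = 512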
encoder_inputs = Input(shape=(train_data.shape[1],))
encoder_dense = Dense(vocabulary, activation='softmax')
encoder_outputs = Embedding(vocabulary, latent_dim)(encoder_inputs)
encoder_outputs = LSTM(latent_dim, return_sequences=True)(encoder_outputs)
encoder_outputs = Dropout(0.5)(encoder_outputs)
encoder_outputs = encoder_dense(encoder_outputs)
encoder_outputs = Lambda(K.argmax, arguments={'axis':-1})(encoder_outputs)
encoder_outputs = Lambda(K.cast, arguments={'dtype':'float32'})(encoder_outputs)
encoder_dense1 = Dense(train_label.shape[1], activation='softmax')
decoder_embedding = Embedding(vocabulary, latent_dim)
decoder_lstm1 = LSTM(latent_dim, return_sequences=True)
decoder_lstm2 = LSTM(latent_dim, return_sequences=True)
decoder_dense2 = Dense(vocabulary, activation='softmax')
decoder_outputs = encoder_dense1(encoder_outputs)
decoder_outputs = decoder_embedding(decoder_outputs)
decoder_outputs = decoder_lstm1(decoder_outputs)
decoder_outputs = decoder_lstm2(decoder_outputs)
decoder_outputs = Dropout(0.5)(decoder_outputs)
decoder_outputs = decoder_dense2(decoder_outputs)
model = Model(encoder_inputs, decoder_outputs)
model.summary()
The model summary, for easier reading:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_7 (InputLayer)         (None, 32)                0
_________________________________________________________________
embedding_13 (Embedding)     (None, 32, 512)           2018816
_________________________________________________________________
lstm_19 (LSTM)               (None, 32, 512)           2099200
_________________________________________________________________
dropout_10 (Dropout)         (None, 32, 512)           0
_________________________________________________________________
dense_19 (Dense)             (None, 32, 3943)          2022759
_________________________________________________________________
lambda_5 (Lambda)            (None, 32)                0
_________________________________________________________________
lambda_6 (Lambda)            (None, 32)                0
_________________________________________________________________
dense_20 (Dense)             (None, 501)               16533
_________________________________________________________________
embedding_14 (Embedding)     (None, 501, 512)          2018816
_________________________________________________________________
lstm_20 (LSTM)               (None, 501, 512)          2099200
_________________________________________________________________
lstm_21 (LSTM)               (None, 501, 512)          2099200
_________________________________________________________________
dropout_11 (Dropout)         (None, 501, 512)          0
_________________________________________________________________
dense_21 (Dense)             (None, 501, 3943)         2022759
=================================================================
Total params: 14,397,283
Trainable params: 14,397,283
Non-trainable params: 0
_________________________________________________________________
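I googled for solutions, but almost everything I found was about models that were built incorrectly. Some answers suggested simply not using the function that causes the problem. As you can see, though, I cannot build this model without K.argmax (if you know another way, please tell me). How can I fix this and train my model?
Answer 0 (score: 0)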
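The argmax function has no gradient, for obvious reasons: how would you even define one? Its output is an integer index that stays constant under small changes to the input and then jumps abruptly. For the model to work, you need to make that layer non-trainable. According to {{3}} (or this question), you do that by passing trainable=False to the layer. As for the layer's weights (if it has any), you may want to set them to an identity matrix.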
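To see why there is nothing to backpropagate, here is a small pure-Python sketch. It is not Keras code, and the helper names `argmax` and `finite_difference` and the sample `logits` are made up for illustration. It estimates the derivative of argmax numerically: nudging any single input slightly leaves the winning index unchanged, so the estimate is zero wherever the maximum is not close to a tie.

```python
def argmax(xs):
    """Return the index of the largest element (first one wins on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def finite_difference(f, xs, i, eps=1e-4):
    """Central-difference estimate of d f(xs) / d xs[i]."""
    up = list(xs)
    up[i] += eps
    down = list(xs)
    down[i] -= eps
    return (f(up) - f(down)) / (2 * eps)


# Hypothetical scores for one timestep; index 1 is the clear winner.
logits = [0.1, 2.0, -1.3, 0.7]

# Perturb each input in turn and see how much the argmax output moves.
grads = [finite_difference(argmax, logits, i) for i in range(len(logits))]

print(argmax(logits), grads)
```

This prints `1 [0.0, 0.0, 0.0, 0.0]`. A gradient that is zero almost everywhere carries no training signal, which is presumably why the backend reports `None` for this op instead of defining one.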