I am implementing the following architecture in TensorFlow.
https://i.stack.imgur.com/ZmcsX.png
For the first few iterations the loss stays around 0.6915, but after that, as the output below shows, it only alternates between -0.0 and a positive constant (whose value depends on the hyperparameters), no matter how many more iterations I run. This happens because my model's predictions become either extremely small (close to 0) or extremely large (close to 1), so the model cannot train. What could cause such extreme predictions, and what can I do to correct this?
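For context, this matches how quickly a float32 sigmoid saturates: its output is already on the order of my smallest predictions once the input is only moderately negative. A minimal standalone check (my own illustration, not part of the model code):

import tensorflow as tf

# sigmoid underflows rapidly for negative inputs; around -30 the output
# is already ~1e-13, the same order of magnitude as the saturated
# predictions shown further below.
for x in [-1.0, -30.0, -60.0]:
    print(x, float(tf.math.sigmoid(tf.constant(x, tf.float32))))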
Input_C = (160, 1)
Input_R = (160, 1)
Batch size = 1
C = (batch_size, 256)
R = (batch_size, 256)
Below is my model along with the input shapes: [model diagram image]
Below is a sample of the output:
Training loss (for one batch) at step 0: 0.691542387008667
Seen so far: 1 samples
Training loss (for one batch) at step 200: 0.6671515703201294
Seen so far: 201 samples
Training loss (for one batch) at step 400: -0.0
Seen so far: 401 samples
Training loss (for one batch) at step 600: -0.0
Seen so far: 601 samples
Training loss (for one batch) at step 800: -0.0
Seen so far: 801 samples
Training loss (for one batch) at step 1000: -0.0
Seen so far: 1001 samples
Training loss (for one batch) at step 1200: -0.0
Seen so far: 1201 samples
Training loss (for one batch) at step 1400: -0.0
Seen so far: 1401 samples
Training loss (for one batch) at step 1600: 15.424948692321777
Seen so far: 1601 samples
Training loss (for one batch) at step 1800: -0.0
Seen so far: 1801 samples
Training loss (for one batch) at step 2000: 15.424948692321777
Seen so far: 2001 samples
Training loss (for one batch) at step 2200: -0.0
Seen so far: 2201 samples
Training loss (for one batch) at step 2400: -0.0
Seen so far: 2401 samples
Training loss (for one batch) at step 2600: -0.0
Seen so far: 2601 samples
Training loss (for one batch) at step 2800: -0.0
Seen so far: 2801 samples
Training loss (for one batch) at step 3000: -0.0
Seen so far: 3001 samples
Training loss (for one batch) at step 3200: 15.424948692321777
Seen so far: 3201 samples
Training loss (for one batch) at step 3400: 15.424948692321777
Seen so far: 3401 samples
Training loss (for one batch) at step 3600: -0.0
Seen so far: 3601 samples
Training loss (for one batch) at step 3800: 15.424948692321777
Seen so far: 3801 samples
Training loss (for one batch) at step 4000: 15.424948692321777
Seen so far: 4001 samples
Training loss (for one batch) at step 4200: -0.0
Seen so far: 4201 samples
Training loss (for one batch) at step 4400: 15.424948692321777
Seen so far: 4401 samples
Training loss (for one batch) at step 4600: -0.0
Seen so far: 4601 samples
Training loss (for one batch) at step 4800: 15.424948692321777
Seen so far: 4801 samples
Training loss (for one batch) at step 5000: 15.424948692321777
Seen so far: 5001 samples
Training loss (for one batch) at step 5200: -0.0
Seen so far: 5201 samples
Training loss (for one batch) at step 5400: -0.0
Below are the predicted values of Sigmoid(CMR). You can see that after a few iterations the prediction suddenly collapses to (numerically) zero.
Prediction : tf.Tensor([[0.50066364]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49867386]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49919522]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.4999423]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49848711]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.499426]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49959162]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49965566]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.50021386]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.4996987]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49993336]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49861637]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.50016826]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49728978]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49540216]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49112904]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49182785]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.44881523]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.01220286]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[9.062928e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.7185716e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.3001763e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[6.934234e-14]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.2812477e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.1744075e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.306665e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[6.9167836e-14]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.2757072e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.403139e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[6.9167836e-14]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.3142985e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[6.916903e-14]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.3480556e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.2885927e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[6.9167836e-14]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.2939568e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[6.9167836e-14]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[6.916797e-14]], shape=(1, 1), dtype=float32)
Below are the prediction (Sigmoid(CMR)), loss, and label values printed to the console:
Prediction : tf.Tensor([[1.4857496e-12]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([1], shape=(1,), dtype=int64)
Loss : tf.Tensor([15.424949], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.0175745e-11]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([0], shape=(1,), dtype=int64)
Loss : tf.Tensor([-0.], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.9670995e-10]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([0], shape=(1,), dtype=int64)
Loss : tf.Tensor([-0.], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.7731953e-10]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([1], shape=(1,), dtype=int64)
Loss : tf.Tensor([15.424949], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.986521e-10]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([0], shape=(1,), dtype=int64)
Loss : tf.Tensor([-0.], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.6696887e-13]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([1], shape=(1,), dtype=int64)
Loss : tf.Tensor([15.424949], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.9859603e-10]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([0], shape=(1,), dtype=int64)
Loss : tf.Tensor([-0.], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.9074237e-12]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([1], shape=(1,), dtype=int64)
Loss : tf.Tensor([15.424949], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.9804261e-10]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([0], shape=(1,), dtype=int64)
Loss : tf.Tensor([-0.], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.9823462e-10]], shape=(1, 1), dtype=float32)
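Note the pattern above: once the prediction is pinned near zero, the loss is exactly -0.0 whenever the label is 0 and the same constant (15.424949) whenever the label is 1. As far as I can tell this is just float32 binary cross-entropy evaluated at a saturated prediction; a small standalone check of the two terms (my own illustration):

import tensorflow as tf

p = tf.constant(1e-13, dtype=tf.float32)
# Label 0 term: -log(1 - p). In float32, 1 - 1e-13 rounds to exactly 1.0,
# so this evaluates to -0.0.
print(-tf.math.log(1.0 - p))
# Label 1 term: -log(p) is large, and since Keras' binary_crossentropy
# clips p away from 0, I believe the loss gets pinned to one constant
# regardless of how small p actually is.
print(-tf.math.log(p))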
Below is my code:
import os
import tensorflow as tf
from tensorflow.keras import Input, Model, Sequential
from tensorflow.keras.layers import Embedding, LSTM
from tensorflow.keras.losses import binary_crossentropy
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import plot_model

# Shared encoder applied to both inputs
encoder = Sequential()
encoder.add(Embedding(input_dim=MAX_NB_WORDS, output_dim=EMBEDDING_DIM, input_length=MAX_SENTENCE_LENGTH))
encoder.add(LSTM(units=256))

# Create tensors for Context and Utterance
context_input = Input(shape=(MAX_SENTENCE_LENGTH,), dtype='float32')
utterance_input = Input(shape=(MAX_SENTENCE_LENGTH,), dtype='float32')

# Encode Context and Utterance through the LSTM
encoded_context = encoder(context_input)      # Shape = (None, 256)
encoded_utterance = encoder(utterance_input)  # Actual response encoding (None, 256) --> transposed below so the dimensions line up

# Use a custom layer to make GradientTape work
custom_layer = CustomLayer(256, 256)
generated_response = custom_layer(encoded_context)
projection = tf.linalg.matmul(generated_response, tf.transpose(encoded_utterance))
probability = tf.math.sigmoid(projection)

dual_encoder = Model(inputs=[context_input, utterance_input], outputs=probability)
print("Trainable variables :", dual_encoder.trainable_weights)
# https://stackoverflow.com/questions/55413421/importerror-failed-to-import-pydot-please-install-pydot-for-example-with
plot_model(dual_encoder, os.path.join(OUTPUT_PATH, 'my_first_model.png'), show_shapes=True)
#dual_encoder.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
print("Summary of Dual Encoder LSTM :", dual_encoder.summary())
def create_batched_dataset(data_path):
    tfrecord_dataset = tf.data.TFRecordDataset(os.path.join(data_path, "train.tfrecords"))
    parsed_dataset = tfrecord_dataset.map(read_train_TFRecords, num_parallel_calls=8)
    parsed_dataset = parsed_dataset.repeat()
    parsed_dataset = parsed_dataset.shuffle(SHUFFLE_BUFFER)
    parsed_dataset = parsed_dataset.batch(BATCH_SIZE)
    # iterator = tf.compat.v1.data.make_one_shot_iterator(parsed_dataset)
    # batched_context, batched_utterance, batched_labels = iterator.get_next()
    return parsed_dataset

parsed_dataset = create_batched_dataset(OUTPUT_PATH)
''' Attempting GradientTape '''
# reference - https://www.tensorflow.org/guide/keras/train_and_evaluate
optimizer = RMSprop(learning_rate=0.001, rho=0.9, momentum=0.1, epsilon=1e-07, centered=False)

epochs = 10
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch,))
    # Iterate over the batches of the dataset.
    for step, row in enumerate(parsed_dataset):
        input_batch_context, input_batch_utterance, input_batch_label = row
        #print("Context :", input_batch_context)
        with tf.GradientTape() as tape:
            # Run the forward pass of the model. The operations applied
            # to its inputs are recorded on the GradientTape.
            pred = dual_encoder([input_batch_context, input_batch_utterance])
            #print("Prediction :", pred)
            #print("Label :", input_batch_label)
            # Compute the loss value for this minibatch.
            loss_value = binary_crossentropy(input_batch_label, pred)
            #print("Loss :", loss_value)
        # Use the gradient tape to automatically retrieve the gradients
        # of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, dual_encoder.trainable_weights)
        # Run one step of gradient descent by updating the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, dual_encoder.trainable_weights))
        # Log every 200 batches.
        if step % 200 == 0:
            print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
            print('Seen so far: %s samples' % ((step + 1) * BATCH_SIZE))
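Since the value feeding the sigmoid is an unclipped dot product of two 256-dimensional encodings, I also want to watch its raw magnitude during training. A minimal sketch of that diagnostic (my own addition, reusing the projection tensor and dual_encoder.inputs from the model definition above):

# Hypothetical diagnostic: a second functional model that exposes the
# pre-sigmoid projection; build it once, outside the training loop.
logit_model = Model(inputs=dual_encoder.inputs, outputs=projection)

# Inside the loop, next to the existing logging:
raw_logit = logit_model([input_batch_context, input_batch_utterance])
# sigmoid(-30) is already ~1e-13 in float32, so logits drifting past
# roughly +/-30 would fully account for the saturated predictions.
print("Pre-sigmoid projection :", float(raw_logit))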