I am implementing the following architecture in TensorFlow.
https://i.stack.imgur.com/ZmcsX.png
For the first few iterations the loss stays around 0.6915, but after that, as the output below shows, it only alternates between -0.0 and a positive constant (whose value depends on the hyperparameters), no matter how many more iterations I run. This happens because my model's predictions become either extremely small (close to 0) or extremely large (close to 1), so the model cannot train. What could cause such extreme predictions, and what can I do to correct this?
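For context, this matches how quickly a float32 sigmoid saturates: its output is already on the order of my smallest predictions once the input is only moderately negative. A minimal standalone check (my own illustration, not part of the model code):

import tensorflow as tf

# sigmoid underflows rapidly for negative inputs; around -30 the output
# is already ~1e-13, the same order of magnitude as the saturated
# predictions shown further below.
for x in [-1.0, -30.0, -60.0]:
    print(x, float(tf.math.sigmoid(tf.constant(x, tf.float32))))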
Input_C = (160, 1)
Input_R = (160, 1)
Batch size = 1
C = (batch_size, 256)
R = (batch_size, 256)
Below is my model along with the input shapes: [model diagram image]
Below is a sample of the output:
Training loss (for one batch) at step 0: 0.691542387008667
Seen so far: 1 samples
Training loss (for one batch) at step 200: 0.6671515703201294
Seen so far: 201 samples
Training loss (for one batch) at step 400: -0.0
Seen so far: 401 samples
Training loss (for one batch) at step 600: -0.0
Seen so far: 601 samples
Training loss (for one batch) at step 800: -0.0
Seen so far: 801 samples
Training loss (for one batch) at step 1000: -0.0
Seen so far: 1001 samples
Training loss (for one batch) at step 1200: -0.0
Seen so far: 1201 samples
Training loss (for one batch) at step 1400: -0.0
Seen so far: 1401 samples
Training loss (for one batch) at step 1600: 15.424948692321777
Seen so far: 1601 samples
Training loss (for one batch) at step 1800: -0.0
Seen so far: 1801 samples
Training loss (for one batch) at step 2000: 15.424948692321777
Seen so far: 2001 samples
Training loss (for one batch) at step 2200: -0.0
Seen so far: 2201 samples
Training loss (for one batch) at step 2400: -0.0
Seen so far: 2401 samples
Training loss (for one batch) at step 2600: -0.0
Seen so far: 2601 samples
Training loss (for one batch) at step 2800: -0.0
Seen so far: 2801 samples
Training loss (for one batch) at step 3000: -0.0
Seen so far: 3001 samples
Training loss (for one batch) at step 3200: 15.424948692321777
Seen so far: 3201 samples
Training loss (for one batch) at step 3400: 15.424948692321777
Seen so far: 3401 samples
Training loss (for one batch) at step 3600: -0.0
Seen so far: 3601 samples
Training loss (for one batch) at step 3800: 15.424948692321777
Seen so far: 3801 samples
Training loss (for one batch) at step 4000: 15.424948692321777
Seen so far: 4001 samples
Training loss (for one batch) at step 4200: -0.0
Seen so far: 4201 samples
Training loss (for one batch) at step 4400: 15.424948692321777
Seen so far: 4401 samples
Training loss (for one batch) at step 4600: -0.0
Seen so far: 4601 samples
Training loss (for one batch) at step 4800: 15.424948692321777
Seen so far: 4801 samples
Training loss (for one batch) at step 5000: 15.424948692321777
Seen so far: 5001 samples
Training loss (for one batch) at step 5200: -0.0
Seen so far: 5201 samples
Training loss (for one batch) at step 5400: -0.0
Below are the predicted values of Sigmoid(CMR). You can see that after a few iterations the prediction suddenly collapses to (numerically) zero.
Prediction : tf.Tensor([[0.50066364]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49867386]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49919522]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.4999423]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49848711]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.499426]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49959162]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49965566]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.50021386]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.4996987]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49993336]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49861637]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.50016826]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49728978]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49540216]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49112904]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.49182785]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.44881523]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[0.01220286]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[9.062928e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.7185716e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.3001763e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[6.934234e-14]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.2812477e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.1744075e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.306665e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[6.9167836e-14]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.2757072e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.403139e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[6.9167836e-14]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.3142985e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[6.916903e-14]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.3480556e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.2885927e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[6.9167836e-14]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[1.2939568e-13]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[6.9167836e-14]], shape=(1, 1), dtype=float32)
Prediction : tf.Tensor([[6.916797e-14]], shape=(1, 1), dtype=float32)
Below are the prediction (Sigmoid(CMR)), loss, and label values printed to the console:
Prediction : tf.Tensor([[1.4857496e-12]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([1], shape=(1,), dtype=int64)
Loss : tf.Tensor([15.424949], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.0175745e-11]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([0], shape=(1,), dtype=int64)
Loss : tf.Tensor([-0.], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.9670995e-10]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([0], shape=(1,), dtype=int64)
Loss : tf.Tensor([-0.], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.7731953e-10]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([1], shape=(1,), dtype=int64)
Loss : tf.Tensor([15.424949], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.986521e-10]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([0], shape=(1,), dtype=int64)
Loss : tf.Tensor([-0.], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.6696887e-13]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([1], shape=(1,), dtype=int64)
Loss : tf.Tensor([15.424949], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.9859603e-10]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([0], shape=(1,), dtype=int64)
Loss : tf.Tensor([-0.], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.9074237e-12]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([1], shape=(1,), dtype=int64)
Loss : tf.Tensor([15.424949], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.9804261e-10]], shape=(1, 1), dtype=float32)
Label : tf.Tensor([0], shape=(1,), dtype=int64)
Loss : tf.Tensor([-0.], shape=(1,), dtype=float32)
Prediction : tf.Tensor([[1.9823462e-10]], shape=(1, 1), dtype=float32)
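Note the pattern above: once the prediction is pinned near zero, the loss is exactly -0.0 whenever the label is 0 and the same constant (15.424949) whenever the label is 1. As far as I can tell this is just float32 binary cross-entropy evaluated at a saturated prediction; a small standalone check of the two terms (my own illustration):

import tensorflow as tf

p = tf.constant(1e-13, dtype=tf.float32)
# Label 0 term: -log(1 - p). In float32, 1 - 1e-13 rounds to exactly 1.0,
# so this evaluates to -0.0.
print(-tf.math.log(1.0 - p))
# Label 1 term: -log(p) is large, and since Keras' binary_crossentropy
# clips p away from 0, I believe the loss gets pinned to one constant
# regardless of how small p actually is.
print(-tf.math.log(p))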
Below is my code:
import os
import tensorflow as tf
from tensorflow.keras import Input, Model, Sequential
from tensorflow.keras.layers import Embedding, LSTM
from tensorflow.keras.losses import binary_crossentropy
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import plot_model

# Shared encoder applied to both inputs
encoder = Sequential()
encoder.add(Embedding(input_dim=MAX_NB_WORDS, output_dim=EMBEDDING_DIM, input_length=MAX_SENTENCE_LENGTH))
encoder.add(LSTM(units=256))

# Create tensors for Context and Utterance
context_input = Input(shape=(MAX_SENTENCE_LENGTH,), dtype='float32')
utterance_input = Input(shape=(MAX_SENTENCE_LENGTH,), dtype='float32')

# Encode Context and Utterance through the LSTM
encoded_context = encoder(context_input)      # Shape = (None, 256)
encoded_utterance = encoder(utterance_input)  # Actual response encoding (None, 256) --> transposed below so the dimensions line up

# Use a custom layer to make GradientTape work
custom_layer = CustomLayer(256, 256)
generated_response = custom_layer(encoded_context)
projection = tf.linalg.matmul(generated_response, tf.transpose(encoded_utterance))
probability = tf.math.sigmoid(projection)

dual_encoder = Model(inputs=[context_input, utterance_input], outputs=probability)
print("Trainable variables :", dual_encoder.trainable_weights)
# https://stackoverflow.com/questions/55413421/importerror-failed-to-import-pydot-please-install-pydot-for-example-with
plot_model(dual_encoder, os.path.join(OUTPUT_PATH, 'my_first_model.png'), show_shapes=True)
#dual_encoder.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
print("Summary of Dual Encoder LSTM :", dual_encoder.summary())
def create_batched_dataset(data_path):
    tfrecord_dataset = tf.data.TFRecordDataset(os.path.join(data_path, "train.tfrecords"))
    parsed_dataset = tfrecord_dataset.map(read_train_TFRecords, num_parallel_calls=8)
    parsed_dataset = parsed_dataset.repeat()
    parsed_dataset = parsed_dataset.shuffle(SHUFFLE_BUFFER)
    parsed_dataset = parsed_dataset.batch(BATCH_SIZE)
    # iterator = tf.compat.v1.data.make_one_shot_iterator(parsed_dataset)
    # batched_context, batched_utterance, batched_labels = iterator.get_next()
    return parsed_dataset

parsed_dataset = create_batched_dataset(OUTPUT_PATH)
''' Attempting GradientTape '''
# reference - https://www.tensorflow.org/guide/keras/train_and_evaluate
optimizer = RMSprop(learning_rate=0.001, rho=0.9, momentum=0.1, epsilon=1e-07, centered=False)

epochs = 10
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch,))
    # Iterate over the batches of the dataset.
    for step, row in enumerate(parsed_dataset):
        input_batch_context, input_batch_utterance, input_batch_label = row
        #print("Context :", input_batch_context)
        with tf.GradientTape() as tape:
            # Run the forward pass of the model. The operations applied
            # to its inputs are recorded on the GradientTape.
            pred = dual_encoder([input_batch_context, input_batch_utterance])
            #print("Prediction :", pred)
            #print("Label :", input_batch_label)
            # Compute the loss value for this minibatch.
            loss_value = binary_crossentropy(input_batch_label, pred)
            #print("Loss :", loss_value)
        # Use the gradient tape to automatically retrieve the gradients
        # of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, dual_encoder.trainable_weights)
        # Run one step of gradient descent by updating the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, dual_encoder.trainable_weights))
        # Log every 200 batches.
        if step % 200 == 0:
            print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
            print('Seen so far: %s samples' % ((step + 1) * BATCH_SIZE))
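Since the value feeding the sigmoid is an unclipped dot product of two 256-dimensional encodings, I also want to watch its raw magnitude during training. A minimal sketch of that diagnostic (my own addition, reusing the projection tensor and dual_encoder.inputs from the model definition above):

# Hypothetical diagnostic: a second functional model that exposes the
# pre-sigmoid projection; build it once, outside the training loop.
logit_model = Model(inputs=dual_encoder.inputs, outputs=projection)

# Inside the loop, next to the existing logging:
raw_logit = logit_model([input_batch_context, input_batch_utterance])
# sigmoid(-30) is already ~1e-13 in float32, so logits drifting past
# roughly +/-30 would fully account for the saturated predictions.
print("Pre-sigmoid projection :", float(raw_logit))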