LSTM loss function does not converge

Asked: 2019-12-19 18:14:44

Tags: tensorflow keras pytorch lstm

I know this is a common problem, and I've seen plenty of questions about it on Stack Overflow, but none of them helped me, so... here we go.

I have a Keras-based LSTM model that uses this dataset ('hdfs_train'), which contains time series. Each row in the dataset is a session representing a batch of logs, each encoded as a number according to its shape. My goal is to predict the next number (i.e. the shape of the next log). To that end I built the following LSTM model.
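As a toy illustration of the sliding-window setup described above (the session values here are made up, and the real code uses window_size = 10):

```python
# Toy sketch of the sliding-window setup: each window of log keys
# becomes one input, and the key right after it becomes the target.
# (Hypothetical session; the real model uses window_size = 10.)
session = [5, 22, 5, 11, 9, 11, 9, 26]
window_size = 3

pairs = []
for i in range(len(session) - window_size):
    x = session[i:i + window_size]   # input window
    y = session[i + window_size]     # next log key to predict
    pairs.append((x, y))

print(pairs[0])  # ([5, 22, 5], 11)
```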

import tensorflow as tf
import numpy as np
keras = tf.keras

# Hyperparameters
window_size = 10
input_size = 1
hidden_size = 64
num_layers = 2
num_classes = 28
num_epochs = 40
num_candidates = 9
batch_size = 2048
model_path = 'model/'
log = 'TFAdam_batch_size=' + str(batch_size) + ';epoch=' + str(num_epochs)


modeltf = keras.models.Sequential([
    keras.layers.LSTM(hidden_size, return_sequences=True,
                      batch_input_shape=(batch_size, 10, 1)),
    keras.layers.LSTM(hidden_size, return_sequences=False),
    keras.layers.Dense(num_classes),
])

modeltf.compile(optimizer=keras.optimizers.Adam(),
                loss='sparse_categorical_crossentropy',
                metrics=['sparse_categorical_accuracy'])


#this function outputs numpy arrays of the right dimensions in order to feed the network
def generate(name):
    num_sessions = 0
    inputs = []
    outputs = []
    with open('data/' + name, 'r') as f:
        for line in f.readlines():
            num_sessions += 1
            line = tuple(map(lambda n: n - 1, map(int, line.strip().split())))
            for i in range(len(line) - window_size):
                inputs.append(line[i:i + window_size])
                outputs.append(line[i + window_size])
    print('Number of sessions({}): {}'.format(name, num_sessions))
    print('Number of seqs({}): {}'.format(name, len(inputs)))

    inputs = np.array(inputs)
    outputs = np.array(outputs)

    outputs = np.reshape(outputs, (outputs.shape[0], 1))
    temp = np.concatenate((np.array(inputs), np.array(outputs)), axis = 1)
    np.random.shuffle(temp)
    inside = temp[:,:10]
    outside = temp[:,10]
    dataset = [inside, outside]

    return dataset

seq_dataset = generate('hdfs_train')
trainX = seq_dataset[0]
trainY = seq_dataset[1]

#the inputs size must be a multiple of the batch size, so I discard the end
trainX = trainX[:45056]
trainY = trainY[:45056]

print(trainX.shape)
print(trainY.shape)

trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))

modeltf.fit(trainX, trainY, shuffle=True, epochs=num_epochs, batch_size=batch_size)
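The concatenate/shuffle/split dance in generate(), plus the truncation to a multiple of batch_size, can also be sketched with a single permutation; a minimal NumPy sketch on toy data (the shapes here are arbitrary stand-ins, not the real dataset's):

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = np.arange(500).reshape(50, 10)  # 50 toy windows of length 10
outputs = np.arange(50)                  # matching next-key labels
batch_size = 8

# Shuffle inputs and labels together without concatenating them:
# one permutation applied to both arrays keeps each pair aligned.
perm = rng.permutation(len(inputs))
inside, outside = inputs[perm], outputs[perm]

# Truncate to a whole number of batches instead of hard-coding a length.
usable = len(inside) - len(inside) % batch_size
inside, outside = inside[:usable], outside[:usable]

print(inside.shape)  # (48, 10)
```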

The output of the above is:

Train on 45056 samples
Epoch 1/40
45056/45056 [==============================] - 4s 83us/sample - loss: 6.6886 - sparse_categorical_accuracy: 0.2266
Epoch 2/40
45056/45056 [==============================] - 3s 67us/sample - loss: 6.0615 - sparse_categorical_accuracy: 0.2551
Epoch 3/40
45056/45056 [==============================] - 3s 74us/sample - loss: 6.0513 - sparse_categorical_accuracy: 0.2473
Epoch 4/40
45056/45056 [==============================] - 3s 70us/sample - loss: 6.2496 - sparse_categorical_accuracy: 0.0236
Epoch 5/40
45056/45056 [==============================] - 3s 71us/sample - loss: 6.2042 - sparse_categorical_accuracy: 0.0231
Epoch 6/40
45056/45056 [==============================] - 3s 69us/sample - loss: 6.1858 - sparse_categorical_accuracy: 0.0231
Epoch 7/40
45056/45056 [==============================] - 3s 69us/sample - loss: 6.1692 - sparse_categorical_accuracy: 0.0231
Epoch 8/40
45056/45056 [==============================] - 3s 74us/sample - loss: 6.1538 - sparse_categorical_accuracy: 0.0231
Epoch 9/40
45056/45056 [==============================] - 3s 69us/sample - loss: 6.1379 - sparse_categorical_accuracy: 0.0231
Epoch 10/40
45056/45056 [==============================] - 3s 66us/sample - loss: 6.1183 - sparse_categorical_accuracy: 0.0231
Epoch 11/40
45056/45056 [==============================] - 3s 70us/sample - loss: 6.0919 - sparse_categorical_accuracy: 0.0231
Epoch 12/40
45056/45056 [==============================] - 3s 71us/sample - loss: 6.0503 - sparse_categorical_accuracy: 0.0231
Epoch 13/40
45056/45056 [==============================] - 3s 74us/sample - loss: 5.9894 - sparse_categorical_accuracy: 0.0231
Epoch 14/40
45056/45056 [==============================] - 3s 74us/sample - loss: 5.9503 - sparse_categorical_accuracy: 0.0231
Epoch 15/40
45056/45056 [==============================] - 3s 74us/sample - loss: 5.9388 - sparse_categorical_accuracy: 0.0231
Epoch 16/40
45056/45056 [==============================] - 3s 70us/sample - loss: 5.9254 - sparse_categorical_accuracy: 0.0231
Epoch 17/40
45056/45056 [==============================] - 3s 68us/sample - loss: 5.9167 - sparse_categorical_accuracy: 0.0231
Epoch 18/40
45056/45056 [==============================] - 3s 67us/sample - loss: 5.9087 - sparse_categorical_accuracy: 0.0231
Epoch 19/40
45056/45056 [==============================] - 3s 67us/sample - loss: 5.9003 - sparse_categorical_accuracy: 0.0231
Epoch 20/40
45056/45056 [==============================] - 3s 66us/sample - loss: 5.8925 - sparse_categorical_accuracy: 0.0231
Epoch 21/40
45056/45056 [==============================] - 3s 68us/sample - loss: 5.8884 - sparse_categorical_accuracy: 0.0231
Epoch 22/40
45056/45056 [==============================] - 3s 66us/sample - loss: 5.8824 - sparse_categorical_accuracy: 0.0231
Epoch 23/40
45056/45056 [==============================] - 3s 69us/sample - loss: 5.8817 - sparse_categorical_accuracy: 0.0231
Epoch 24/40
45056/45056 [==============================] - 3s 68us/sample - loss: 5.8729 - sparse_categorical_accuracy: 0.0231
Epoch 25/40
45056/45056 [==============================] - 4s 79us/sample - loss: 5.8664 - sparse_categorical_accuracy: 0.0231
Epoch 26/40
45056/45056 [==============================] - 3s 67us/sample - loss: 5.8640 - sparse_categorical_accuracy: 0.0231
Epoch 27/40
45056/45056 [==============================] - 3s 71us/sample - loss: 5.8576 - sparse_categorical_accuracy: 0.0231
Epoch 28/40
45056/45056 [==============================] - 3s 67us/sample - loss: 6.0034 - sparse_categorical_accuracy: 0.0350
Epoch 29/40
45056/45056 [==============================] - 3s 66us/sample - loss: 6.1654 - sparse_categorical_accuracy: 0.0684
Epoch 30/40
45056/45056 [==============================] - 3s 72us/sample - loss: 6.1555 - sparse_categorical_accuracy: 0.0689
Epoch 31/40
45056/45056 [==============================] - 3s 76us/sample - loss: 6.1485 - sparse_categorical_accuracy: 0.0689
Epoch 32/40
45056/45056 [==============================] - 3s 72us/sample - loss: 6.0946 - sparse_categorical_accuracy: 0.0686
Epoch 33/40
45056/45056 [==============================] - 3s 70us/sample - loss: 6.0882 - sparse_categorical_accuracy: 0.0686
Epoch 34/40
45056/45056 [==============================] - 3s 73us/sample - loss: 6.0840 - sparse_categorical_accuracy: 0.0687
Epoch 35/40
45056/45056 [==============================] - 3s 74us/sample - loss: 6.0808 - sparse_categorical_accuracy: 0.0686
Epoch 36/40
45056/45056 [==============================] - 3s 66us/sample - loss: 6.0765 - sparse_categorical_accuracy: 0.0686
Epoch 37/40
45056/45056 [==============================] - 3s 66us/sample - loss: 6.0721 - sparse_categorical_accuracy: 0.0741
Epoch 38/40
45056/45056 [==============================] - 3s 67us/sample - loss: 6.0677 - sparse_categorical_accuracy: 0.0806
Epoch 39/40
45056/45056 [==============================] - 3s 72us/sample - loss: 6.0636 - sparse_categorical_accuracy: 0.1049
Epoch 40/40
45056/45056 [==============================] - 3s 70us/sample - loss: 6.0592 - sparse_categorical_accuracy: 0.1051

The output varies from run to run, but the loss keeps behaving like this: even with a much larger number of epochs it never converges to a reasonable value; instead it keeps bouncing up and down.

I've tried changing the learning rate, the decay, and the gradient-clipping parameters, without success. The same model implemented in PyTorch here gives better results, without any of these loss problems.
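For reference, gradient clipping (one of the knobs mentioned above) rescales a gradient whose norm exceeds a threshold; a minimal NumPy sketch of clip-by-norm (the values are illustrative, not the settings actually tried):

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    # Scale the gradient down if its L2 norm exceeds max_norm,
    # leaving its direction unchanged.
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([3.0, 4.0])      # L2 norm is 5.0
print(clip_by_norm(g, 1.0))   # [0.6 0.8]
```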

Any insight on this would be much appreciated :)

0 Answers
