Question

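I have been working with LSTMs for a while and I think I have grasped the main concepts. I have been experimenting with Keras to get a better feel for how LSTMs work, so I decided to train a neural network to recognize the MNIST dataset.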

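I know that when I train an LSTM, I should give it a tensor of shape (number of samples, time steps, features) as input. I reshaped each image from 28x28 into a single vector of 784 elements (1x784), so my input_shape = (60000, 1, 784). Then I tried changing the number of time steps, and my new input_shape became (60000, 16, 49).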

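What I don't understand is why the feature vector shrinks from 784 to 49 when I change the number of time steps. I think I don't really understand the concept of a time step in an LSTM. Could you explain it, ideally with reference to this specific case? Also, when I increase the number of time steps, the accuracy drops. Why is that? Shouldn't it go up? Thanks.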

Edit

from __future__ import print_function
import numpy as np
import struct
from keras.models import Sequential
from keras.layers import Dense, LSTM, Activation
from keras.utils import np_utils
train_im = open('train-images-idx3-ubyte','rb')
train_la = open('train-labels-idx1-ubyte','rb')
test_im = open('t10k-images-idx3-ubyte','rb')
test_la = open('t10k-labels-idx1-ubyte','rb')

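# Training images and labels.
# Each IDX header field is a big-endian 32-bit integer: magic number and
# item count, followed (for image files) by the row and column counts.
magic, num_ima = struct.unpack('>II', train_im.read(8))
rows, columns = struct.unpack('>II', train_im.read(8))
# Shape (784, 60000). The bytes are still stored image by image; the later
# reshape to (60000, 1, 784) is what puts one image into each sample.
img = np.fromfile(train_im, dtype=np.uint8).reshape(rows*columns, num_ima)

magic_l, num_l = struct.unpack('>II', train_la.read(8))
lab = np.fromfile(train_la, dtype=np.int8)  # 60000 labels

# Test images and labels.
magic, num_test = struct.unpack('>II', test_im.read(8))
rows, columns = struct.unpack('>II', test_im.read(8))
# Shape (784, 10000), same byte layout as the training images.
img_test = np.fromfile(test_im, dtype=np.uint8).reshape(rows*columns, num_test)

magic_l, num_l = struct.unpack('>II', test_la.read(8))
lab_test = np.fromfile(test_la, dtype=np.int8)  # 10000 labels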

batch = 50
epoch = 15
hidden_units = 10  # not used below; the LSTM layer is built with 40 units
classes = 1  # not used below
a, b = img.T.shape[0:]  # a = 60000 samples, b = 784 pixels (not used below)

# Reshape to (samples, time steps, features) = (60000, 1, 784).
img = img.reshape(img.T.shape[0], -1, 784)
img_test = img_test.reshape(img_test.T.shape[0], -1, 784)
# One-hot encode the digit labels into 10 classes.
lab = np_utils.to_categorical(lab, 10)
lab_test = np_utils.to_categorical(lab_test, 10)
print(img.shape[0:])

# A single LSTM layer followed by a 10-way softmax classifier.
model = Sequential()
model.add(LSTM(40, input_shape=img.shape[1:], batch_size=batch))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='RMSprop', loss='mean_squared_error', metrics=['accuracy'])
model.fit(img, lab, batch_size=batch, epochs=epoch, verbose=1)

# Evaluate on the held-out test set.
scores = model.evaluate(img_test, lab_test, batch_size=batch)
predictions = model.predict(img_test, batch_size=batch)
print('LSTM test score:', scores[0])
print('LSTM test accuracy:', scores[1])

Edit 2: Thank you very much. When I do that, I get the following error:

ValueError: Input arrays should have the same number of samples as target arrays. Found 3750 input samples and 60000 target samples.

I know I should reshape the output, but I don't know what shape it should have.

Answer 1

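Think of time steps like frames extracted from a video: each time step is the state of one frame. The input passed to an LSTM should have the shape (num_samples, timesteps, input_dim). If you want 16 time steps, you should reshape your data to (num_samples // timesteps, timesteps, input_dims).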
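As a rough illustration of that reshape, here is a minimal NumPy sketch. It assumes a zero-filled array as a stand-in for the 60000 flattened MNIST images; the variable names are hypothetical and not taken from the question's code.

```python
import numpy as np

# Assumption: zero-filled stand-in for 60000 flattened 28x28 MNIST images.
num_images, input_dim = 60000, 784
timesteps = 16
batch_size = 50
images = np.zeros((num_images, input_dim), dtype=np.uint8)

# Group every 16 consecutive images into one sequence (one sample).
num_sequences = num_images // timesteps
grouped = images.reshape(num_sequences, timesteps, input_dim)
print(grouped.shape)  # (3750, 16, 784)

# With this layout, one batch of 50 samples covers 50 * 16 images.
images_per_batch = batch_size * timesteps
print(images_per_batch)  # 800
```

With this layout a sample is no longer a single digit but a run of 16 digits, which is why the sample count drops from 60000 to 3750.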

So with batch_size = 50, it will pass 50 * 16 images at a time. If you keep num_samples unchanged instead, the reshape splits your input_dims across the time steps.
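The other case, where the sample count stays at 60000, can be sketched the same way. This is only an illustration with stand-in data (a zero-filled array and a single "image" whose pixels are numbered 0 to 783); the names are hypothetical.

```python
import numpy as np

timesteps = 16
features = 784 // timesteps  # 49 pixels per time step

# Assumption: zero-filled stand-in for the (60000, 1, 784) input.
images = np.zeros((60000, 1, 784), dtype=np.uint8)
split = images.reshape(60000, timesteps, features)
print(split.shape)  # (60000, 16, 49)

# One image whose pixels are numbered 0..783, to show where each pixel lands.
one_image = np.arange(784)
steps = one_image.reshape(timesteps, features)
print(steps[0, :3])  # [0 1 2]
print(steps[1, :3])  # [49 50 51]
```

Each image still has 784 values in total, so going from 1 to 16 time steps leaves 784 / 16 = 49 features per step. Note that 49 consecutive pixels are one and three-quarter rows of a 28-pixel-wide image, so a time step here does not line up with image rows.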

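Edit: The target array must have the same number of samples as num_samples, which is 3750 in your case. All time steps in a sequence share the same label. You have to decide what to do with these MNIST sequences. Your current model classifies these sequences (not digits) into 10 classes.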
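The mismatch behind the ValueError can be reproduced with shapes alone. The sketch below uses zero-filled stand-in arrays and a hypothetical helper, `same_num_samples`, that only mirrors the idea of the check (inputs and targets must agree on the first axis); it is not Keras code.

```python
import numpy as np

def same_num_samples(inputs, targets):
    """Return True when inputs and targets agree on axis 0 (the sample axis)."""
    return inputs.shape[0] == targets.shape[0]

# Assumption: zero-filled stand-ins with the shapes discussed in this thread.
labels = np.zeros((60000, 10), dtype=np.float32)     # one one-hot label per image
grouped = np.zeros((3750, 16, 784), dtype=np.uint8)  # 16 images per sample
split = np.zeros((60000, 16, 49), dtype=np.uint8)    # 16 slices of one image per sample

print(same_num_samples(grouped, labels))  # False: 3750 inputs vs 60000 targets
print(same_num_samples(split, labels))    # True: one label per image

# For the grouped layout, the targets would need one row per sequence.
sequence_targets = np.zeros((3750, 10), dtype=np.float32)
print(same_num_samples(grouped, sequence_targets))  # True
```

What those 3750 sequence labels should mean is a modelling decision that the shapes cannot answer. If the goal is still to classify individual digits, keeping 60000 samples and splitting the features lets the original labels be used unchanged.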
