Question

The sample dataset contains users' location points.

df.head()

   user           tslot         Location_point
0   0   2015-12-04 13:00:00     4356
1   0   2015-12-04 13:15:00     4356
2   0   2015-12-04 13:30:00     3659
3   0   2015-12-04 13:45:00     4356
4   0   2015-12-04 14:00:00     8563

df.shape 

(288,3)

Since the location points are categorical values, they are one-hot encoded.

encoded = to_categorical(df['Location_point'])

The encoded values look like this:

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]

The shape of the encoded values is (288, 8564).

I tried to shape the training data:

X_trai = []
y_trai = []
for i in range(96, 288):
    X_trai.append(encoded[i-96:i])
    y_trai.append(encoded[i])
X_trai, y_trai = np.array(X_trai), np.array(y_trai)

X_trai = np.reshape(X_trai, (X_trai.shape[0], X_trai.shape[1], 1))

The model is:

regressor = Sequential()

regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_trai.shape[1], 1)))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 50))
regressor.add(Dropout(0.2))

regressor.add(Dense(units = 1))

regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')

regressor.fit(X_trai, y_trai, epochs = 100, batch_size = 32)

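This is not the right model. I am new to deep learning. I tried looking at some examples but could not work out how to handle one-hot encoding. I would appreciate it if someone could explain the input shape, the output shape, and the correct model.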

The input is the sequence of location points, and the output should be the predicted next location point for that user.

Answer 1

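The input shape depends on your data. If you have a single sample with 288 time steps and 8564 features, the input shape is (batch_size = 1, timesteps = 288, n_features = 8564). If you have 288 samples of a single time step each, it is (batch_size = 288, timesteps = 1, n_features = 8564).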
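The question's loop sits between these two extremes: it cuts the series into overlapping windows. As a rough sketch, assuming 96-step windows as in the question and a toy class count in place of 8564 (`n_classes` and the random `labels` are illustrative stand-ins, not the asker's data), the one-hot array can be sliced into samples like this:

```python
import numpy as np

n_steps, window, n_classes = 288, 96, 5  # n_classes is a toy stand-in for 8564

# Hypothetical integer labels in [0, n_classes); replace with the real location ids.
rng = np.random.default_rng(0)
labels = rng.integers(0, n_classes, size=n_steps)

# One-hot encode: row i has a single 1 in column labels[i].
encoded = np.eye(n_classes, dtype=np.float32)[labels]  # (288, 5)

# Each sample is the previous `window` rows; the target is the row that follows.
X = np.stack([encoded[i - window:i] for i in range(window, n_steps)])
y = encoded[window:]

print(X.shape)  # (192, 96, 5) -> (batch_size, timesteps, n_features)
print(y.shape)  # (192, 5)
```

`X` is already 3D here, so no reshape to a trailing dimension of 1 is needed. With the feature axis kept, the `input_shape` passed to the first LSTM layer would be `(96, n_classes)`.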

In any case, here are tutorials on reshaping input data for LSTM models and on one-hot encoding sequence data: https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/ https://machinelearningmastery.com/how-to-one-hot-encode-sequence-data-in-python/

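The input shape of an LSTM is as follows:

A 3D tensor with shape (batch_size, timesteps, input_dim), and (optionally) 2D tensors with shape (batch_size, output_dim).

timesteps is the length of your time series, and input_dim is the number of features you have; in this case, since the values are one-hot encoded, it is 8564.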
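The figure 8564 is the largest id plus one (8563 + 1), not the number of distinct locations, because `to_categorical` allocates one column per integer up to the maximum. One possible way to shrink input_dim, sketched here with NumPy only and using the five ids from `df.head()` as sample data, is to remap the ids to consecutive indices before encoding. The remapping step is an illustrative suggestion, not something the question's code does:

```python
import numpy as np

# The five location ids shown in df.head() in the question.
location_points = np.array([4356, 4356, 3659, 4356, 8563])

# Width that one-hot encoding of the raw ids needs: max id + 1.
raw_width = int(location_points.max()) + 1

# Remap ids to 0..k-1; codes[i] is the index of location_points[i] in unique_ids.
unique_ids, codes = np.unique(location_points, return_inverse=True)

# One-hot encode the compact codes instead of the raw ids.
compact = np.eye(len(unique_ids), dtype=np.float32)[codes]

print(raw_width)      # 8564
print(unique_ids)     # [3659 4356 8563]
print(codes)          # [1 1 0 1 2]
print(compact.shape)  # (5, 3)
```

`unique_ids[codes]` recovers the original ids, so a predicted class index can be mapped back to a location point.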

The output shape depends on the model's architecture.

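First LSTM layer (return_sequences = True): (batch_size, timesteps, units)
Second and third LSTM layers (return_sequences = True): (batch_size, timesteps, units)
Fourth LSTM layer (no return_sequences): (batch_size, units)
Final Dense layer: (batch_size, 1)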
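These shapes can be traced by hand. The helpers below are hypothetical illustrations (they are not Keras functions) that apply the usual rule, where `return_sequences=True` keeps the time axis, to the stack from the question. The sketch assumes the one-hot features are kept, so the input is (batch_size, 96, 8564); Dropout layers are omitted because they leave shapes unchanged:

```python
from typing import Optional, Tuple

Shape = Tuple[Optional[int], ...]  # None stands for the unknown batch size


def lstm_output_shape(input_shape: Shape, units: int, return_sequences: bool) -> Shape:
    """Shape an LSTM layer produces for a (batch, timesteps, features) input."""
    batch, timesteps, _features = input_shape
    if return_sequences:
        return (batch, timesteps, units)  # one output vector per time step
    return (batch, units)                 # only the last time step's output


def dense_output_shape(input_shape: Shape, units: int) -> Shape:
    """Dense replaces the last axis with `units`."""
    return input_shape[:-1] + (units,)


shape: Shape = (None, 96, 8564)
for return_sequences in (True, True, True, False):  # the four LSTM layers
    shape = lstm_output_shape(shape, units=50, return_sequences=return_sequences)
    print(shape)  # (None, 96, 50) three times, then (None, 50)
shape = dense_output_shape(shape, units=1)
print(shape)  # (None, 1)
```

The final (None, 1) does not match the one-hot targets of width 8564 that the question builds in `y_trai`.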

You can, however, check the model's input and output shapes with:

regressor.input_shape and regressor.output_shape
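For instance, a minimal sketch with `tensorflow.keras` (assuming TensorFlow 2.x is installed) builds a small stacked LSTM and prints both attributes. The toy sizes `n_classes = 5` and `units = 8`, and the softmax output layer with `categorical_crossentropy`, are illustrative choices and are not taken from the question:

```python
from tensorflow import keras
from tensorflow.keras import layers

timesteps, n_classes, units = 96, 5, 8  # toy sizes; the question has 8564 classes

model = keras.Sequential([
    keras.Input(shape=(timesteps, n_classes)),  # batch size is left out
    layers.LSTM(units, return_sequences=True),  # keeps the time axis
    layers.Dropout(0.2),
    layers.LSTM(units),                         # returns the last step only
    layers.Dropout(0.2),
    # Assumption: one probability per class, a common choice for
    # next-category prediction with one-hot targets.
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

print(model.input_shape)   # (None, 96, 5)
print(model.output_shape)  # (None, 5)
```

The leading `None` is the batch size, which Keras leaves unspecified until data is passed in. `model.summary()` lists the output shape of every layer in the same way.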

Finally, why not treat Location_point as a numeric variable?

Input shape of LSTM with one-hot encoded data
