Question

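I know that LSTMs are well suited to predicting values in a time series. However, the time series I am working with is essentially tabular: it has 21392 rows, and each row is typically one instance recorded 10 minutes after the previous one.
Each row contains roughly 1973 features, which are measurements of different quantities at that timestamp. I want to predict sentences: the target is a sentence, so the last column of each row holds a specific sentence, e.g. "the cat sat there".
So this can basically be viewed as a sequence-to-sequence scenario, where each row of my data is a sequence that I want to use to predict a target sequence (in this case, a sentence).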
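
For context, Keras LSTM layers expect 3-D input of shape (samples, timesteps, features), so the 2-D table has to be reshaped one way or another. Below is a minimal sketch of one possible way to do that, assuming overlapping windows of consecutive rows. The helper `make_windows`, the window length of 6 rows (one hour at 10-minute intervals), and the random stand-in data are all illustrative assumptions, not part of the original setup.

```python
import numpy as np

def make_windows(features, timesteps):
    """Stack consecutive rows into overlapping windows.

    features: 2-D array of shape (n_rows, n_features)
    returns:  3-D array of shape (n_rows - timesteps + 1, timesteps, n_features)
    """
    n_rows = features.shape[0]
    if timesteps > n_rows:
        raise ValueError("timesteps must not exceed the number of rows")
    return np.stack(
        [features[i:i + timesteps] for i in range(n_rows - timesteps + 1)]
    )

# Small random stand-in for the real (21392, 1973) feature matrix.
rng = np.random.default_rng(0)
demo_features = rng.random((20, 5))

# Assumed window length: 6 rows, i.e. one hour of 10-minute readings.
windows = make_windows(demo_features, timesteps=6)
print(windows.shape)  # (15, 6, 5)
```

If each row really should be treated as its own sample, the alternative would be a single timestep per sample, e.g. `features.reshape(n_rows, 1, n_features)`, although an LSTM then has no temporal context to work with.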

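Since my features are all on different scales, I think it is important to rescale them to a common range. I one-hot encoded the targets (the sentences) so that they can be used in a prospective network. However, I am quite confused about how to actually shape the inputs and outputs to fit my case.
I have gone through the Keras LSTM documentation and blogs such as Machine Learning Mastery, but could not find any resource on predicting sentences from numeric features. This is the code I am currently working with: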

import pandas as pd
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

df = pd.read_pickle('all_data_witheventdescription.pkl')  # read data frame
df.head()
df_features = df.iloc[:, :-3]  # these are the predictors (numeric inputs)
outputs_df = df.iloc[:, -1]  # these are basically sentences
outputs_df = outputs_df.values  # converting the sentences to an array

# scale every feature to the [0, 1] range (MinMaxScaler rescales, it does not center)
print("Center and Scaling taking place....")
scaler = MinMaxScaler(feature_range=(0, 1))
df_features = scaler.fit_transform(df_features)

# one-hot encode the outputs (each distinct sentence becomes one category)
print("One hot encoding the outputs for training....")
onehot_encoder = OneHotEncoder()
encode_categorical = outputs_df.reshape(-1, 1)  # column vector, one sentence per row
outputs_encoded = onehot_encoder.fit_transform(encode_categorical).toarray()
print('outputs_encoded.shape after One Hot Encode:', outputs_encoded.shape)
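
Note that one-hot encoding whole sentences, as above, turns every distinct sentence into a single class, which makes this a classification problem rather than sequence generation. For a sequence-to-sequence target, sentences are usually encoded word by word instead. The following is a minimal sketch of that idea in plain Python; the helpers `build_vocab` and `encode`, the special tokens `<pad>`, `<start>`, `<end>`, and the two demo sentences are hypothetical and only for illustration.

```python
def build_vocab(sentences):
    """Map each distinct word to an integer id; ids 0-2 are reserved tokens."""
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(sentence, vocab, max_len):
    """Turn a sentence into a fixed-length list of word ids, padded with 0."""
    words = sentence.lower().split()
    ids = [vocab["<start>"]] + [vocab[w] for w in words] + [vocab["<end>"]]
    if len(ids) > max_len:
        raise ValueError("sentence longer than max_len")
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

# Hypothetical stand-ins for the sentences in the last column.
demo_sentences = ["The cat sat there", "The dog sat"]
vocab = build_vocab(demo_sentences)

# Longest sentence plus room for the <start> and <end> tokens.
max_len = max(len(s.split()) for s in demo_sentences) + 2
encoded = [encode(s, vocab, max_len) for s in demo_sentences]
print(encoded)  # [[1, 3, 4, 5, 6, 2], [1, 3, 7, 5, 2, 0]]
```

Integer sequences like these could then serve as decoder targets, with a softmax over the vocabulary at each output step, instead of a single one-hot vector per sentence.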

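I would really appreciate it if someone could walk me through a minimal working example showing how an LSTM can be used to predict sentences in this situation.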

How can I use an LSTM to predict sentences, with numeric features as the predictors?

0 answers: