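I am trying to build an LSTM model that produces a binary output: buy or don't buy. My data has the format [date_time, close, volume] and contains millions of rows. I am stuck on shaping the data into the 3D format (samples, time steps, features).
I used pandas to read the data. I want to format it so that I get 4000 samples, each with 400 time steps and two features (close and volume). Can someone suggest how to do this?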
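One way to get the (samples, time steps, features) shape is to cut the rows into non-overlapping windows with NumPy. The sketch below is only an illustration: the column names close and volume come from the question, while the helper `make_windows` and the random stand-in data are hypothetical.

```python
import numpy as np


def make_windows(values, time_steps):
    """Cut a 2D array (rows, features) into non-overlapping windows.

    Returns an array of shape (samples, time_steps, features). Trailing rows
    that do not fill a complete window are dropped.
    """
    values = np.asarray(values, dtype=float)
    n_samples = len(values) // time_steps
    trimmed = values[: n_samples * time_steps]
    return trimmed.reshape(n_samples, time_steps, values.shape[1])


# Stand-in for df[["close", "volume"]].to_numpy(): 1,600,000 rows, 2 features.
rng = np.random.default_rng(0)
features = rng.random((1_600_000, 2))

x = make_windows(features, time_steps=400)
print(x.shape)  # (4000, 400, 2)
```

Non-overlapping windows use each row once. If overlapping windows are wanted instead, a sliding-window generator such as TimeseriesGenerator is the more natural fit.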
Edit: I am using TimeseriesGenerator as suggested, but I am not sure how to inspect the sequences and replace the output Y with my own binary buy output.
df = normalize_data(df)
print("Creating sequences for NN \n")
targets = df.drop('date_time', axis=1)
train = keras.preprocessing.sequence.TimeseriesGenerator(df, targets, 1, sampling_rate=1, stride=1,
                                                         start_index=0, end_index=int(len(df.index)*0.8),
                                                         shuffle=True, reverse=False, batch_size=time_steps)
This runs correctly, but the output is now the first close value following the input time series.
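The generator pairs each input window with the target at the row index right after the window, so to get a buy signal as Y, the targets array itself has to hold that signal. The sketch below mimics that pairing in plain NumPy. The buy rule (1 when a close is higher than the previous close) and the names `buy_labels` and `pair_windows` are assumptions made for illustration, not the asker's actual rule.

```python
import numpy as np


def buy_labels(close):
    """Hypothetical rule: labels[i] is 1 if close[i] is higher than close[i - 1]."""
    close = np.asarray(close, dtype=float)
    labels = np.zeros(len(close), dtype=int)
    labels[1:] = (close[1:] > close[:-1]).astype(int)
    return labels  # labels[0] has no previous row and stays 0


def pair_windows(data, targets, length):
    """Pair data[i - length:i] with targets[i] for every row i >= length."""
    rows = range(length, len(data))
    x = np.array([data[i - length:i] for i in rows])
    y = np.array([targets[i] for i in rows])
    return x, y


close = np.array([10.0, 11.0, 10.5, 10.7, 12.0, 11.0])
volume = np.array([100.0, 120.0, 90.0, 95.0, 150.0, 80.0])
data = np.column_stack([close, volume])

labels = buy_labels(close)
x, y = pair_windows(data, labels, length=3)
print(labels)   # [0 1 0 1 1 0]
print(x.shape)  # (3, 3, 2)
print(y)        # [1 1 0]
```

With this alignment, each y says whether the close right after the window is higher than the last close inside the window.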
Edit 2: My code so far is as follows:
df = data.normalize_data(df)
targets = df.iloc[:, 3] # Buy signal target
df.drop('y1', axis=1, inplace=True)
df.drop('y2', axis=1, inplace=True)
train = TimeseriesGenerator(df, targets, length=1, sampling_rate=1, stride=1,
                            start_index=0, end_index=int(len(df.index) * 0.8),
                            shuffle=True, reverse=False, batch_size=time_steps)
# number of samples
print("Samples: " + str(len(train)))
x, y = train[0]
print(str(x))
The output is as follows:
Samples: 8
Traceback (most recent call last):
File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: range(418, 419)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./main.py", line 94, in <module>
data_menu()
File "./main.py", line 42, in data_menu
data_menu()
File "./main.py", line 56, in data_menu
nn_menu()
File "./main.py", line 76, in nn_menu
nn.nn_gen(pre_processed_data)
File "/home/stian/git/stian9k/nn.py", line 33, in nn_gen
x, y = train[0]
File "/home/stian/.local/lib/python3.6/site-packages/keras_preprocessing/sequence.py", line 378, in __getitem__
samples[j] = self.data[indices]
File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2688, in __getitem__
return self._getitem_column(key)
File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
values = self._data.get(item)
File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/home/stian/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: range(418, 419)
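So even though I do get 8 objects from the generator, I cannot look them up. If I check the type with print(str(type(train))), I get a TimeseriesGenerator object. Any suggestions would again be appreciated.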
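Edit 3: It turns out that TimeseriesGenerator does not like pandas DataFrames. Converting the data to a NumPy array, and converting the pandas timestamp type to float, solved the problem.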
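The KeyError in the traceback comes from the generator indexing its data with `self.data[indices]`, which a DataFrame treats as a column lookup rather than a row slice. Below is a minimal sketch of the conversion, assuming the column names from the question. The sample values are made up, and seconds since the Unix epoch is just one possible float encoding for the timestamp.

```python
import numpy as np
import pandas as pd

# Made-up rows in the question's [date_time, close, volume] layout.
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2019-01-01 00:00:00",
        "2019-01-01 00:01:00",
        "2019-01-01 00:02:00",
    ]),
    "close": [100.0, 101.5, 101.0],
    "volume": [10, 12, 9],
})

# Convert timestamps to float seconds since the Unix epoch.
epoch = pd.Timestamp("1970-01-01")
df["date_time"] = (df["date_time"] - epoch) / pd.Timedelta(seconds=1)

# A plain float array supports the positional row slicing the generator needs.
data = df.to_numpy(dtype=float)
print(data.dtype, data.shape)  # float64 (3, 3)
print(data[1:2])               # a row slice, not a column lookup
```

Raw epoch seconds are large numbers, so if the timestamp is kept as a feature it would probably need the same normalization as the other columns.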
Answer 0 (score: 1)
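You can simply use the Keras TimeseriesGenerator for this. You can easily set the length (i.e. the number of time steps in each sample), the sampling rate, and the stride to subsample the data.
It returns an instance of the Sequence class, which you can then pass to fit_generator to fit the model on the data it generates. I strongly recommend reading the documentation for more information about this class, its parameters, and its usage.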
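To make the three parameters concrete, the sketch below reproduces the windowing in plain NumPy as I understand it from the Keras documentation: each sample ends just before a row index, takes every `sampling_rate`-th row of the preceding `length` rows, and consecutive samples start `stride` rows apart. `sliding_windows` is a hypothetical helper, not part of Keras, and it ignores batching and shuffling.

```python
import numpy as np


def sliding_windows(data, targets, length, sampling_rate=1, stride=1):
    """Return (x, y) where x[k] = data[r - length:r:sampling_rate], y[k] = targets[r]."""
    data = np.asarray(data)
    targets = np.asarray(targets)
    # Row indices of the targets; each window covers the `length` rows before it.
    rows = np.arange(length, len(data), stride)
    x = np.array([data[r - length:r:sampling_rate] for r in rows])
    y = targets[rows]
    return x, y


series = np.arange(10)
x, y = sliding_windows(series, series, length=4, sampling_rate=2, stride=3)
print(x)
# [[0 2]
#  [3 5]]
print(y)  # [4 7]
```

With length=400 and a two-column data array, each sample would have shape (400, 2), which matches the (time steps, features) input an LSTM layer expects. In recent TensorFlow releases, fit accepts a Sequence directly and fit_generator is deprecated.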
Answer 1 (score: 1)
Thanks! I was getting a lot of crazy numbers from the DataFrame. Converting it with to_numpy() before use solved the problem!
input_convertido = df.to_numpy()
output_convertido = df["close"].to_numpy()
gerador = TimeseriesGenerator(input_convertido, output_convertido, length=n_input, batch_size=1, sampling_rate=1)