I have a list of 986 arrays, generated with the following line, which splits the data into weekly groups:
weeks = [g for n, g in m15.groupby(pd.Grouper(key='datebuff', freq='W'))]
I need to achieve two goals. First, prepare X_train and y_train: X_train should be the sliding windows over each array (everything up to window_size before the end), and y_train the values offset by window_size through the end of each array. I'm still working on this part.
Second, and most importantly, is there a way to train the LSTM on each array separately, feeding them in one at a time? Is stateful recurrence what I need?
[ datebuff High Low Open Close
0 2000-01-03 00:00:00 1.0080 1.0073 1.0073 1.0077
1 2000-01-03 00:15:00 1.0087 1.0076 1.0078 1.0086
2 2000-01-03 00:30:00 1.0089 1.0079 1.0087 1.0079
3 2000-01-03 00:45:00 1.0132 1.0078 1.0078 1.0128
4 2000-01-03 01:00:00 1.0133 1.0120 1.0129 1.0122
5 2000-01-03 01:15:00 1.0125 1.0120 1.0123 1.0124
6 2000-01-03 01:30:00 1.0137 1.0129 1.0132 1.0133
7 2000-01-03 01:45:00 1.0141 1.0133 1.0135 1.0137
8 2000-01-03 02:00:00 1.0145 1.0134 1.0140 1.0138
9 2000-01-03 02:15:00 1.0142 1.0135 1.0135 1.0141
10 2000-01-03 02:30:00 1.0147 1.0137 1.0142 1.0145
11 2000-01-03 02:45:00 1.0173 1.0142 1.0142 1.0171
12 2000-01-03 03:00:00 1.0190 1.0170 1.0170 1.0185
13 2000-01-03 03:15:00 1.0189 1.0176 1.0188 1.0180
14 2000-01-03 03:30:00 1.0183 1.0173 1.0180 1.0180
15 2000-01-03 03:45:00 1.0180 1.0173 1.0179 1.0175
16 2000-01-03 04:00:00 1.0177 1.0164 1.0173 1.0171
17 2000-01-03 04:15:00 1.0171 1.0165 1.0167 1.0170
18 2000-01-03 04:30:00 1.0171 1.0166 1.0166 1.0168
19 2000-01-03 04:45:00 1.0171 1.0166 1.0167 1.0167
20 2000-01-03 05:00:00 1.0171 1.0165 1.0171 1.0165
21 2000-01-03 05:15:00 1.0171 1.0164 1.0171 1.0167
22 2000-01-03 05:30:00 1.0168 1.0165 1.0168 1.0167
23 2000-01-03 05:45:00 1.0172 1.0166 1.0166 1.0169
24 2000-01-03 06:00:00 1.0180 1.0167 1.0171 1.0178
25 2000-01-03 06:15:00 1.0181 1.0175 1.0181 1.0175
26 2000-01-03 06:30:00 1.0179 1.0172 1.0177 1.0172
27 2000-01-03 06:45:00 1.0177 1.0170 1.0173 1.0170
28 2000-01-03 07:00:00 1.0174 1.0158 1.0174 1.0163
29 2000-01-03 07:15:00 1.0167 1.0160 1.0163 1.0165
.. ... ... ... ... ...
448 2000-01-07 16:30:00 1.0313 1.0299 1.0306 1.0299
449 2000-01-07 16:45:00 1.0307 1.0288 1.0298 1.0292
450 2000-01-07 17:00:00 1.0305 1.0284 1.0293 1.0304
451 2000-01-07 17:15:00 1.0306 1.0288 1.0304 1.0297
452 2000-01-07 17:30:00 1.0306 1.0294 1.0295 1.0303
453 2000-01-07 17:45:00 1.0306 1.0293 1.0303 1.0293
454 2000-01-07 18:00:00 1.0299 1.0289 1.0299 1.0293
455 2000-01-07 18:15:00 1.0299 1.0292 1.0295 1.0297
456 2000-01-07 18:30:00 1.0300 1.0280 1.0297 1.0280
457 2000-01-07 18:45:00 1.0302 1.0281 1.0283 1.0299
458 2000-01-07 19:00:00 1.0306 1.0293 1.0297 1.0293
459 2000-01-07 19:15:00 1.0299 1.0290 1.0293 1.0292
460 2000-01-07 19:30:00 1.0294 1.0284 1.0292 1.0287
461 2000-01-07 19:45:00 1.0293 1.0285 1.0286 1.0289
462 2000-01-07 20:00:00 1.0289 1.0280 1.0286 1.0283
463 2000-01-07 20:15:00 1.0284 1.0273 1.0279 1.0275
464 2000-01-07 20:30:00 1.0284 1.0275 1.0276 1.0283
465 2000-01-07 20:45:00 1.0289 1.0272 1.0284 1.0281
466 2000-01-07 21:00:00 1.0289 1.0279 1.0280 1.0281
467 2000-01-07 21:15:00 1.0288 1.0276 1.0277 1.0277
468 2000-01-07 21:30:00 1.0290 1.0276 1.0276 1.0287
469 2000-01-07 21:45:00 1.0298 1.0289 1.0292 1.0294
470 2000-01-07 22:00:00 1.0294 1.0287 1.0292 1.0291
471 2000-01-07 22:15:00 1.0292 1.0284 1.0290 1.0287
472 2000-01-07 22:30:00 1.0298 1.0289 1.0289 1.0296
473 2000-01-07 22:45:00 1.0299 1.0295 1.0295 1.0295
474 2000-01-07 23:00:00 1.0297 1.0292 1.0292 1.0294
475 2000-01-07 23:15:00 1.0298 1.0294 1.0298 1.0297
476 2000-01-07 23:30:00 1.0299 1.0299 1.0299 1.0299
477 2000-01-07 23:45:00 1.0300 1.0298 1.0300 1.0298
[478 rows x 5 columns],
datebuff High Low Open Close
478 2000-01-10 00:00:00 1.0298 1.0286 1.0286 1.0298
479 2000-01-10 00:15:00 1.0300 1.0292 1.0297 1.0300
480 2000-01-10 00:30:00 1.0301 1.0300 1.0301 1.0300
481 2000-01-10 00:45:00 1.0305 1.0297 1.0305 1.0297
482 2000-01-10 01:00:00 1.0298 1.0289 1.0295 1.0291
483 2000-01-10 01:15:00 1.0302 1.0289 1.0294 1.0301
484 2000-01-10 01:30:00 1.0302 1.0287 1.0300 1.0291
485 2000-01-10 01:45:00 1.0293 1.0282 1.0290 1.0283
486 2000-01-10 02:00:00 1.0287 1.0274 1.0282 1.0284
487 2000-01-10 02:15:00 1.0292 1.0280 1.0283 1.0284
488 2000-01-10 02:30:00 1.0290 1.0284 1.0287 1.0286
489 2000-01-10 02:45:00 1.0292 1.0285 1.0287 1.0289
490 2000-01-10 03:00:00 1.0291 1.0283 1.0291 1.0289
491 2000-01-10 03:15:00 1.0292 1.0285 1.0291 1.0286
492 2000-01-10 03:30:00 1.0292 1.0283 1.0289 1.0289
493 2000-01-10 03:45:00 1.0301 1.0288 1.0290 1.0289
494 2000-01-10 04:00:00 1.0295 1.0289 1.0290 1.0290
495 2000-01-10 04:15:00 1.0298 1.0289 1.0291 1.0292
496 2000-01-10 04:30:00 1.0295 1.0292 1.0293 1.0293
497 2000-01-10 04:45:00 1.0295 1.0292 1.0294 1.0293
498 2000-01-10 05:00:00 1.0295 1.0292 1.0292 1.0292
499 2000-01-10 05:15:00 1.0294 1.0290 1.0291 1.0292
500 2000-01-10 05:30:00 1.0294 1.0290 1.0294 1.0292
501 2000-01-10 05:45:00 1.0294 1.0287 1.0294 1.0290
502 2000-01-10 06:00:00 1.0292 1.0287 1.0287 1.0289
503 2000-01-10 06:15:00 1.0293 1.0287 1.0293 1.0292
504 2000-01-10 06:30:00 1.0298 1.0290 1.0291 1.0292
505 2000-01-10 06:45:00 1.0295 1.0291 1.0293 1.0295
506 2000-01-10 07:00:00 1.0297 1.0289 1.0294 1.0291
507 2000-01-10 07:15:00 1.0298 1.0285 1.0292 1.0289
.. ... ... ... ... ...
927 2000-01-14 16:30:00 1.0142 1.0121 1.0130 1.0136
928 2000-01-14 16:45:00 1.0140 1.0115 1.0135 1.0117
929 2000-01-14 17:00:00 1.0134 1.0114 1.0117 1.0130
930 2000-01-14 17:15:00 1.0138 1.0125 1.0129 1.0137
931 2000-01-14 17:30:00 1.0144 1.0132 1.0135 1.0144
932 2000-01-14 17:45:00 1.0146 1.0125 1.0142 1.0128
933 2000-01-14 18:00:00 1.0152 1.0126 1.0128 1.0152
934 2000-01-14 18:15:00 1.0156 1.0141 1.0152 1.0145
935 2000-01-14 18:30:00 1.0147 1.0137 1.0144 1.0139
936 2000-01-14 18:45:00 1.0150 1.0139 1.0139 1.0143
937 2000-01-14 19:00:00 1.0148 1.0133 1.0143 1.0146
938 2000-01-14 19:15:00 1.0151 1.0145 1.0147 1.0149
939 2000-01-14 19:30:00 1.0152 1.0146 1.0149 1.0150
940 2000-01-14 19:45:00 1.0156 1.0145 1.0149 1.0154
941 2000-01-14 20:00:00 1.0157 1.0140 1.0154 1.0141
942 2000-01-14 20:15:00 1.0147 1.0138 1.0141 1.0142
943 2000-01-14 20:30:00 1.0147 1.0141 1.0142 1.0145
944 2000-01-14 20:45:00 1.0157 1.0143 1.0146 1.0157
945 2000-01-14 21:00:00 1.0155 1.0144 1.0154 1.0146
946 2000-01-14 21:15:00 1.0147 1.0144 1.0145 1.0144
947 2000-01-14 21:30:00 1.0145 1.0141 1.0145 1.0142
948 2000-01-14 21:45:00 1.0145 1.0140 1.0141 1.0145
949 2000-01-14 22:00:00 1.0145 1.0135 1.0144 1.0138
950 2000-01-14 22:15:00 1.0140 1.0127 1.0140 1.0134
951 2000-01-14 22:30:00 1.0132 1.0122 1.0131 1.0124
952 2000-01-14 22:45:00 1.0124 1.0121 1.0122 1.0121
953 2000-01-14 23:00:00 1.0133 1.0122 1.0122 1.0125
954 2000-01-14 23:15:00 1.0128 1.0116 1.0128 1.0118
955 2000-01-14 23:30:00 1.0122 1.0118 1.0119 1.0119
956 2000-01-14 23:45:00 1.0123 1.0120 1.0121 1.0122
[479 rows x 5 columns],
datebuff High Low Open Close
957 2000-01-17 00:00:00 1.0131 1.0127 1.0129 1.0127
958 2000-01-17 00:15:00 1.0127 1.0124 1.0126 1.0124
959 2000-01-17 00:30:00 1.0124 1.0121 1.0123 1.0122
960 2000-01-17 00:45:00 1.0122 1.0120 1.0121 1.0122
961 2000-01-17 01:00:00 1.0131 1.0123 1.0123 1.0129
962 2000-01-17 01:15:00 1.0138 1.0130 1.0133 1.0134
963 2000-01-17 01:30:00 1.0137 1.0131 1.0134 1.0136
964 2000-01-17 01:45:00 1.0140 1.0129 1.0137 1.0129
965 2000-01-17 02:00:00 1.0135 1.0127 1.0128 1.0132
966 2000-01-17 02:15:00 1.0134 1.0131 1.0133 1.0131
967 2000-01-17 02:30:00 1.0135 1.0128 1.0130 1.0134
968 2000-01-17 02:45:00 1.0137 1.0133 1.0134 1.0135
969 2000-01-17 03:00:00 1.0136 1.0131 1.0133 1.0131
970 2000-01-17 03:15:00 1.0137 1.0130 1.0133 1.0137
971 2000-01-17 03:30:00 1.0140 1.0132 1.0140 1.0132
972 2000-01-17 03:45:00 1.0135 1.0132 1.0135 1.0134
973 2000-01-17 04:00:00 1.0136 1.0130 1.0136 1.0133
974 2000-01-17 04:15:00 1.0134 1.0132 1.0134 1.0133
975 2000-01-17 04:30:00 1.0135 1.0131 1.0132 1.0133
976 2000-01-17 04:45:00 1.0134 1.0132 1.0132 1.0132
977 2000-01-17 05:00:00 1.0136 1.0132 1.0134 1.0133
978 2000-01-17 05:15:00 1.0136 1.0132 1.0132 1.0133
979 2000-01-17 05:30:00 1.0136 1.0132 1.0136 1.0134
980 2000-01-17 05:45:00 1.0137 1.0134 1.0136 1.0134
981 2000-01-17 06:00:00 1.0136 1.0132 1.0134 1.0133
982 2000-01-17 06:15:00 1.0138 1.0130 1.0134 1.0137
983 2000-01-17 06:30:00 1.0142 1.0137 1.0138 1.0140
984 2000-01-17 06:45:00 1.0145 1.0138 1.0140 1.0141
985 2000-01-17 07:00:00 1.0148 1.0139 1.0142 1.0140
986 2000-01-17 07:15:00 1.0145 1.0134 1.0139 1.0138
... ... ... ... ... ...
1407 2000-01-21 16:30:00 1.0080 1.0061 1.0075 1.0074
1408 2000-01-21 16:45:00 1.0083 1.0066 1.0076 1.0070
1409 2000-01-21 17:00:00 1.0080 1.0065 1.0071 1.0068
1410 2000-01-21 17:15:00 1.0079 1.0065 1.0068 1.0069
1411 2000-01-21 17:30:00 1.0080 1.0067 1.0069 1.0071
1412 2000-01-21 17:45:00 1.0095 1.0070 1.0070 1.0094
1413 2000-01-21 18:00:00 1.0102 1.0086 1.0095 1.0089
1414 2000-01-21 18:15:00 1.0091 1.0079 1.0089 1.0086
1415 2000-01-21 18:30:00 1.0094 1.0083 1.0085 1.0085
1416 2000-01-21 18:45:00 1.0096 1.0085 1.0085 1.0092
1417 2000-01-21 19:00:00 1.0095 1.0083 1.0091 1.0087
1418 2000-01-21 19:15:00 1.0100 1.0085 1.0085 1.0095
1419 2000-01-21 19:30:00 1.0102 1.0093 1.0096 1.0101
1420 2000-01-21 19:45:00 1.0103 1.0096 1.0101 1.0096
1421 2000-01-21 20:00:00 1.0099 1.0090 1.0095 1.0097
1422 2000-01-21 20:15:00 1.0101 1.0093 1.0098 1.0095
1423 2000-01-21 20:30:00 1.0099 1.0092 1.0094 1.0095
1424 2000-01-21 20:45:00 1.0103 1.0093 1.0094 1.0103
1425 2000-01-21 21:00:00 1.0100 1.0092 1.0098 1.0096
1426 2000-01-21 21:15:00 1.0100 1.0092 1.0096 1.0098
1427 2000-01-21 21:30:00 1.0099 1.0093 1.0094 1.0093
1428 2000-01-21 21:45:00 1.0097 1.0088 1.0092 1.0088
1429 2000-01-21 22:00:00 1.0091 1.0084 1.0087 1.0088
1430 2000-01-21 22:15:00 1.0094 1.0086 1.0091 1.0088
1431 2000-01-21 22:30:00 1.0092 1.0084 1.0085 1.0085
1432 2000-01-21 22:45:00 1.0094 1.0086 1.0088 1.0094
1433 2000-01-21 23:00:00 1.0097 1.0089 1.0097 1.0091
1434 2000-01-21 23:15:00 1.0101 1.0090 1.0090 1.0101
1435 2000-01-21 23:30:00 1.0089 1.0089 1.0089 1.0089
1436 2000-01-21 23:45:00 1.0087 1.0087 1.0087 1.0087
[480 rows x 5 columns],
Update 1
For the first point, I tried the code below, but I keep getting "ValueError: all the input arrays must have same number of dimensions".
weeks is the list of 986 arrays, each around 478-480 rows long, and the window size is 10.
X = np.array([])
y = np.array([])
for i in range(0, len(weeks)):
    z = weeks[i]
    z = np.asarray(z)
    z = z[:, [1, 2, 3, 4, 5]]
    X = np.append(X, [np.atleast_3d(np.array([z[index:index + window_size] for index in range(0, z.shape[0] - window_size)]))], axis=0)  # the inputs to a predictor
    y = np.append(y, [z[window_size:, [4]]], axis=0)
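For reference, the ValueError comes from np.append with axis=0 delegating to np.concatenate, which requires every input to have the same number of dimensions: X starts out as an empty 1-D array, while each appended batch of windows is 3-D (4-D once wrapped in a list). A minimal sketch reproducing the mismatch:

```python
import numpy as np

X = np.array([])              # shape (0,) -> 1-D
batch = np.zeros((5, 10, 5))  # (samples, window, features) -> 3-D

try:
    X = np.append(X, [batch], axis=0)  # 1-D vs 4-D: cannot concatenate
except ValueError as e:
    print(e)
```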
Update 2
I finally solved the first problem by using Python lists instead of numpy arrays:
window_size = 10
X = []
y = []
for i in range(0, len(weeks)):
    z = weeks[i]
    z = np.asarray(z)
    z = z[:, [1, 2, 3, 4, 5]]
    X.append([np.atleast_3d(np.array([z[index:index + window_size] for index in range(0, z.shape[0] - window_size)]))])  # the inputs to a predictor
    y.append([z[window_size:, [4]]])
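The windowing step can also be written as a small standalone helper (make_windows is a hypothetical name, and this sketch assumes each week is already a 2-D float array of shape (timesteps, features) with the target as the last column):

```python
import numpy as np

def make_windows(week, window_size=10, target_col=-1):
    """Slide a window over one week's array.

    Returns X of shape (samples, window_size, features) and
    y of shape (samples, 1), holding the target column one
    step after each window.
    """
    n = week.shape[0] - window_size
    X = np.stack([week[i:i + window_size] for i in range(n)])
    y = week[window_size:, [target_col]]
    return X, y

# toy usage: 20 timesteps, 5 features
week = np.arange(100, dtype=float).reshape(20, 5)
X, y = make_windows(week, window_size=10)
print(X.shape, y.shape)  # (10, 10, 5) (10, 1)
```

Returning a plain (samples, window, features) array per week avoids the extra list-wrapping around np.atleast_3d and gives each week's X exactly the 3-D shape Keras expects.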
According to the Keras documentation, it seems a stateful layer is what I'm looking for:
stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
Update 3
Now I'm trying the model below with stateful=True. The problem is setting batch_input_shape: it requires the batch size, which I would need to change on every stateful iteration, otherwise I get the following error:
InvalidArgumentError: Incompatible shapes: [276,1] vs. [277,1]
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

neurons = 50
batch_size = len(X)
window_size = X[0][0].shape[1]
nb_features = X[0][0].shape[2]

model = Sequential()
model.add(LSTM(neurons, batch_input_shape=(len(X[0][0]), window_size, nb_features), stateful=True))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam', metrics=['mae'])

for i in range(batch_size):
    model.fit(X[i], y[i], epochs=1, batch_size=len(X[i][0]), verbose=0, shuffle=False)