Question

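I am working with the Kaggle dataset https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results/version/2 and training an LSTM model to predict a country's total medal count from its performance across various sports in previous years.

I used the groupby function to convert the input DataFrame into a list of lists.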

[
[  [ [country_A], [year_1], [attr_],...[attr_x]  ], [  [country_A], [year_2], [attr_],...[attr_x]  ], ... [ [country_A], [year_t], [attr_],...[attr_x] ]  ],
[  [ [country_B], [year_1], [attr_],...[attr_x]  ], [  [country_B], [year_2], [attr_],...[attr_x]  ], ... [ [country_B], [year_t], [attr_],...[attr_x] ]  ],
.
.
.
[  [ [country_Z], [year_1], [attr_],...[attr_x]  ], [  [country_Z], [year_2], [attr_],...[attr_x]  ], ... [ [country_Z], [year_t], [attr_],...[attr_x] ]  ]
]

shape [country_A]: (14, 7)

shape [country_B]: (25, 7)

shape [country_C]: (100, 7)

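The problem, however, is that the number of rows differs from country to country. Every entry has the same number of attributes (year, sport, age, height, ...).

For example: "Afghanistan": [14 rows], "India": [25 rows], "USA": [100 rows], and so on.

I looked into padding, but it seems to work only for missing attributes. How can I handle the variable number of rows?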
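For illustration, padding can also be applied along the row (time) axis rather than the attribute axis. The following is a minimal sketch in plain Python, not a definitive solution: the helper name `pad_rows`, the pad value `0.0`, and the toy data are assumptions made for this example and do not come from the dataset. It appends all-zero rows until every group has as many rows as the longest one, and returns a mask that marks which rows are real.

```python
def pad_rows(sequences, pad_value=0.0):
    """Pad every sequence with rows of `pad_value` up to the longest length.

    `sequences` is a list of groups; each group is a list of rows, and
    every row has the same number of attributes.
    Returns (padded, mask), where mask[i][t] is 1 for a real row, 0 for padding.
    """
    max_len = max(len(seq) for seq in sequences)
    n_features = len(sequences[0][0])
    padded, mask = [], []
    for seq in sequences:
        n_missing = max_len - len(seq)
        pad_block = [[pad_value] * n_features for _ in range(n_missing)]
        padded.append([list(row) for row in seq] + pad_block)
        mask.append([1] * len(seq) + [0] * n_missing)
    return padded, mask


# Hypothetical stand-ins for three countries with 2, 3 and 5 rows of 7 attributes.
toy = [[[1.0] * 7 for _ in range(n_rows)] for n_rows in (2, 3, 5)]
padded, mask = pad_rows(toy)
```

After padding, every group has the same number of rows, so the nested list can be converted into a single 3-D array of shape (countries, rows, attributes). Keras also ships a `pad_sequences` utility and a `Masking` layer intended for this kind of input.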

Implementation so far (Python: keras.layers.LSTM):

from keras.models import Sequential
from keras.layers import LSTM, Dense

def lstm_classifier(final_data):
    # reshape: one list of rows per country (NOC); the row count differs per country
    feature_cols = ['Year', 'Sex', 'Age', 'Height', 'Weight', 'Host_Country', 'Sport']
    final_X = final_data.groupby("NOC", as_index=True)[feature_cols].apply(lambda x: x.values.tolist())
    final_Y = final_data.groupby("NOC", as_index=True)['Medal'].apply(lambda x: x.values.tolist())

    # define model - 10 hidden nodes
    model = Sequential()
    # NOTE: the input_shape argument below is the part in question
    model.add(LSTM(10, stateful=True, input_shape=(1, final_X), return_sequences=True))
    model.add(Dense(4, activation='sigmoid'))
    model.summary()
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])

    # fit network
    history = model.fit(final_X, final_Y, epochs=10, batch_size=50)

    loss, accuracy = model.evaluate(final_X, final_Y)

I am not sure whether I have constructed the "input_shape" argument correctly. Any pointers on how to proceed would be helpful.
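As a point of reference, Keras recurrent layers expect the input shape as (timesteps, features) without the batch axis, so the shape is a tuple of integers rather than the data itself. The sketch below is one possible setup under stated assumptions, not the definitive fix: the input is assumed to be zero-padded to a common length, `MAX_ROWS`, the random toy data, and the made-up targets are hypothetical, and the model is reduced to a single regression output for the medal total instead of the four sigmoid outputs used above.

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Masking, LSTM, Dense

N_FEATURES = 7  # attributes per row, as in the shapes quoted in the question
MAX_ROWS = 5    # hypothetical padded length (100 for the shapes quoted above)

# Toy batch: 3 "countries", zero-padded to MAX_ROWS rows of N_FEATURES values.
rng = np.random.default_rng(0)
X = np.zeros((3, MAX_ROWS, N_FEATURES), dtype="float32")
for i, n_rows in enumerate([2, 3, 5]):
    # Real rows are shifted away from zero so they are never mistaken for padding.
    X[i, :n_rows] = rng.random((n_rows, N_FEATURES)) + 0.1
y = np.array([[1.0], [4.0], [9.0]], dtype="float32")  # made-up medal totals

model = Sequential([
    Input(shape=(MAX_ROWS, N_FEATURES)),  # (timesteps, features), no batch axis
    Masking(mask_value=0.0),              # skip timesteps whose values are all zero
    LSTM(10),                             # returns the final state: one vector per country
    Dense(1),                             # single regression output per country
])
model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(X, y, epochs=1, batch_size=3, verbose=0)
preds = model.predict(X, verbose=0)
```

Here the model is not stateful and returns one prediction per country. Numeric encoding of categorical columns such as 'Sex' or 'Sport' is assumed to have happened beforehand and is not shown.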

LSTM model - reshaping input for variable-length sequences

0 answers: