How to prepare data for a many-to-one binary classification LSTM?

Posted: 2019-02-13 12:18:49

Tags: keras time-series classification lstm many-to-one

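I have a time-series dataset of 38,000 different patients containing 48 hours of physiological data with 30 features. Each patient therefore has 48 rows (one per hour) and a single binary outcome (0/1) recorded at the end of hour 48, so the whole training set has 38,000 * 48 = 1,824,000 rows.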

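As I understand it, this is many-to-one LSTM binary classification, so my input shape should be (38,000, 48, 30), i.e. (sample_size, time_steps, features). Should return_sequences be set to False so that the layer returns only the hidden state of the last time step?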
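One way to turn the flat 1,824,000-row table into these arrays is a plain NumPy reshape. This is a minimal sketch, assuming (hypothetically) that the rows are already sorted by patient and then by hour, that every patient has exactly 48 rows, and that the 0/1 outcome sits in a label column on each patient's last row; `to_lstm_arrays`, `flat_features` and `flat_labels` are illustrative names, not part of Keras.

```python
import numpy as np

def to_lstm_arrays(flat_features, flat_labels, time_steps=48):
    """Reshape a flat (patients * time_steps, features) table into LSTM arrays.

    Hypothetical assumptions: rows are sorted by patient, then by hour, and
    every patient has exactly `time_steps` rows.
    """
    n_rows, n_features = flat_features.shape
    if n_rows % time_steps != 0:
        raise ValueError("row count is not a multiple of time_steps")
    n_patients = n_rows // time_steps
    # (patients * hours, features) -> (patients, hours, features)
    X = flat_features.reshape(n_patients, time_steps, n_features)
    # Keep one label per patient: the value on its last (48th) row -> (patients, 1)
    y = flat_labels.reshape(n_patients, time_steps)[:, -1].reshape(-1, 1)
    return X, y

# Toy data: 3 patients x 48 hours x 30 features instead of 38,000 patients
rng = np.random.default_rng(0)
flat_X = rng.normal(size=(3 * 48, 30))
flat_y = np.zeros(3 * 48)
flat_y[2 * 48 - 1] = 1  # second patient's outcome, stored on its last row
X, y = to_lstm_arrays(flat_X, flat_y)
print(X.shape, y.shape)  # (3, 48, 30) (3, 1)
```

With the real data this would give X with shape (38000, 48, 30) and y with shape (38000, 1). The reshape only works if every patient has the same number of rows, so patients with missing hours would need padding or resampling first.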

Could someone check whether my understanding is correct?

Thanks.

2 Answers:

Answer 0 (score: 1)

Yes, you are basically right:

  • Input shape = (patients, 48, 30)
  • Target shape = (patients, 1)

You should use return_sequences=False in the last LSTM layer. (If you have more recurrent layers before the last LSTM, keep return_sequences=True in those.)
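To see what return_sequences changes, here is a simplified NumPy sketch of an LSTM forward pass over one patient's sequence. It illustrates the standard LSTM equations and is not Keras's implementation (Keras adds batching, its own initializers and many more options); `lstm_forward` and the packed gate order [input, forget, candidate, output] are assumptions made for this example. With return_sequences=True it returns one hidden state per hour; with False it returns only the final one, which is what a many-to-one classifier passes to its output layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, W, U, b, return_sequences=False):
    """Illustrative LSTM forward pass over x of shape (time_steps, features).

    W: (features, 4*units), U: (units, 4*units), b: (4*units,).
    Gates are packed as [input, forget, candidate, output] (an assumption).
    """
    units = U.shape[0]
    h = np.zeros(units)  # hidden state
    c = np.zeros(units)  # cell state
    outputs = []
    for x_t in x:  # one step per hour
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:units])                  # input gate
        f = sigmoid(z[units:2 * units])         # forget gate
        g = np.tanh(z[2 * units:3 * units])     # candidate cell values
        o = sigmoid(z[3 * units:])              # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    # True: every hidden state (time_steps, units); False: only the last (units,)
    return np.stack(outputs) if return_sequences else h

rng = np.random.default_rng(0)
time_steps, features, units = 48, 30, 8
x = rng.normal(size=(time_steps, features))
W = rng.normal(scale=0.1, size=(features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

seq = lstm_forward(x, W, U, b, return_sequences=True)
last = lstm_forward(x, W, U, b, return_sequences=False)
print(seq.shape, last.shape)  # (48, 8) (8,)
```

In Keras the same difference appears with a leading batch dimension: (batch, 48, units) versus (batch, units).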

Answer 1 (score: 0)

Yes, for the most part you are on the right track. See the code below for a clearer picture.

from keras.models import Sequential
from keras.layers import Dense, LSTM, Bidirectional

# number of features recorded per hour
total_features = 30
# number of samples; Keras takes this from the data, it is NOT part of input_shape
no_of_patients = 38000
time_steps = 48


model = Sequential()

# Optionally, stack a Bidirectional LSTM first (it reads each sequence forwards
# and backwards). Because another LSTM follows it, keep return_sequences=True here.
# input_shape excludes the sample dimension: (time_steps, features).
# model.add(Bidirectional(
#     LSTM(units=100, return_sequences=True),
#     input_shape=(time_steps, total_features)
# ))

# return_sequences should be False if there is only one LSTM layer. With several
# stacked recurrent layers, only the last one should have return_sequences=False.
# (If the Bidirectional layer above is enabled, input_shape can be dropped here.)
model.add(LSTM(
    units=100,
    input_shape=(time_steps, total_features),
    return_sequences=False
))
# A single sigmoid unit matches 0/1 targets of shape (patients, 1)
model.add(Dense(1, activation='sigmoid'))
model.compile(
    loss='binary_crossentropy',
    optimizer='rmsprop',
    metrics=['accuracy']
)
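As a quick sanity check when comparing against model.summary(), the parameter counts for an LSTM(100) on 30 input features followed by a one-unit Dense output can be worked out by hand. A minimal sketch, assuming Keras defaults (use_bias=True, and Bidirectional's default merge_mode='concat'); `lstm_params` and `dense_params` are hypothetical helper names.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # weight matrix plus one bias per unit
    return units * input_dim + units

total_features, units = 30, 100
lstm = lstm_params(units, total_features)   # single LSTM layer
head = dense_params(1, units)               # one sigmoid output unit
print(lstm, head, lstm + head)  # 52400 101 52501

# Variant with the optional Bidirectional layer in front: it holds two LSTMs,
# and its concatenated output (2 * 100 = 200 features) feeds the second LSTM
bi = 2 * lstm_params(units, total_features)
lstm2 = lstm_params(units, 2 * units)
print(bi, lstm2, bi + lstm2 + head)  # 104800 120400 225301
```

Note that time_steps (48) appears in neither formula: the LSTM reuses the same weights at every hour, so the sequence length does not change the parameter count.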

Let me know if you have any questions about the code above or need more explanation.