Question

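I am trying to build an LSTM model on sequential data with 500 time steps and 6M sequences. Because of hardware limitations, I cannot convert the entire dataset into a NumPy array. In Keras, is it OK to create the data in chunks and feed them to the model one chunk at a time?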

Below is the approach I am using.

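import numpy as np

for epoch in range(10):
    I = 0  # restart from the first chunk on every epoch
    while I < 6000000:
        # convert only the current chunk of 100000 sequences to arrays
        Data1 = np.array(datax[I:I+100000])
        Data2 = np.array(datay[I:I+100000])
        Model.fit(Data1, Data2, epochs=1, batch_size=100)
        I = I + 100000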

Is this approach correct?

Answer 1

Yes, this approach is fine.

You can also create a generator for this task and call fit only once. This may remove much of the overhead of calling fit many times.

import numpy as np

def dataReader(batch_size):
    while True:             # Keras needs a generator that never ends
        I = 0               # restart from the first sample on every pass
        while I < 6000000:
            Data1 = np.array(datax[I:I+batch_size])
            Data2 = np.array(datay[I:I+batch_size])

            # you could even load the data partially here from disk
            # instead of keeping the entire lists datax and datay in memory;
            # this will leave you more memory for bigger models

            yield (Data1, Data2)
            I = I + batch_size
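
The comment about loading the data partially from disk could be sketched with NumPy memory mapping. This is only a minimal sketch, assuming the sequences and targets were saved beforehand with `np.save` as two `.npy` files; the name `disk_reader` and its path arguments are hypothetical and not part of the original answer.

```python
import numpy as np

def disk_reader(x_path, y_path, batch_size):
    """Yield (inputs, targets) batches forever, reading each slice from disk."""
    # mmap_mode="r" maps the files read-only instead of loading them into RAM
    x = np.load(x_path, mmap_mode="r")
    y = np.load(y_path, mmap_mode="r")
    n = x.shape[0]
    while True:  # loop forever so the generator can serve several epochs
        for start in range(0, n, batch_size):
            stop = start + batch_size
            # np.array copies only this slice into memory
            yield np.array(x[start:stop]), np.array(y[start:stop])
```

With this variant, only one batch at a time is held in memory, so the full lists never need to exist as in-memory arrays.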

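Then use fit_generator (in TensorFlow 2.x versions of Keras, fit_generator is deprecated and Model.fit accepts the generator directly):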

batch_size=100
steps = 6000000 // batch_size
Model.fit_generator(dataReader(batch_size), steps_per_epoch=steps,epochs=10,...)
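
Because the generator never ends, `steps_per_epoch` is what tells Keras where one epoch stops. With 6000000 samples and a batch size of 100 the floor division above is exact, but for a batch size that does not divide the sample count it leaves out the final partial batch. A small sketch of a rounding-up variant follows; the helper name `steps_for` is hypothetical.

```python
def steps_for(n_samples, batch_size):
    """Batches needed to cover n_samples once, counting a final partial batch."""
    # integer ceiling division, avoids floating-point rounding
    return (n_samples + batch_size - 1) // batch_size

print(steps_for(6000000, 100))  # 60000, same as 6000000 // 100
print(steps_for(6000000, 256))  # 23438, while 6000000 // 256 gives 23437
```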
