Is it possible to train Keras models in parallel inside a loop?

Time: 2020-04-22 17:04:17

Tags: python multithreading keras multiprocessing tf.keras

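My problem flow looks roughly like this. I want to train two independent models in parallel. They do not strictly have to be trained in parallel, but doing so would significantly speed up each iteration of my actual problem, where a new batch of data arrives at regular time intervals (for simplicity, each interval is called a loop here).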

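A simplified flow of the problem is shown below. I would like to know whether the separate training runs of Keras models 1 and 2 can be executed in parallel, since they do not depend on each other.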

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, LSTM

"""ignore following 3 imports as they just illustrate in case someone wants to know
why I need to train in loops 
"""
# from stable_baselines.common.vec_env import SubprocVecEnv
# from stable_baselines.common import set_global_seeds, make_vec_env
# from mycustomfile import SomeImportedEnv, myRL_Agent


# representative data source coming in batches after regular 
# intervals of time for training model 1
def new_data_source_1():

    # generate test and train data for model 1
    X_train_model1 = np.random.rand(100,1,3)
    y_train_model1 = np.random.rand(100,1,2)
    X_test_model1 = np.random.rand(100,1,3)
    y_test_model1 = np.random.rand(100,1,2)

    return [X_train_model1, y_train_model1, X_test_model1,y_test_model1]


# representative data source coming in batches after regular 
# intervals of time for training model 2
def new_data_source_2():

    # generate test and train data for model 2
    X_train_model2 = np.random.rand(60,1,5)
    y_train_model2 = np.random.rand(60,1,2)
    X_test_model2 = np.random.rand(60,1,5)
    y_test_model2 = np.random.rand(60,1,2)

    return [X_train_model2, y_train_model2, X_test_model2, y_test_model2]


def create_keras_model(input_dim, outputdim, input_seq_length, output_seq_length):

    assert input_seq_length == output_seq_length, (
        "This model can take input and output sequences of equal length only")

    input_layer = Input(batch_shape=(None, input_seq_length, input_dim))
    hidden_layer = LSTM(16, return_sequences = True)(input_layer)
    output_layer = Dense(outputdim, activation='relu')(hidden_layer)

    model = Model(inputs=input_layer, outputs=output_layer)

    model.compile(loss='mse', optimizer='adam')

    return model


"""What I am doing currently to train inside a loop, 2 Keras models, sequentially.
I want to make the model trainings parallel to save time."""
def main():

    model1_created = False
    model2_created = False

    """consider I have to train 10 incoming batches of data from source 1 and 2
    and I have to keep updating my 2 models regularly for subsequent use in an OpenAI Gym
    environment"""
    for i in range(10):

        # get data from source 1
        X_train_model1, y_train_model1, X_test_model1,y_test_model1 = new_data_source_1()

        # create model for source 1
        if not model1_created:
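            # .shape is a tuple of ints, so index it once: [-1] is the feature
            # dimension and [-2] is the sequence length
            model1 = create_keras_model(X_train_model1.shape[-1], y_train_model1.shape[-1],
                                    X_train_model1.shape[-2], y_train_model1.shape[-2])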
            model1_created = True

        # train the model1
        _ = model1.fit(X_train_model1, y_train_model1, epochs=100,  initial_epoch=0, 
                             batch_size=32, validation_data=(X_test_model1, y_test_model1))

        # save model1 weights for future use; reason: model1 is needed for 
        # subsequent use in another multiprocessing environment(see StableBaselines 
        # SubprocVecEnv) inside this main function but using the model1 directly 
        # leads to error
        model1.save('Latest_Model_1.hdf5')


        # get data from source 2
        X_train_model2, y_train_model2, X_test_model2,y_test_model2 = new_data_source_2()

        # create model for source 2
        if not model2_created:
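            # .shape is a tuple of ints, so index it once: [-1] is the feature
            # dimension and [-2] is the sequence length
            model2 = create_keras_model(X_train_model2.shape[-1], y_train_model2.shape[-1],
                                    X_train_model2.shape[-2], y_train_model2.shape[-2])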
            model2_created = True

        # train the model2
        _ = model2.fit(X_train_model2, y_train_model2, epochs=100, initial_epoch=0,
                             batch_size=32, validation_data=(X_test_model2, y_test_model2))

        # save model2 weights for future use; reason: model2 is needed for 
        # subsequent use in another multiprocessing environment(see StableBaselines 
        # SubprocVecEnv) inside this main function but using the model2 directly 
        # leads to error
        model2.save('Latest_Model_2.hdf5')

        """The following section is an informal representation of why I need
        the models 1 and 2; it is not related to solving my question. You can simply ignore 
        this from the main function in case you want to.
        I have seen people sometimes ask the rationale behind doing training in loops. This 
        entire main() function (with more augmentations above and below) will be running in 
        actual deployment without terminating.
        """
        # mycustomGymenv = SomeImportedEnv(model1path = 'Latest_Model_1.hdf5',
        #                                  model1path = 'Latest_Model_2.hdf5')
        # state = initial_state = mycustomGymenv.initial_state()
        # agent = myRL_Agent(mycustomGymenv)
        # for _ in range(100): # simulate for 100 time steps 
              # action = agent(state) # agent takes action
              # state, reward, over, etc = mycustomGymenv.simulate()  # env returns feedback
              # state, action - save to csv file etc,

if __name__ == '__main__':
    main()
    print("Done!")        

Any suggestions or edits that improve the clarity of the question are welcome.

1 Answer:

Answer 0: (score: 0)

If you plan to fit the chunks of data sequentially rather than in parallel, you can do the following:

for _ in range(10):
    # somehow cut the data into slices and fit them one by one
    model.fit(data_slice, label_slice ......)

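Calling fit repeatedly trains the model incrementally: each call continues from the weights left by the previous one.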
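To make the incremental behaviour concrete, here is a minimal sketch that does not use Keras at all. `TinyLinearModel` is a hypothetical pure-Python stand-in written for this illustration; it only mimics the one property that matters here, namely that `fit` starts from the model's current weights instead of resetting them. The data, learning rate, and slice size are assumptions chosen so the example runs quickly.

```python
import random


class TinyLinearModel:
    """Hypothetical toy stand-in (not Keras): y = w * x + b trained by SGD."""

    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def fit(self, xs, ys, epochs=20, lr=0.05):
        # Training starts from the current weights; nothing is reset here,
        # which is what makes successive calls incremental.
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = (self.w * x + self.b) - y
                self.w -= lr * err * x
                self.b -= lr * err
        return self

    def mse(self, xs, ys):
        # Mean squared error of the current weights on the given data.
        return sum((self.w * x + self.b - y) ** 2
                   for x, y in zip(xs, ys)) / len(xs)


# Synthetic, noiseless data for the target y = 2x + 1 (an assumption).
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
ys = [2.0 * x + 1.0 for x in xs]

model = TinyLinearModel()
losses = [model.mse(xs, ys)]  # loss on the full data before any training

# Cut the data into 10 slices and fit them one by one.
for start in range(0, len(xs), 10):
    model.fit(xs[start:start + 10], ys[start:start + 10])
    losses.append(model.mse(xs, ys))

print("loss before training:", losses[0])
print("loss after 10 slices:", losses[-1])
```

The loss on the full data ends far below its starting value even though no single `fit` call ever saw more than 10 samples, because each call builds on the previous one. Note that this only covers the sequential case; it does not make the two trainings run in parallel.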