Keras fit_generator: __getitem__ method

Date: 2018-02-14 07:56:32

Tags: python neural-network keras

I am using the fit_generator function to train my model and want to verify that my data is constructed and used as expected. My class derives from keras.utils.Sequence() and implements the methods __getitem__, __len__ and on_epoch_end, as shown below:

import math

import numpy as np
from keras.utils import Sequence


class PairwiseSequence(Sequence):
    """Generator that returns a combination of simulations (over a
    parametrizable amount of timesteps) and the corresponding metric distance.

    pair_list: list of pairwise combinations of simulations
    results: dictionary with results for the metric distance between
             simulation pairs
    sim_files: list of filenames representing single timesteps
    batch_size: number of samples to process in a single inference run
    """

    def __init__(self, pair_list, results, mean, std, train=False,
                 sim_files=None, batch_size=1):
        self.pair_list = pair_list
        self.results = results
        self.batch_size = batch_size
        self.sim_files = sim_files
        self.mean = mean
        self.std = std
        self.train = train

    def __len__(self):
        # Number of batches per epoch; the last batch may be partial.
        return math.ceil(len(self.pair_list) / self.batch_size)

    def __getitem__(self, idx):
        # Load one sample up front to determine the input shape.
        dummy = LOADING_METHOD(self.pair_list[0][0], self.sim_files)
        x_1 = np.zeros((self.batch_size,) + dummy.shape)
        x_2 = np.zeros((self.batch_size,) + dummy.shape)
        y = np.zeros((self.batch_size, 1))

        if self.train:
            #print((idx * self.batch_size + i) % len(self.pair_list), ',')
            print("training idx:", idx)
        else:
            print("validation idx:", idx)

        for i in range(self.batch_size):
            (sim1, sim2) = self.pair_list[(idx * self.batch_size + i) %
                                          len(self.pair_list)]
            x_1[i] = LOADING_METHOD(sim1, self.sim_files) - self.mean[0]
            x_1[i] /= self.std[0]
            x_2[i] = LOADING_METHOD(sim2, self.sim_files) - self.mean[1]
            x_2[i] /= self.std[1]
            y[i] = self.results[frozenset((sim1.ensemble, sim2.ensemble))]
        return [x_1, x_2], y

    def on_epoch_end(self):
        if self.train:
            print("training generator: epoch end")
        else:
            print("validation generator: epoch end")
        #random.shuffle(self.pair_list)

This class is used as the generator for both the training and the validation data (two separate instances).
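For reference, the two instances might be created like this (a sketch; the variable names train_pairs, valid_pairs and sim_files are placeholders, not taken from the original post):

train_gen = PairwiseSequence(train_pairs, results, mean, std, train=True,
                             sim_files=sim_files, batch_size=12)
valid_gen = PairwiseSequence(valid_pairs, results, mean, std, train=False,
                             sim_files=sim_files, batch_size=12)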

As you can see, I am printing the idx parameter of __getitem__ as well as a notification to the console when an epoch ends. I am calling fit_generator like this (with multiprocessing turned off):

history_callback = model.fit_generator(
    generator=train_gen,
    steps_per_epoch=len(train_gen),
    epochs=epochs,
    verbose=0,
    callbacks=callbacks,
    validation_data=valid_gen,
    validation_steps=len(valid_gen),
    workers=1,
    use_multiprocessing=False,
    shuffle=False
)

I also disabled the shuffling of the data. With this configuration, I would expect idx to run from 0 to len(generator) - 1 and then on_epoch_end to be called. I have 372 samples for training and 93 for validation; with a batch_size of 12, idx should run from 0 to 30 for the training data and from 0 to 7 for the validation data. But __getitem__ is called more often than I expected, and on_epoch_end is also called earlier than it should be! This is what the console output looks like:

batch_size: 12
len(train_gen): 31
len(valid_gen): 8
2018-02-14 08:45:09.041929: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
training idx: 0
training idx: 1
training idx: 2
training idx: 3
training idx: 4
training idx: 5
training idx: 6
training idx: 7
training idx: 8
training idx: 9
training idx: 10
training idx: 11
training idx: 12
training idx: 13
training idx: 14
training idx: 15
training idx: 16
training idx: 17
training idx: 18
training idx: 19
training idx: 20
training generator: epoch end
training idx: 21
training idx: 22
training idx: 23
training idx: 24
training idx: 25
training idx: 26
training idx: 27
training idx: 28
training idx: 29
training idx: 30
training idx: 0
validation generator: epoch end
validation idx: 0
training idx: 1
validation idx: 1
training idx: 2
validation idx: 2
training idx: 3
validation idx: 3
training idx: 4
validation idx: 4
training idx: 5
validation idx: 5
validation generator: epoch end
training idx: 6
validation idx: 6
training idx: 7
validation idx: 7
training idx: 8
validation idx: 0
training idx: 9
validation idx: 1
training idx: 10
validation idx: 2
validation idx: 3
validation idx: 4
validation idx: 5
validation idx: 6
validation idx: 7
validation idx: 0
validation idx: 1
validation idx: 2
Epoch 00000: val_loss improved from inf to 10512.69922, saving model to /home/stefan/vcs/MA/code/results/test/TB_dummy_distance_10513.hdf5
training idx: 11
training idx: 12
training idx: 13
training idx: 14
training idx: 15
training idx: 16
training idx: 17
training idx: 18
training idx: 19
training idx: 20
training generator: epoch end
training idx: 21
training idx: 22
training idx: 23
training idx: 24
training idx: 25
training idx: 26
training idx: 27
training idx: 28
training idx: 29
training idx: 30
training idx: 0
validation generator: epoch end
validation idx: 0
training idx: 1
validation idx: 1
training idx: 2
validation idx: 2
training idx: 3
validation idx: 3
training idx: 4
validation idx: 4
training idx: 5
validation idx: 5
validation generator: epoch end
training idx: 6
validation idx: 6
training idx: 7
validation idx: 7
validation idx: 0
training idx: 8
validation idx: 1
training idx: 9
validation idx: 2
training idx: 10
validation idx: 3
validation idx: 4
validation idx: 5
validation idx: 6
validation idx: 7
validation idx: 0
validation idx: 1
validation idx: 2
Epoch 00001: val_loss improved from 10512.69922 to 5905.95929, saving model to /home/stefan/vcs/MA/code/results/test/TB_dummy_distance_5906.hdf5

How does fit_generator use the __getitem__ and on_epoch_end methods? Are these methods also called before the first epoch starts, to fetch some sample data for the weight initialization? Is this behavior caused by some kind of caching, e.g. via the max_queue_size parameter?
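For intuition, here is a rough sketch of the kind of queue-based prefetching that fit_generator performs internally (an illustration of the mechanism only, not Keras' actual enqueuer code): a background producer keeps requesting batches from the Sequence and puts them into a bounded queue, so __getitem__ and on_epoch_end can run well ahead of the batches the training loop is actually consuming.

import queue
import threading

def prefetching_batches(sequence, max_queue_size=10):
    """Toy stand-in for the enqueuer behind fit_generator."""
    q = queue.Queue(maxsize=max_queue_size)

    def producer():
        while True:
            for idx in range(len(sequence)):
                # __getitem__ is called here, possibly long before the
                # training loop processes the batch.
                q.put(sequence[idx])  # blocks while the queue is full
            # The producer reaches the end of its pass as soon as all
            # batches are *queued*, i.e. typically before they have all
            # been *consumed* -- which would make on_epoch_end look early.
            sequence.on_epoch_end()

    threading.Thread(target=producer, daemon=True).start()
    while True:
        yield q.get()  # the training loop consumes batches here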

Any help is greatly appreciated! Thanks in advance!

Update

For testing purposes, I changed the max_queue_size parameter of fit_generator to 1.
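The call would then presumably look like this (a reconstruction; only the added max_queue_size argument differs from the call shown earlier):

history_callback = model.fit_generator(
    generator=train_gen,
    steps_per_epoch=len(train_gen),
    epochs=epochs,
    verbose=0,
    callbacks=callbacks,
    validation_data=valid_gen,
    validation_steps=len(valid_gen),
    max_queue_size=1,
    workers=1,
    use_multiprocessing=False,
    shuffle=False
)

This is the resulting terminal output: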

batch_size: 12
len(train_gen): 31
len(valid_gen): 8
2018-02-14 10:10:40.001065: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
training idx: 0
training idx: 1
training idx: 2
training idx: 3
training idx: 4
training idx: 5
training idx: 6
training idx: 7
training idx: 8
training idx: 9
training idx: 10
training idx: 11
training idx: 12
training idx: 13
training idx: 14
training idx: 15
training idx: 16
training idx: 17
training idx: 18
training idx: 19
training idx: 20
training idx: 21
training idx: 22
training idx: 23
training idx: 24
training idx: 25
training idx: 26
training idx: 27
training idx: 28
training idx: 29
training idx: 30
training generator: epoch end
training idx: 0
training idx: 1
validation idx: 0
validation idx: 1
validation idx: 2
validation idx: 3
validation idx: 4
validation idx: 5
validation idx: 6
validation generator: epoch end
validation idx: 7
validation idx: 0
validation idx: 1
Epoch 00000: val_loss improved from inf to 18090.34473, saving model to /home/stefan/vcs/MA/code/results/test/TB_dummy_distance_18090.hdf5
training idx: 2
training idx: 3
training idx: 4
training idx: 5
training idx: 6
training idx: 7
training idx: 8
training idx: 9
training idx: 10
training idx: 11
training idx: 12
training idx: 13
training idx: 14
training idx: 15
training idx: 16
training idx: 17
training idx: 18
training idx: 19
training idx: 20
training idx: 21
training idx: 22
training idx: 23
training idx: 24
training idx: 25
training idx: 26
training idx: 27
training idx: 28
training idx: 29
training idx: 30
training generator: epoch end
training idx: 0
training idx: 1
validation idx: 0
validation idx: 1
validation idx: 2
validation idx: 3
validation idx: 4
validation idx: 5
validation idx: 6
validation generator: epoch end
validation idx: 7
validation idx: 0
validation idx: 1
Epoch 00001: val_loss did not improve

Now, at least during the first epoch, all training samples are queried. But for the validation data, and for the training data in the second epoch, on_epoch_end is still called too early.
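If the goal is just to verify that every sample is requested the expected number of times per epoch regardless of call order, one option (a sketch, not from the original post) is to wrap the Sequence so it records every requested batch index and to inspect the counts after training:

import collections

class CheckedSequence(PairwiseSequence):
    """Hypothetical wrapper that logs every batch index that is requested."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []

    def __getitem__(self, idx):
        self.seen.append(idx)
        return super().__getitem__(idx)

# After fit_generator returns, each idx should appear once per epoch
# (plus any batches that were prefetched but never consumed):
# print(collections.Counter(train_gen.seen))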

1 Answer:

Answer 0: (score: 0)

The following code should work for you:

import numpy as np

def gen(train_data):
    print('generator initiated')
    # Define a batch size
    batch_size = 64

    # Complete length of the data
    data_size = len(train_data)

    # Total number of batches that will be created; round up so the
    # final, possibly smaller, batch is not dropped
    num_batches = int(data_size / batch_size)
    if (num_batches * batch_size) < data_size:
        num_batches += 1

    while True:
        for i in range(num_batches):
            # Slice out the current batch
            start_index = i * batch_size
            end_index = min((i + 1) * batch_size, data_size)
            x_train = train_data[start_index:end_index]

            # Do some preprocessing (add_pad, pad and y_train_padded are
            # assumed to be defined elsewhere in the answerer's code)
            x_train_padded = add_pad(x_train, 3, pad)
            x_train_padded = np.array(x_train_padded)

            yield (x_train_padded, y_train_padded)


fun_model.fit_generator(gen(train_data),
                        steps_per_epoch=int(len(train_data) / 64),
                        epochs=50, callbacks=callbacks_list,
                        verbose=2, shuffle=True)