Keras evaluate_generator runs more steps than indicated by the `steps` parameter

Date: 2018-06-25 19:31:05

Tags: python keras deep-learning generator

I am using a custom generator that yields videos. I tried to run a simple evaluation of my model with evaluate_generator, but I noticed that changing batch_size produces different accuracy results. I decided to print the name of each yielded video at every generator step, and it turns out that, somehow, the generator is being called more times than I indicate in the steps parameter of evaluate_generator.

As I understand it, the steps parameter of evaluate_generator indicates the number of batches to draw from the generator. In my case my generator's batch_size is 10, and since there are 30 data points to evaluate, I set steps=3. The model should then be evaluated on all available data points, in 3 steps of 10 points each. However, the generator yields many more videos than that, looping back over videos it has already produced, which affects the final accuracy score. Here is what happens in the code.

First, my generator (simplified). It is a fairly standard generator; I print a debug string inside the main loop to see how many times it gets called:

def video_generator(batch_size=1, files=None, shuffle=True, augment=None,
                    load_func='hmdb_npy_rgb', preprocess_func=None, is_train=True):

    L = len(files)

    print('Calling video_generator. Batch size: ', batch_size,
          '. Number of files received:', L, '. Augmentation: ', augment)

    ## This loop just makes the generator infinite, keras needs that
    while True:

        ## Define starting idx for batch slicing
        batch_start = 0
        batch_end = batch_size
        c = 1

        ## Loop over the file list while there are enough unseen files for one more batch
        while batch_start < L:

            ## LOAD DATA
            limit = min(batch_end, L)

            # DEBUG STRING
            print('STEP', c, ' - yielding', limit - batch_start, 'videos.')

            X = load_func(files[batch_start:limit])
            Y = load_labels(files[batch_start:limit])

            ## PREPROCESS DATA
            if preprocess_func is not None:
                X = preprocess_func(X, is_train=is_train)

            ## AUGMENT DATA
            if augment is not None:
                X = augment_video_batch(X, augment)

            ## YIELD DATA
            yield X, Y  # a tuple of two numpy arrays with batch_size samples

            ## Increase idxs for the next batch
            batch_start += batch_size
            batch_end += batch_size
            c += 1
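The slicing logic above can be exercised in isolation with dummy loaders (`toy_generator` and the `video_*` file names are hypothetical stand-ins, not from my real code) to confirm that one pass over 30 files at batch_size=10 yields exactly 3 batches, and that a fourth draw wraps around to the start:

```python
def toy_generator(files, batch_size):
    # Same batch-slicing loop as above, with trivial loaders.
    L = len(files)
    while True:                       # infinite, like the real generator
        batch_start, batch_end = 0, batch_size
        while batch_start < L:
            limit = min(batch_end, L)
            yield files[batch_start:limit], [0] * (limit - batch_start)
            batch_start += batch_size
            batch_end += batch_size

gen = toy_generator(['video_%d' % i for i in range(30)], batch_size=10)
batches = [next(gen) for _ in range(3)]    # draw exactly 3 batches
print([len(x) for x, _ in batches])        # → [10, 10, 10]

x4, _ = next(gen)                          # a 4th draw starts a new epoch
print(x4[0])                               # → 'video_0'
```

So the generator itself covers the 30 files in exactly 3 batches per epoch; extra draws simply wrap around.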

Here is my evaluation code:

import time

for k, v in classes_to_evaluate.items():
    # k is a class of videos, e.g. jump, run, etc
    # v is a list of video names belonging to class k

    print('Evaluating', k, '. num steps:', 3)

    generator = video_generator(files=v, batch_size=batch_size, **gen_params)

    start = time.time()
    metrics = model.evaluate_generator(generator, steps=3,
              max_queue_size=10, workers=1, use_multiprocessing=False)
    end = time.time()

    print('Time elapsed on', k, ':', end - start)
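One plausible source of the extra calls (an assumption about Keras internals, not something confirmed here) is that evaluate_generator consumes batches through a background enqueuer that prefetches up to max_queue_size batches: only `steps` of them are used for the metrics, but the generator itself is advanced further. The effect can be simulated with a plain queue:

```python
import itertools
import queue
import threading

def counting_generator():
    # Each yield corresponds to one "generator step".
    for i in itertools.count():
        yield i

gen = counting_generator()
q = queue.Queue(maxsize=10)  # mirrors max_queue_size=10

def producer(n_prefetch):
    # A background enqueuer keeps the queue topped up, independent of
    # how many batches the consumer will actually use.
    for _ in range(n_prefetch):
        q.put(next(gen))

t = threading.Thread(target=producer, args=(13,))
t.start()
used = [q.get() for _ in range(3)]  # evaluation consumes only steps=3 batches
t.join()

advanced_before_probe = next(gen)   # next value = number of prior advances
print('batches used for metrics:', used)                    # → [0, 1, 2]
print('generator advances so far:', advanced_before_probe)  # → 13
```

If something like this is happening, the metrics would still be computed from the first 3 batches, but the debug prints inside the generator would fire more than 3 times, which matches the logs below.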

Based on this, I would expect to see the following output for every class:

Evaluating shoot_gun . num steps: 3
Calling video_generator. Batch size:  10 . Number of files received:  30 . Augmentation:  None
STEP 1  - yielding 10  videos.
STEP 2  - yielding 10  videos.
STEP 3  - yielding 10  videos.
Time elapsed on shoot_gun : # time elapsed here

Instead, the number of steps turns out to be essentially random from class to class. As you can see below, for the shoot_gun class the generator runs through the loop 6 times instead of 3, while for some classes I get 4 steps and for others 5 (all classes have exactly 30 videos and are evaluated with a fresh instance of the same generator). For example:

Evaluating shoot_gun . num steps: 3
Calling video_generator. Batch size:  10 . Number of files received:  30 . Augmentation:  None
STEP 1  - yielding 10  videos.
STEP 2  - yielding 10  videos.
STEP 3  - yielding 10  videos.
STEP 1  - yielding 10  videos.
STEP 2  - yielding 10  videos.
STEP 3  - yielding 10  videos.
Time elapsed on shoot_gun : 7.721895118942484

Here, as you can see, I get 6 generator steps instead of the 3 I wanted. But then:

Evaluating climb . num steps: 3
Calling video_generator. Batch size:  10 . Number of files received:  30 . Augmentation:  None
STEP 1  - yielding 10  videos.
STEP 2  - yielding 10  videos.
STEP 3  - yielding 10  videos.
STEP 1  - yielding 10  videos.
STEP 2  - yielding 10  videos.
Time elapsed on climb : 7.923485273960978

Here I get 5 steps. I do not understand it, since there is no difference in batch size, number of videos, or any other parameter between the classes. Importantly, in no case do I get exactly the 3 steps that the expected behavior dictates.
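Until the root cause is clear, one way to make the numbers consistent is to drive the generator by hand so it is advanced exactly `steps` times. This is only a sketch: `evaluate_exactly`, `_FakeModel`, and `_gen` are hypothetical names, while `test_on_batch` is Keras's standard per-batch evaluation call.

```python
import numpy as np

def evaluate_exactly(model, generator, steps):
    # Advance the generator precisely `steps` times and average the
    # per-batch metrics returned by test_on_batch.
    per_batch = [model.test_on_batch(*next(generator)) for _ in range(steps)]
    return np.mean(np.asarray(per_batch, dtype=float), axis=0)

# Tiny stand-in model and generator, just to show the call pattern:
class _FakeModel:
    def test_on_batch(self, X, Y):
        return [0.5, 1.0]   # pretend [loss, accuracy]

def _gen():
    while True:
        yield np.zeros((10, 4)), np.zeros(10)

metrics = evaluate_exactly(_FakeModel(), _gen(), steps=3)
print(metrics)   # → [0.5 1. ]
```

Note that a plain average of per-batch metrics matches evaluate_generator's weighting only when all batches have the same size, as they do here (3 batches of 10).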

Does anyone know why this happens, and how to make the behavior at least consistent?

0 Answers:

No answers yet.