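I am using a custom generator that yields videos. I was trying to run a simple evaluation of my model with evaluate_generator, but I realized that changing batch_size gives different accuracy results. I decided to print the names of the videos yielded at each generator step, and it turns out that the generator is somehow being called more times than I specify in the steps argument of evaluate_generator.
As far as I understand, the steps argument of evaluate_generator is the number of batches to draw from the generator. In my case the generator's batch_size is 10, and since I have 30 data points to evaluate, I set steps=3. I should then be evaluating my model on all available data points, in 3 steps of 10 points each. However, the generator yields many more videos than that, looping back over videos that were already analyzed and thereby affecting the final accuracy score. Here is what is going on in the code: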
First, my generator (simplified). It is a pretty standard generator; I print a debug string at each step to see how many times the main loop is entered:
def video_generator(batch_size=1, files=None, shuffle=True, augment=None,
                    load_func='hmdb_npy_rgb', preprocess_func=None, is_train=True):
    L = len(files)
    print('Calling video_generator. Batch size: ', batch_size, '. Number of files received:', L, '. Augmentation: ', augment)
    ## This line is just to make the generator infinite, keras needs that
    while True:
        ## Define starting idx for batch slicing
        batch_start = 0
        batch_end = batch_size
        c = 1
        ## Loop over the txt file while there are enough unseen files for one more batch
        while batch_start < L:
            ## LOAD DATA
            limit = min(batch_end, L)
            # DEBUG STRING
            print('STEP', c, ' - yielding', limit - batch_start, 'videos.')
            X = load_func(files[batch_start:limit])
            Y = load_labels(files[batch_start:limit])
            ## PREPROCESS DATA
            if preprocess_func is not None:
                X = preprocess_func(X, is_train=is_train)
            ## AUGMENT DATA
            if augment is not None:
                X = augment_video_batch(X, augment)
            ## YIELD DATA
            yield X, Y  # a tuple with two numpy arrays with batch_size samples
            ## Increasing idxs for next batch
            batch_start += batch_size
            batch_end += batch_size
            c += 1
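As a sanity check on the slicing logic, the same loop structure can be exercised outside Keras. The sketch below is a stripped-down stand-in, not the real generator: `index_batches` and the file names are hypothetical, and no videos are loaded. It pulls exactly ceil(30 / 10) = 3 batches by hand and checks that each file is seen once.

```python
import math


def index_batches(files, batch_size):
    """Infinite generator with the same slicing loop, minus loading and augmentation."""
    while True:
        batch_start = 0
        while batch_start < len(files):
            limit = min(batch_start + batch_size, len(files))
            yield files[batch_start:limit]
            batch_start += batch_size


# Hypothetical file names standing in for one class of 30 videos.
files = ['video_%02d' % i for i in range(30)]
batch_size = 10

# Number of batches needed to cover every file exactly once.
steps = math.ceil(len(files) / batch_size)

gen = index_batches(files, batch_size)
# Pull exactly `steps` batches and flatten them into one list of names.
seen = [name for _ in range(steps) for name in next(gen)]

print(steps, len(seen), seen == files)
```

If the generator is only advanced `steps` times, no file repeats; a fourth call to `next(gen)` wraps around to the first batch again.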
Here is my general evaluation code:
import time

for k, v in classes_to_evaluate.items():
    # k is a class of videos, e.g. jump, run, etc.
    # v is a list of video names corresponding to class k
    print('Evaluating', k, '. # steps:', 3)
    generator = video_generator(files=v, batch_size=batch_size, **gen_params)
    # Assumed timer: the simplified snippet does not show how start and end are set
    start = time.time()
    metrics = model.evaluate_generator(generator, steps=3,
                                       max_queue_size=10, workers=1, use_multiprocessing=False)
    end = time.time()
    print('Time elapsed on', k, ':', end - start)
Based on this, I should see the following output:
Evaluating shoot_gun . num steps: 3
Calling video_generator. Batch size: 10 . Number of files received: 30 . Augmentation: None
STEP 1 - yielding 10 videos.
STEP 2 - yielding 10 videos.
STEP 3 - yielding 10 videos.
Time elapsed on shoot_gun : # time elapsed here
However, this is what I actually see for the shoot_gun class, where the generator goes through the loop 6 times instead of 3:
Evaluating shoot_gun . num steps: 3
Calling video_generator. Batch size: 10 . Number of files received: 30 . Augmentation: None
STEP 1 - yielding 10 videos.
STEP 2 - yielding 10 videos.
STEP 3 - yielding 10 videos.
STEP 1 - yielding 10 videos.
STEP 2 - yielding 10 videos.
STEP 3 - yielding 10 videos.
Time elapsed on shoot_gun : 7.721895118942484
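For other classes, the number of steps seems random: for some I get 4 steps and for others 5 (every class has exactly 30 videos and is evaluated with a fresh instance of the same generator). For example: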
Evaluating climb . num steps: 3
Calling video_generator. Batch size: 10 . Number of files received: 30 . Augmentation: None
STEP 1 - yielding 10 videos.
STEP 2 - yielding 10 videos.
STEP 3 - yielding 10 videos.
STEP 1 - yielding 10 videos.
STEP 2 - yielding 10 videos.
Time elapsed on climb : 7.923485273960978
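Here I get 5 generator steps instead of the 3 I wanted, and for other classes I get 4. I don't understand this, since the batch size, the number of videos, and every other parameter are the same across classes. It is important to note that in no case do I get only the 3 steps I expect.
Does anyone know why this happens, and how to make it at least consistent?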
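One possible explanation, offered as an assumption rather than something the output above confirms: with max_queue_size=10 and workers=1, Keras reads batches from the generator in a background thread and buffers them in a queue, so the generator (and its debug print) can run ahead of the batches that are actually evaluated. The sketch below imitates that pattern with the standard library only. `prefetch` and `batches` are hypothetical stand-ins, not the Keras implementation.

```python
import itertools
import queue
import threading
import time

produced = []  # every batch number the generator was asked to produce


def batches():
    """Infinite stand-in generator that records each batch it yields."""
    for i in itertools.count(1):
        produced.append(i)
        yield i


def prefetch(gen, max_queue_size):
    """Fill a bounded queue from `gen` in a daemon thread and return the queue."""
    q = queue.Queue(maxsize=max_queue_size)

    def worker():
        for item in gen:
            q.put(item)  # blocks while the queue is full

    threading.Thread(target=worker, daemon=True).start()
    return q


q = prefetch(batches(), max_queue_size=10)

# The consumer takes only 3 batches, like steps=3.
consumed = [q.get() for _ in range(3)]

time.sleep(0.2)  # give the worker time to run ahead and refill the queue

print('consumed:', consumed)
print('produced:', len(produced))
```

In this sketch the consumer always receives the first 3 batches in order, while the number produced depends on how far the worker thread gets before it is inspected. If evaluate_generator behaves the same way, that timing dependence would be consistent with seeing 4, 5, or 6 debug lines for identical inputs, and the extra batches would be generated but never evaluated.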