What does "Test of Epoch [number]" mean in Mozilla DeepSpeech?
In the example below, it says Test of Epoch 77263,
even though, as I understand it, there should be only one epoch, since I passed --display_step 1 --limit_train 1 --limit_dev 1 --limit_test 1 --early_stop False --epoch 1
as arguments:
dernoncourt@ilcomp:~/asr/DeepSpeech$ ./DeepSpeech.py --train_files data/common-voice-v1/cv-valid-train.csv,data/common-voice-v1/cv-other-train.csv --dev_files data/common-voice-v1/cv-valid-dev.csv --test_files data/common-voice-v1/cv-valid-test.csv --decoder_library_path /asr/DeepSpeech/libctc_decoder_with_kenlm.so --fulltrace True --display_step 1 --limit_train 1 --limit_dev 1 --limit_test 1 --early_stop False --epoch 1
W Parameter --validation_step needs to be >0 for early stopping to work
I Test of Epoch 77263 - WER: 1.000000, loss: 60.50202560424805, mean edit distance: 0.894737
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 58.900837, mean edit distance: 0.894737
I - src: "how do you like her"
I - res: "i "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 60.517113, mean edit distance: 0.894737
I - src: "how do you like her"
I - res: "i "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 60.668221, mean edit distance: 0.894737
I - src: "how do you like her"
I - res: "i "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 61.921925, mean edit distance: 0.894737
I - src: "how do you like her"
I - res: "i "
I --------------------------------------------------------------------------------
Answer 0 (score: 1)
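This is actually not a bug: the current epoch is computed from your current parameters and from the global step count persisted in the snapshot. Take a close look at the following excerpt: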
# Number of GPUs per worker - fixed for now by local reality or cluster setup
gpus_per_worker = len(available_devices)

# Number of batches processed per job per worker
batches_per_job = gpus_per_worker * max(1, FLAGS.iters_per_worker)

# Number of batches per global step
batches_per_step = gpus_per_worker * max(1, FLAGS.replicas_to_agg)

# Number of global steps per epoch - to be at least 1
steps_per_epoch = max(1, model_feeder.train.total_batches // batches_per_step)

# The start epoch of our training
self._epoch = step // steps_per_epoch
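So what happened is that the set size used during your earlier training differs from the set size you are using now. Hence the strange epoch number.
Simplified example (leaving batch size out to avoid confusion): if you once trained for 5 epochs on a training set of 1000 samples, you ended up with 5000 "global steps" (persisted as a number in your snapshot). After that training, you changed the command-line parameters to a set of size 1 (your --limit_* parameters). "Suddenly" you are shown epoch 5000, because 5000 global steps correspond to applying a data set of size 1 a total of 5000 times.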
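The arithmetic in the simplified example can be reproduced with a small stand-alone sketch. This is not DeepSpeech code: the function name `displayed_epoch` and its parameters are hypothetical, and it only mirrors the two lines of the excerpt that compute `steps_per_epoch` and the epoch, assuming one batch per global step.

```python
def displayed_epoch(global_step, total_batches, batches_per_step=1):
    """Hypothetical helper mirroring the excerpt:
    epoch = global_step // steps_per_epoch."""
    # Number of global steps per epoch, at least 1 (as in the excerpt)
    steps_per_epoch = max(1, total_batches // batches_per_step)
    return global_step // steps_per_epoch


# Earlier run: 5 epochs over a 1000-sample training set (batch size ignored)
global_step = 5 * 1000  # 5000 global steps persisted in the snapshot

# Same set size as before: the epoch number looks sensible
epoch_same_size = displayed_epoch(global_step, total_batches=1000)

# Set size reduced to 1 (as with --limit_train 1): same snapshot, huge epoch
epoch_limited = displayed_epoch(global_step, total_batches=1)

print(epoch_same_size, epoch_limited)  # prints: 5 5000
```

By the same reasoning, the 77263 in the question would simply be the global step count restored from an existing snapshot, divided by a `steps_per_epoch` of 1.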
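Takeaway: use the --checkpoint_dir
argument to avoid this kind of problem, so that a run does not resume from a snapshot taken with a different set size.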