I am working on a deep learning project for a visual speech recognition task and have noticed a strange phenomenon. In the first few epochs the loss decreases at a normal rate, i.e. within a given epoch it keeps dropping as the iterations progress. In later epochs, however, the loss barely changes over the whole epoch, yet it drops again at the start of the next epoch.
Sometimes I interrupt the running code in the middle of an epoch and restart training from the saved weights; the loss then also drops at the beginning of the next epoch.
Here is the training code:
for epoch in range(283, args.epochs):
    model.train()
    running_loss, running_corrects, running_all, cer = 0., 0., 0., 0.
    for batch_idx, sample_batched in enumerate(dset_loaders['train']):
        optimizer.zero_grad()
        inputs, targets, lengths, y_lengths, idx = sample_batched
        inputs = inputs.float()
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        loss = criterion(F.log_softmax(outputs, dim=-1), targets, lengths, y_lengths)
        loss.backward()
        optimizer.step()

        # Greedy decoding and character error rate for this batch
        decoded = decoder.decode_greedy(outputs, lengths)
        cursor, gt = 0, []
        for b in range(inputs.size(0)):
            y_str = ''.join([vocabularies[ch] for ch in targets[cursor: cursor + y_lengths[b]]])
            gt.append(y_str)
            cursor += y_lengths[b]
        CER = decoder.cer_batch(decoded, gt)
        cer += CER
        cer_mean = cer / (batch_idx + 1)

        # Accumulate the loss over the whole epoch
        running_loss += loss.data * inputs.size(0)
        running_all += len(inputs)

        if batch_idx == 0:
            since = time.time()
        elif (batch_idx + 1) % args.interval == 0 or (batch_idx == len(dset_loaders['train']) - 1):
            print('Process: [{:5.0f}/{:5.0f} ({:.0f}%)]\tLoss: {:.4f}\tcer:{:.4f}\tCost time:{:5.0f}s\tEstimated time:{:5.0f}s\t'.format(
                running_all,
                len(dset_loaders['train'].dataset),
                100. * batch_idx / (len(dset_loaders['train']) - 1),
                running_loss / running_all,
                cer_mean,
                time.time() - since,
                (time.time() - since) * (len(dset_loaders['train']) - 1) / batch_idx - (time.time() - since)))
    print('{} Epoch:\t{:2}\tLoss: {:.4f}\tcer:{:.4f}\t'.format(
        'pretrain',
        epoch,
        running_loss / len(dset_loaders['train'].dataset),
        cer_mean) + '\n')
    torch.save(model.state_dict(), save_path + '/' + args.mode + '_' + str(epoch + 1) + '.pt')
I am very confused by this phenomenon. I would expect that if the loss does not change throughout an entire epoch, it should not change at the start of the next epoch either. Why does the loss still drop at the beginning of the next epoch after staying flat for a whole epoch? Could someone help me with this? Thanks!
Answer 0 (score: 0)
I think this may be related to how you log the loss.
Your running_loss holds the total loss over all data points processed so far in this epoch, while running_all keeps track of how many data points have been processed so far in this epoch. You print running_loss / running_all, which is the average loss per data point over the epoch so far.
As more data points accumulate, each new batch's loss is averaged together with a large number of earlier losses, so even if the loss is decreasing steadily, the decrease appears slower and slower. This is illustrated here: https://gist.github.com/valkjsaaa/b0b26075174a87b3fd302b4b52ab035a
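Below is a minimal, self-contained sketch of this effect (with made-up per-batch losses and a hypothetical fixed batch size, not numbers from your run). It shows how a running average logged this way lags behind the per-batch loss and only jumps down when the counters are reset at the start of the next epoch:

# Toy illustration of the running-average effect described above.
losses = [2.0 - 0.01 * i for i in range(100)]    # hypothetical per-batch losses, steadily decreasing
running_loss, running_all = 0.0, 0
for batch_idx, loss in enumerate(losses):
    batch_size = 32                              # assumed fixed batch size
    running_loss += loss * batch_size            # same accumulation as in the training loop
    running_all += batch_size
    if (batch_idx + 1) % 25 == 0:
        print('batch {:3d}  per-batch loss {:.3f}  running average {:.3f}'.format(
            batch_idx + 1, loss, running_loss / running_all))
# Late in the epoch the running average barely moves even though the per-batch
# loss keeps falling; at the next epoch running_loss and running_all are reset
# to zero, so the printed value suddenly drops to the (now lower) current loss,
# which looks like a sudden improvement at the epoch boundary.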
I suggest replacing running_loss / running_all with loss.data / len(inputs), which is the loss of the current batch, and seeing whether that helps.
The changed code would look like this:
for epoch in range(283, args.epochs):
    model.train()
    running_loss, running_corrects, running_all, cer = 0., 0., 0., 0.
    for batch_idx, sample_batched in enumerate(dset_loaders['train']):
        optimizer.zero_grad()
        inputs, targets, lengths, y_lengths, idx = sample_batched
        inputs = inputs.float()
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        loss = criterion(F.log_softmax(outputs, dim=-1), targets, lengths, y_lengths)
        loss.backward()
        optimizer.step()

        # Greedy decoding and character error rate for this batch
        decoded = decoder.decode_greedy(outputs, lengths)
        cursor, gt = 0, []
        for b in range(inputs.size(0)):
            y_str = ''.join([vocabularies[ch] for ch in targets[cursor: cursor + y_lengths[b]]])
            gt.append(y_str)
            cursor += y_lengths[b]
        CER = decoder.cer_batch(decoded, gt)
        cer += CER
        cer_mean = cer / (batch_idx + 1)

        running_loss += loss.data * inputs.size(0)
        running_all += len(inputs)

        if batch_idx == 0:
            since = time.time()
        elif (batch_idx + 1) % args.interval == 0 or (batch_idx == len(dset_loaders['train']) - 1):
            print('Process: [{:5.0f}/{:5.0f} ({:.0f}%)]\tLoss: {:.4f}\tcer:{:.4f}\tCost time:{:5.0f}s\tEstimated time:{:5.0f}s\t'.format(
                running_all,
                len(dset_loaders['train'].dataset),
                100. * batch_idx / (len(dset_loaders['train']) - 1),
                loss.data,  # changed: report the current batch's loss instead of the running average
                cer_mean,
                time.time() - since,
                (time.time() - since) * (len(dset_loaders['train']) - 1) / batch_idx - (time.time() - since)))
    print('{} Epoch:\t{:2}\tLoss: {:.4f}\tcer:{:.4f}\t'.format(
        'pretrain',
        epoch,
        running_loss / len(dset_loaders['train'].dataset),
        cer_mean) + '\n')
    torch.save(model.state_dict(), save_path + '/' + args.mode + '_' + str(epoch + 1) + '.pt')