While testing the computation time of an image network model, I found that if tensors such as the loss or the top1/top5 accuracies are not converted with tensor.item() or tensor.cpu().data.numpy(), the measured computation time of optimizer.step() changes, as shown below.
total_top1 = torch.tensor(0.).cuda() # The first case
total_top5 = torch.tensor(0.).cuda() # The first case
total_loss = torch.tensor(0.).cuda() # The first case
# total_top1 = 0.0 # The second case
# total_top5 = 0.0 # The second case
# total_loss = 0.0 # The second case
outputs, loss = model(images, path, labels, criterion, gpu_nums)
st = time.time()
optimizer.step()
optimizer.zero_grad()
total_to += time.time() - st # step() time
top1, top5 = accuracy(outputs, labels, topk=(1, 5))
st = time.time()
total_counter += images.size(0)
# total_top1 += top1.cpu().data.numpy() # or top1.item() # The second case
# total_top5 += top5.cpu().data.numpy() # The second case
# total_loss += loss.cpu().data.numpy() # The second case
total_top1 += top1 # The first case
total_top5 += top5 # The first case
total_loss += loss # The first case
total_ti += time.time() - st # data transform time
In the first case, total_ti = 18.0s and total_to = 0.2s for 100 iterations. But in the second case, total_ti = 8s and total_to = 9s.
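(For context, this pattern of numbers is what asynchronous execution produces: a CUDA kernel launch returns immediately, and the wall-clock cost only appears at the first operation that forces a synchronization, such as item() or a host copy. Below is a CPU-only sketch of the same measurement pitfall, using a thread pool as a stand-in for the CUDA stream; all names here are illustrative, not part of my training code.)

```python
import time
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)  # stand-in for the CUDA stream

def slow_kernel():
    time.sleep(0.2)  # pretend this is GPU work
    return 1.0

st = time.time()
fut = pool.submit(slow_kernel)   # "launch": returns immediately, like a CUDA op
launch_time = time.time() - st   # tiny, even though nothing has finished yet

st = time.time()
value = fut.result()             # "item()": blocks until the work is really done
sync_time = time.time() - st     # this is where the ~0.2s shows up

pool.shutdown()
```

The launch looks nearly free while result() absorbs roughly the full 0.2s of work, just as total_to looks tiny until a call like loss.item() moves the wait into a different timer.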
I tried to figure out the cause of this difference. The most plausible answer I could come up with is that item() prevents the newly computed variables from entering the computation graph, so the time spent in step() is reduced. However, when I ran the following code:
outputs, loss = model(images, path, labels, criterion, gpu_nums)
st = time.time()
optimizer.step()
optimizer.zero_grad()
total_to += time.time() - st # step() time
st = time.time()
total_counter += images.size(0)
# loss.item()
total_ti += time.time() - st # data transform time
total_ti was still about 18s, and total_to = 0s. When loss.item() was used, total_ti became 8s and total_to became 9s.
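One way to see where the time really goes (a sketch, assuming the skew comes from CUDA's asynchronous kernel launches) is to synchronize on both sides of the timed region, so that time.time() only brackets work that has actually finished. Here is a minimal torch-free wrapper illustrating the idea; on a GPU you would pass sync=torch.cuda.synchronize:

```python
import time

def timed(fn, sync=None):
    """Return (fn's result, elapsed seconds). If given, sync() drains pending
    asynchronous work before and after fn, so elapsed reflects fn's real cost."""
    if sync is not None:
        sync()            # make sure earlier queued work isn't billed to fn
    st = time.time()
    out = fn()
    if sync is not None:
        sync()            # wait until fn's own work has really completed
    return out, time.time() - st

# CPU-only demo (no GPU needed): time a plain Python computation.
result, elapsed = timed(lambda: sum(range(1000)))
```

Timed this way, total_to would report the true cost of optimizer.step() rather than just the kernel-launch overhead; note that synchronizing every iteration adds overhead of its own, so it is best reserved for profiling runs.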
So could you tell me what the reason for this is in PyTorch, and how I can get total_to to 0s and total_ti to 8s at the same time? Thank you very much for your answers.