Question

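I am using PyTorch with a pretrained model from the transformers library. However, during fine-tuning it quickly runs out of GPU memory, and I don't know why.

I found that memory appears to leak in the forward pass.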

show_memory('before call')
outputs = model(inputs, masked_lm_labels=labels) if args.mlm else model(inputs, labels=labels)
show_memory('after call')

del outputs
gc.collect()
model.zero_grad()
torch.cuda.empty_cache()
show_memory('after clearing')

Here the show_memory function is defined as follows:

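import torch

def show_memory(text=''):
  t = torch.cuda.get_device_properties(0).total_memory  # total device memory, in bytes
  c = torch.cuda.memory_cached(0)  # bytes held by the caching allocator (named memory_reserved in newer PyTorch)
  a = torch.cuda.memory_allocated(0)  # bytes occupied by live tensors
  f = c-a  # free inside cache
  print(f'\n\n{text}\nTotal: {t}\nCached: {c} \nAllocated: {a} \nFree in cache: {f}\n\n')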

The output is:

before call 
Total: 17071734784  
Cached: 3441426432  
Allocated: 3324039680  
Free in cache: 117386752


after call 
Total: 17071734784 
Cached: 6830424064  
Allocated: 6720267264  
Free in cache: 110156800


after clearing 
Total: 17071734784 
Cached: 6109003776  
Allocated: 4876882944  
Free in cache: 1232120832
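To put a number on the growth, here is a minimal sketch that subtracts the "before call" snapshot from the "after clearing" snapshot. The byte counts are copied from the output above. The dictionary names and the `delta_gib` helper are made up for illustration.

```python
# Byte counts copied from the printed snapshots above.
before = {"cached": 3441426432, "allocated": 3324039680}
after_clearing = {"cached": 6109003776, "allocated": 4876882944}

GIB = 1024 ** 3  # bytes per GiB


def delta_gib(key):
    """Growth of one counter between the two snapshots, in GiB."""
    return (after_clearing[key] - before[key]) / GIB


for key in ("allocated", "cached"):
    print(f"{key}: +{delta_gib(key):.2f} GiB")
```

Comparing the allocated counter rather than the cached one is deliberate: the cache can stay large even when nothing is leaking, while allocated bytes reflect tensors that are still alive.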

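So even though I clear everything, allocated memory is still about 1.5 GB higher than before the call (4876882944 vs. 3324039680 bytes). I don't know why this happens, or whether there is a way to inspect it in detail.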
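One possible way to look closer, offered as a sketch rather than a definitive diagnosis, is to walk the objects tracked by Python's garbage collector and total up the tensors that are still alive on the device. The helper name `live_tensor_report` is hypothetical and is not part of PyTorch. It only sees tensors reachable from Python, so buffers held solely inside the autograd graph will not show up, and views that share storage are counted more than once.

```python
import gc
from collections import Counter

import torch


def live_tensor_report(device_type="cuda"):
    """Sum the bytes of live tensors on `device_type`, grouped by (shape, dtype).

    Hypothetical helper for illustration. Returns a Counter mapping
    (shape tuple, dtype string) -> total bytes across matching tensors.
    """
    totals = Counter()
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.device.type == device_type:
                key = (tuple(obj.shape), str(obj.dtype))
                totals[key] += obj.element_size() * obj.nelement()
        except Exception:
            # Some tracked objects raise on inspection; skip them.
            continue
    return totals


def print_largest(totals, limit=10):
    """Print the groups that hold the most bytes, largest first."""
    for (shape, dtype), nbytes in totals.most_common(limit):
        print(f"{nbytes:>14d} bytes  {dtype}  {shape}")
```

Calling `print_largest(live_tensor_report())` once before the forward pass and once after clearing, then comparing the two listings, should show which shapes account for the extra allocated bytes. For an allocator-level breakdown, `torch.cuda.memory_summary()` can be printed at the same points.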

pytorch transformer memory-leaks inference
