PyTorch runs out of GPU memory right after initialization

Time: 2018-10-31 17:14:46

Tags: python out-of-memory gpu cpu pytorch

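I built a model in PyTorch. The code runs fine on the CPU. However, as soon as the network has been initialized and moved to CUDA, the GPU appears to run out of memory.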

The failing code is:

model = init_from_scratch(args, train_exs, dev_exs)
model.init_optimizer()

nParams= sum([np.prod(list(p.size())) for p in model.network.parameters()])
print('* total number of parameters:',nParams)

device = torch.device("cuda:"+str(args.gpu) if args.cuda else "cpu")
model.set_device(device) #call last!!

# DATA ITERATORS (execution never reaches this point)
train_dataset = data.ReaderDataset()
train_sampler = torch.utils.data.sampler.RandomSampler()
train_loader = torch.utils.data.DataLoader()
...


# TRAIN (execution never reaches this point)
for epoch in range(start_epoch, args.num_epochs):
....

# In model.py, a method of the model class:
def set_device(self, device):
    print("device",device)
    self.use_cuda = False if str(device) == "cpu" else True
    self.network = self.network.to(device)

  • Total number of parameters: 30708906

  • The torch version is '0.4.0'

  • The GPU is a TITAN Xp with 12G of memory, and nvidia-smi shows the memory as free
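
For scale, the reported parameter count can be turned into a rough estimate of the weight storage. The sketch below assumes 4-byte float32 parameters and reads "12G" as 12 GiB; neither is stated in the question. It also ignores gradients, optimizer state, activations and the CUDA context itself, so it is only a lower bound on what training would need.

```python
# Rough size of the model weights alone.
N_PARAMS = 30708906      # parameter count reported in the question
BYTES_PER_PARAM = 4      # assumption: float32 parameters
GPU_MEMORY_GIB = 12      # assumption: "12G" means 12 GiB


def weights_size_mib(n_params, bytes_per_param=BYTES_PER_PARAM):
    """Return the storage needed for the parameters, in MiB."""
    return n_params * bytes_per_param / 2**20


size_mib = weights_size_mib(N_PARAMS)
fraction = size_mib / (GPU_MEMORY_GIB * 1024)

print("weights: %.1f MiB (%.2f%% of %d GiB)"
      % (size_mib, 100 * fraction, GPU_MEMORY_GIB))
```

Under these assumptions the weights come to roughly 117 MiB, under 1% of the card's memory, so the model size alone does not seem to explain the out-of-memory error.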

So, what is wrong with my code?

The error output is:

Traceback (most recent call last):
File "script/train2.0.py", line 683, in <module>
  main(args)
File "script/train2.0.py", line 542, in main
  model.set_device(device)
File "./model.py", line 588, in set_device
  self.network = self.network.to(device)
File "/home/username/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 393, in to
  return self._apply(lambda t: t.to(device))
File "/home/username/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 176, in _apply
  module._apply(fn)
File "/home/username/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 182, in _apply
  param.data = fn(param.data)
File "/home/username/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 393, in <lambda>
  return self._apply(lambda t: t.to(device))
File "/home/username/.local/lib/python3.5/site-packages/torch/cuda/__init__.py", line 161, in _lazy_init
  torch._C._cuda_init()

RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/THCTensorRandom.cu:25
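
The traceback ends in `torch._C._cuda_init()`, called from `_lazy_init`, which suggests the failure happens while CUDA is being initialized on the selected device rather than while the weights are being copied. One thing that may be worth checking is whether the index passed in `args.gpu` really refers to the GPU that nvidia-smi reports as free. The sketch below is one possible way to list per-GPU memory from Python using only the standard library; the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for illustration, and it assumes `nvidia-smi` is on the PATH and reports numeric values.

```python
import subprocess

# nvidia-smi query that prints one "index, used MiB, total MiB" line per GPU.
QUERY = ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines into {index: (used_mib, total_mib)}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = [int(field) for field in line.split(",")]
        usage[index] = (used, total)
    return usage


def query_gpu_memory():
    """Run nvidia-smi and parse its output; return {} if it cannot be run."""
    try:
        out = subprocess.check_output(QUERY).decode()
    except (OSError, subprocess.CalledProcessError):
        return {}
    return parse_gpu_memory(out)


if __name__ == "__main__":
    for index, (used, total) in sorted(query_gpu_memory().items()):
        print("GPU %d: %d / %d MiB used" % (index, used, total))
```

Note that the device numbering used by CUDA does not always match the order nvidia-smi prints, so the index shown here may not be the same device as `cuda:<index>` in PyTorch unless the ordering has been made consistent.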

0 answers:

No answers yet