I built a model in PyTorch. The code runs fine on CPU, but after initializing the network and moving it to CUDA, it seems the GPU runs out of memory.
The code that fails is:
model = init_from_scratch(args, train_exs, dev_exs)
model.init_optimizer()
nParams = sum(np.prod(list(p.size())) for p in model.network.parameters())
print('* total number of parameters:', nParams)
device = torch.device("cuda:"+str(args.gpu) if args.cuda else "cpu")
model.set_device(device)  # call last!
# DATA ITERATORS (not run to here)
train_dataset = data.ReaderDataset()
train_sampler = torch.utils.data.sampler.RandomSampler()
train_loader = torch.utils.data.DataLoader()
...
# TRAIN (not run to here)
for epoch in range(start_epoch, args.num_epochs):
....
def set_device(self, device):
    print("device", device)
    self.use_cuda = str(device) != "cpu"
    self.network = self.network.to(device)
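For reference, here is a minimal standalone sketch I can use to test whether the device transfer alone exhausts memory (the `nn.Linear` size is just chosen so the parameter count is close to my network's ~30.7M; it is not my actual model):

```python
import torch
import torch.nn as nn

# A stand-in model with roughly the same parameter count as mine (~30.7M):
# 5540 * 5540 weights + 5540 biases ≈ 30.7M parameters.
model = nn.Linear(5540, 5540)
nParams = sum(p.numel() for p in model.parameters())
print('* total number of parameters:', nParams)

# Move to GPU if available, mirroring the set_device step that fails.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
print(next(model.parameters()).device)
```

If this isolated transfer also fails, the problem is with the CUDA context or the GPU itself rather than with my model code.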
Total number of parameters: 30708906
The torch version is '0.4.0'.
The GPU is a TITAN Xp with 12 GB of memory, and nvidia-smi shows it as free.
So, what is wrong with my code?
The error output is:
Traceback (most recent call last):
File "script/train2.0.py", line 683, in <module>
main(args)
File "script/train2.0.py", line 542, in main
model.set_device(device)
File "./model.py", line 588, in set_device
self.network = self.network.to(device)
File "/home/username/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 393, in to
return self._apply(lambda t: t.to(device))
File "/home/username/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 176, in _apply
module._apply(fn)
File "/home/username/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 182, in _apply
param.data = fn(param.data)
File "/home/username/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 393, in <lambda>
return self._apply(lambda t: t.to(device))
File "/home/username/.local/lib/python3.5/site-packages/torch/cuda/__init__.py", line 161, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/THCTensorRandom.cu:25