I am trying to train on multiple GPUs with the following code (latest PyTorch version):
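import torch
from torchvision import models

# Load a pretrained VGG16 and replace the last classifier layer with a 10-class head
model = models.vgg16(pretrained=True)
model.classifier._modules['6'] = torch.nn.Linear(4096, 10)
# Wrap the model in DataParallel across three GPUs
self.model = torch.nn.DataParallel(model, device_ids=[0,1,2]).cuda()
# Move the model to cuda:0
self.model = model.to(f'cuda:0')
...
def forward(self, input_data):
    output = self.model.forward(input_data)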
Calling self.model.forward(input_data) raises this error:
  File "/home/poahmadvand/py3env/lib/python3.7/site-packages/torchvision/models/vgg.py", line 43, in forward
    x = self.features(x)
  File "/home/poahmadvand/py3env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/poahmadvand/py3env/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/poahmadvand/py3env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/poahmadvand/py3env/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/poahmadvand/py3env/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 0 does not equal 1 (while checking arguments for cudnn_convolution)
How can I fix this error? Thanks.
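A device-mismatch error like the one above can often be narrowed down by listing which devices the model's parameters and the input tensor actually live on. The following is only a minimal diagnostic sketch, not a fix: `devices_in_use` is a hypothetical helper name, and the small `Linear` layer is a stand-in for the VGG16 model so that the snippet also runs on a machine without GPUs.

```python
import torch


def devices_in_use(module, *tensors):
    """Collect the device names of a module's parameters and buffers,
    plus those of any extra tensors (for example, the input batch)."""
    found = {str(p.device) for p in module.parameters()}
    found |= {str(b.device) for b in module.buffers()}
    found |= {str(t.device) for t in tensors}
    return found


# Hypothetical stand-in for the real model; falls back to CPU when no GPU is present.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
layer = torch.nn.Linear(4096, 10).to(device)
batch = torch.randn(2, 4096, device=device)

# A single entry means the parameters and the input share one device.
print(devices_in_use(layer, batch))
```

Running the same check on whatever `self.model` holds just before the failing call may show where the mismatch comes from. Two things are worth keeping in mind when reading the result: `torch.nn.DataParallel` expects the wrapped module's parameters and buffers to be on `device_ids[0]` before `forward` is called, and in the code as posted the second assignment rebinds `self.model` to the unwrapped `model`, because `Module.to` returns the module itself rather than the `DataParallel` wrapper.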