I wrote a CNN module in PyTorch for digit recognition and then tried to train the network on the GPU, but I got the following error.
Traceback (most recent call last):
  File "main.py", line 51, in <module>
    outputs = cnn(inputs)
  File "/home/daniel/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/daniel/Code/kaggle-competitions/digit-recognizer/Net.py", line 40, in forward
    x = self.pool(F.relu(self.conv[i](x)))
  File "/home/daniel/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/daniel/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 282, in forward
    self.padding, self.dilation, self.groups)
  File "/home/daniel/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/nn/functional.py", line 90, in conv2d
    return f(input, weight, bias)
RuntimeError: Input type (CUDAFloatTensor) and weight type (CPUFloatTensor) should be the same
Here is my source code.
It seems that cnn.cuda() is not working properly, because I get the same error after removing it, but I don't know how to fix it.
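From the message it looks like the input tensor lives on the GPU while the conv weights are still on the CPU. A minimal snippet that triggers the same kind of RuntimeError (hypothetical, not my actual Net.py):

import torch
import torch.nn as nn

conv = nn.Conv2d(1, 32, 5)                      # weights stay on the CPU
if torch.cuda.is_available():
    inputs = torch.randn(1, 1, 28, 28).cuda()   # input is moved to the GPU
    out = conv(inputs)                          # RuntimeError: input/weight type mismatch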
Answer 0 (score: 1)
I solved it myself. The cause was that I assigned the submodules in a non-standard way, so they were not registered in my module's list of children. module.parameters() does not return the parameters of these unregistered submodules, and module.cuda() only moves the registered parameters to the GPU.
By default, submodules are registered automatically if you assign them like this:
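To illustrate the difference, here is a minimal sketch (the class names are made up for illustration, not from my Net.py):

import torch.nn as nn

class Registered(nn.Module):
    def __init__(self):
        super(Registered, self).__init__()
        self.conv = nn.Conv2d(1, 8, 3)    # stored as an attribute -> registered

class Unregistered(nn.Module):
    def __init__(self):
        super(Unregistered, self).__init__()
        self.conv = [nn.Conv2d(1, 8, 3)]  # hidden inside a plain list -> not registered

print(len(list(Registered().parameters())))    # 2 (weight and bias)
print(len(list(Unregistered().parameters())))  # 0 -> cuda() has nothing to move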
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)
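With that standard assignment, printing the model and calling cuda() behave as expected (a short sketch, assuming a CUDA device is available):

import torch

model = Model()
print(model)  # conv1 and conv2 show up as registered children
if torch.cuda.is_available():
    model.cuda()  # their weights are moved to the GPU as well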
However, I assigned my submodules by appending them to a plain Python list:
class Cnn(nn.Module):
    def __init__(self, channels, kernel_sizes, dense_layers, n_classes, img_size):
        super(Cnn, self).__init__()
        ...
        self.conv = []
        self.conv.append(nn.Conv2d(1, channels[0], kernel_sizes[0]))
        self.conv_img_size = math.floor((self.conv_img_size - (kernel_sizes[0]-1))/2)
        for i in range(1, self.conv_layer_size):
            self.conv.append(nn.Conv2d(channels[i-1], channels[i], kernel_sizes[i]))
            self.conv_img_size = math.floor((self.conv_img_size - (kernel_sizes[i]-1))/2)
I needed to call module.add_module manually to register these submodules:
class Cnn(nn.Module):
    def __init__(self, channels, kernel_sizes, dense_layers, n_classes, img_size):
        super(Cnn, self).__init__()
        ...
        self.conv = []
        self.conv.append(nn.Conv2d(1, channels[0], kernel_sizes[0]))
        self.conv_img_size = math.floor((self.conv_img_size - (kernel_sizes[0]-1))/2)
        self.add_module('Conv0', self.conv[0])  # Add modules manually
        for i in range(1, self.conv_layer_size):
            self.conv.append(nn.Conv2d(channels[i-1], channels[i], kernel_sizes[i]))
            self.conv_img_size = math.floor((self.conv_img_size - (kernel_sizes[i]-1))/2)
            self.add_module('Conv' + str(i), self.conv[i])  # Add modules manually
You can check which modules are registered by printing the module instance. Before adding the module.add_module calls:
>>> print(cnn)
Cnn(
  (pool): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (output_layer): Linear(in_features=1024, out_features=10, bias=True)
)
After:
>>> print(cnn)
Cnn(
  (Conv0): Conv2d(1, 32, kernel_size=(5, 5), stride=(1, 1))
  (Conv1): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (Dense0): Linear(in_features=1024, out_features=1024, bias=True)
  (output_layer): Linear(in_features=1024, out_features=10, bias=True)
)
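Note that add_module registers the very same objects that are stored in self.conv, so the existing forward pass (which indexes self.conv[i]) keeps working, and cnn.cuda() now moves the conv weights to the GPU together with the rest of the model.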
Answer 1 (score: 1)
Daniel's answer to his own question seems to be correct. The problem is indeed that modules are not recognized if they are appended to a plain list. However, PyTorch also provides built-in solutions for this: nn.ModuleList and nn.ModuleDict are two container types that keep track of the added modules and their parameters. Both offer the same functionality as their plain Python equivalents, but the dict version uses named keys and can be used, for example, to keep track of task-specific layers.
A working example is:
self.conv = torch.nn.ModuleList()
self.conv.append(nn.Conv2d(1, channels[0], kernel_sizes[0]))
self.conv_img_size = math.floor((self.conv_img_size - (kernel_sizes[0]-1))/2)
for i in range(1, self.conv_layer_size):
    self.conv.append(nn.Conv2d(channels[i-1], channels[i], kernel_sizes[i]))
    self.conv_img_size = math.floor((self.conv_img_size - (kernel_sizes[i]-1))/2)
# Modules are automatically added to the model parameters
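Since nn.ModuleDict is mentioned above but not shown, here is a minimal sketch of that variant (the TaskHeads class and its layer names are illustrative, not from the original code):

import torch
import torch.nn as nn

class TaskHeads(nn.Module):
    def __init__(self, n_features, task_classes):
        super(TaskHeads, self).__init__()
        # One output layer per task, tracked by name in an nn.ModuleDict
        self.heads = nn.ModuleDict(
            {name: nn.Linear(n_features, n) for name, n in task_classes.items()}
        )

    def forward(self, x, task):
        return self.heads[task](x)

model = TaskHeads(128, {'digits': 10, 'letters': 26})
if torch.cuda.is_available():
    model.cuda()  # ModuleDict entries are registered, so their weights move too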