Question

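I'm a beginner and this is my first time creating a model in PyTorch. I'm trying to build a convolutional autoencoder, and I get an error when I run the model. The code I'm using is: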

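import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms.functional as TF
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms

class MyDataset(Dataset):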
    def __init__(self, image_paths, target_paths, train=True):
        self.image_paths = image_paths
        self.target_paths = target_paths

    def transform(self, image, target):
        # Transform to tensor
        resize = transforms.Resize(size=(2350,1650))
        image = resize(image)
        target = resize(target)
        grayscale = transforms.Grayscale(1)
        image = grayscale(image)
        target = grayscale(target)
        image = TF.to_tensor(image)
        target = TF.to_tensor(target)
        return image, target

    def __getitem__(self, index):
        image = Image.open(self.image_paths[index])
        target = Image.open(self.target_paths[index])
        x, y = self.transform(image, target)
        return x, y

    def __len__(self):
        return len(self.image_paths)

traindata = MyDataset(image_paths=train_data, target_paths=target_data, train=True)
testdata = MyDataset(image_paths=test_data, target_paths=None, train=False)

train_loader = DataLoader(traindata, batch_size=1, shuffle=True, num_workers=4)
test_loader = DataLoader(testdata, batch_size=1, shuffle=False, num_workers=4)

class ConvolutionalAutoEncoder(nn.Module):
    def __init__(self):
        super(ConvolutionalAutoEncoder, self).__init__()
        self.encoder_block1 = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=1, padding=1),
            nn.ReLU(True),
            nn.Conv2d(64, 64, 3, stride=1, padding=1),
            nn.ReLU(True)
        )
        self.decoder_block1 = nn.Sequential(   
            nn.ConvTranspose2d(64, 64, 3, stride=1, padding=1),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 64, 3, stride=1, padding=1),
            nn.ReLU(True)
        )
        self.decoder_block0 = nn.Sequential(  
            nn.ConvTranspose2d(64, 1, 3, stride=1, padding=1),
            nn.Sigmoid()
        )
    def forward(self, x):
        x1 = self.encoder_block1(x)
        y1 = self.decoder_block1(x1)
        y0 = self.decoder_block0(y1)
        return x

device = torch.device("cuda:2" if torch.cuda.is_available() else "cpu")
print(device)

model = ConvolutionalAutoEncoder().to(device)
# Loss and optimizer
learning_rate = 0.001
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

params = list(model.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

num_epochs = 30
total_step = len(train_loader)
for epoch in range(num_epochs):
    running_loss = 0.0  # reset the accumulated loss at the start of each epoch
    for batch_idx, data in enumerate(train_loader):
        inp, targ = data
        inp = inp.to(device)
        targ = targ.to(device)

        output = model(inp)
        loss = criterion(output, targ)

        model.zero_grad()
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()

        if (batch_idx+1) % 10 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch+1, num_epochs, batch_idx+1, total_step, loss.item()))

The full error I get is:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-17-28fa0c94d845> in <module>
     13 
     14         model.zero_grad()
---> 15         loss.backward()
     16         optimizer.step()
     17 

~/anaconda3/envs/gautam_new/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
     91                 products. Defaults to ``False``.
     92         """
---> 93         torch.autograd.backward(self, gradient, retain_graph, create_graph)
     94 
     95     def register_hook(self, hook):

~/anaconda3/envs/gautam_new/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     88     Variable._execution_engine.run_backward(
     89         tensors, grad_tensors, retain_graph, create_graph,
---> 90         allow_unreachable=True)  # allow_unreachable flag
     91 
     92 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Please help. Also, if possible, I'd appreciate suggestions on how to make my model deeper. I keep getting CUDA out-of-memory errors.

Thanks.

Answer 1

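I can't test your model, but judging by the error message, the cause of your problem is the return value of your forward.

Currently you return x, which is your actual input rather than the output: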

def forward(self, x):
    x1 = self.encoder_block1(x)
    y1 = self.decoder_block1(x1)
    y0 = self.decoder_block0(y1)
    return x

So, to return the output, you probably want to change the return value from x to y0:

def forward(self, x):
    x1 = self.encoder_block1(x)
    y1 = self.decoder_block1(x1)
    y0 = self.decoder_block0(y1)
    return y0
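
To make the cause concrete, here is a minimal sketch (not your model; TinyNet and its return_input flag are hypothetical names made up for this illustration). It assumes, as in your training loop, that the input batch is a plain tensor that does not require grad. Returning the input means the loss never touches any parameter, so it has no grad_fn and backward() fails; returning the layer output gives a loss that is connected to the parameters.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical one-layer model used only to reproduce the error."""

    def __init__(self, return_input):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 3, padding=1)
        self.return_input = return_input

    def forward(self, x):
        y = self.conv(x)
        # Returning x mimics the bug: the conv output is computed but discarded.
        return x if self.return_input else y


inp = torch.rand(1, 1, 8, 8)   # plain data tensor, requires_grad is False
targ = torch.rand(1, 1, 8, 8)
criterion = nn.MSELoss()

bad_loss = criterion(TinyNet(return_input=True)(inp), targ)
good_loss = criterion(TinyNet(return_input=False)(inp), targ)

# The buggy loss is not attached to any autograd graph.
print("bad :", bad_loss.requires_grad, bad_loss.grad_fn)
print("good:", good_loss.requires_grad, good_loss.grad_fn is not None)

good_loss.backward()  # works: gradients flow back to conv.weight and conv.bias

try:
    bad_loss.backward()
    backward_error = None
except RuntimeError as err:
    backward_error = str(err)
print(backward_error)
```

Checking loss.requires_grad or loss.grad_fn right after computing the loss is a quick way to spot this kind of mistake before calling backward().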

Regarding memory:

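Please don't put too many questions into one question. Suppose you have three completely different problems in one question, and there are three people who could each solve one of them: you might end up with no answer at all,
because none of them can give you a complete answer that solves all of the problems.
If you split it into three questions instead, you would probably get three answers that together solve everything. In many cases this also improves the question, because each problem can be described more specifically without writing a whole novel.
Of course, if your problems are closely related you can put them in one question, but this ...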

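I guess your forward function still has some side effect that causes the memory problem (quite strange, and I'm not sure about this). So, with luck, the fix also solves your memory problem; if not, you should definitely open a new question for it.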
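
As a rough aid for that follow-up question, the sketch below estimates how much memory a single 64-channel feature map takes at the 2350x1650 resolution from your Resize transform. It assumes float32 activations (4 bytes per element) and batch_size=1, and it counts only the four 64-channel layer outputs in your model; the function name is made up for this example. It ignores weights, gradients, and any workspace the convolution backend allocates, so treat the result as a ballpark figure rather than a measurement.

```python
def feature_map_bytes(height, width, channels, batch_size=1, bytes_per_element=4):
    """Size in bytes of one activation tensor of shape (batch, channels, H, W)."""
    return batch_size * channels * height * width * bytes_per_element


# Image size taken from transforms.Resize(size=(2350, 1650)) in the question.
H, W = 2350, 1650

# One 64-channel feature map, float32, batch size 1.
per_map = feature_map_bytes(H, W, channels=64)

# Assumption: the two encoder convs and the first two decoder convs each
# produce one 64-channel map that autograd keeps around for backward().
n_maps = 4
total = per_map * n_maps

print(f"one 64-channel map: {per_map / 2**30:.2f} GiB")
print(f"{n_maps} such maps:        {total / 2**30:.2f} GiB")
```

Under these assumptions every additional 64-channel layer at full resolution costs on the order of 1 GB, so a deeper model will probably only fit if the feature maps get smaller first, for example with a smaller Resize target or with strided or pooling layers in the encoder.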
