RuntimeError during the backward pass when training a time-series model with an RNN in PyTorch

Time: 2019-05-08 14:12:55

Tags: deep-learning lstm pytorch recurrent-neural-network backpropagation

Here is the error I run into while training the RNN:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

I tried the solution given here.

I don't understand why this happens. Calling .backward(retain_graph=True) makes the error go away, but the loss does not decrease no matter how many epochs I train for.

It is probably because I don't fully understand what it means to "detach" part of the computation graph.
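
As far as I understand it, detaching would mean cutting the recurrent hidden state out of the graph built for the previous batch, so that backward() stops there instead of walking into buffers that have already been freed. Roughly something like this, which is my reading of the linked answer, not verified:

# cut the hidden state's history so the next backward()
# does not traverse the previous batch's (already freed) graph
hidden = hidden.detach()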

RNN configuration:

batch_size = 1
input_size = 1
sequence_length = 10
hidden_size = 1
num_layer = 10
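
With these settings, every forward pass should see an input of shape (1, 10, 1) and a hidden state of shape (10, 1, 1), i.e. (num_layers, batch, hidden_size). A quick standalone shape check on random data, just to illustrate:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=1, hidden_size=1, batch_first=True, num_layers=10)
x = torch.randn(1, 10, 1)   # (batch, seq_len, input_size)
h0 = torch.zeros(10, 1, 1)  # (num_layers, batch, hidden_size)
out, hn = rnn(x, h0)
print(out.shape)            # torch.Size([1, 10, 1])
print(hn.shape)             # torch.Size([10, 1, 1])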

RNN class:

import torch
import torch.nn as nn

class ModelRnn(nn.Module):
    def __init__(self):
        super(ModelRnn, self).__init__()

        self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, batch_first=True, num_layers=num_layer)

    def forward(self, x, hidden):
        # reshape to (batch, seq_len, input_size), as expected with batch_first=True
        x = x.view(batch_size, sequence_length, input_size)

        out, hidden = self.rnn(x, hidden)

        return hidden, out

    def init_hidden(self):
        # initial hidden state of shape (num_layers, batch, hidden_size);
        # a plain tensor is enough, Variable is deprecated since PyTorch 0.4
        hidden_state = torch.zeros(num_layer, batch_size, hidden_size)

        return hidden_state
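
Model, loss, and optimizer are set up along these lines (the exact criterion and learning rate shouldn't matter for the error; MSELoss and SGD here are just stand-ins):

model = ModelRnn()
criterion = nn.MSELoss()                                  # regression loss on the RNN output
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)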

Training loop:

hidden = model.init_hidden()    

for epoch in range(5):
    for x_batch, y_batch in train_loader:
        model.zero_grad()
        hidden, output = model(x_batch, hidden)
        optimizer.zero_grad()
        loss = criterion(output, y_batch)

    print(f"{epoch+1} epoch | loss = {loss}")
    loss.backward()
    optimizer.step()
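
From the linked answer I gather that the loop is supposed to detach the hidden state every batch and call backward()/step() inside the inner loop rather than once per epoch. A sketch of what I believe the corrected version looks like, which I'd like to understand rather than just copy:

hidden = model.init_hidden()

for epoch in range(5):
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        hidden = hidden.detach()              # drop the previous batch's graph
        hidden, output = model(x_batch, hidden)
        loss = criterion(output, y_batch)
        loss.backward()                       # backward through this batch's graph only
        optimizer.step()

    print(f"{epoch+1} epoch | loss = {loss}")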

Running the loop as written above, I get the following output and error:

1 epoch | loss = 96040000.0
2 epoch | loss = 96040000.0

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-90-8a1b1c3e9bec> in <module>()
      9 
     10     print(f"{epoch+1} epoch | loss = {loss}")
---> 11     loss.backward()
     12     optimizer.step()

~/anaconda3/envs/arena-py3.6/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    100                 products. Defaults to ``False``.
    101         """
--> 102         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    103 
    104     def register_hook(self, hook):

~/anaconda3/envs/arena-py3.6/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     88     Variable._execution_engine.run_backward(
     89         tensors, grad_tensors, retain_graph, create_graph,
---> 90         allow_unreachable=True)  # allow_unreachable flag
     91 
     92 

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

The model should train without a hitch, but I am missing something here that I can't figure out.

0 Answers