Here is the error I get while training my RNN:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
I tried the solution given here.
I don't understand why this happens. Calling .backward(retain_graph=True)
makes the error go away, but the loss does not decrease no matter how many epochs I run.
It is probably because I don't fully understand the "detached part" of the computation graph.
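If I've understood the linked answer correctly, "detaching" means cutting the hidden state out of the graph between batches, i.e. adding something like this inside the training loop shown further down (I'm not sure this is right, which is part of what I'm asking):

hidden, output = model(x_batch, hidden)
hidden = hidden.detach()  # supposedly stops backward() from walking into the previous batch's graph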
RNN configuration:
batch_size = 1
input_size = 1
sequence_length = 10
hidden_size = 1
num_layer = 10
The RNN class:
import torch
from torch import nn
from torch.autograd import Variable

class ModelRnn(nn.Module):
    def __init__(self):
        super(ModelRnn, self).__init__()
        self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size,
                          batch_first=True, num_layers=num_layer)

    def forward(self, x, hidden):
        x = x.view(batch_size, sequence_length, input_size)
        out, hidden = self.rnn(x, hidden)
        return hidden, out

    def init_hidden(self):
        hidden_state = Variable(torch.zeros(num_layer, batch_size, hidden_size))
        return hidden_state
The training loop:
hidden = model.init_hidden()

for epoch in range(5):
    for x_batch, y_batch in train_loader:
        model.zero_grad()
        hidden, output = model(x_batch, hidden)
        optimizer.zero_grad()
        loss = criterion(output, y_batch)
        print(f"{epoch+1} epoch | loss = {loss}")
        loss.backward()
        optimizer.step()
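For completeness, the retain_graph variant I tried only changes the backward call; the RuntimeError goes away, but the loss stays at the same value every epoch:

loss.backward(retain_graph=True)  # no RuntimeError, but the loss never decreases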
I get the following output and error:
1 epoch | loss = 96040000.0
2 epoch | loss = 96040000.0
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-90-8a1b1c3e9bec> in <module>()
9
10 print(f"{epoch+1} epoch | loss = {loss}")
---> 11 loss.backward()
12 optimizer.step()
~/anaconda3/envs/arena-py3.6/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
100 products. Defaults to ``False``.
101 """
--> 102 torch.autograd.backward(self, gradient, retain_graph, create_graph)
103
104 def register_hook(self, hook):
~/anaconda3/envs/arena-py3.6/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
88 Variable._execution_engine.run_backward(
89 tensors, grad_tensors, retain_graph, create_graph,
---> 90 allow_unreachable=True) # allow_unreachable flag
91
92
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
The model should train without any hiccups, but I'm missing something here that I can't figure out.