PyTorch "backward" RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed

Date: 2018-07-15 14:18:38

Tags: neural-network pytorch backpropagation reinforcement-learning loss

I am implementing DDPG with PyTorch (0.4) and I am stuck on the loss. So, first of all, here is my code performing the update:

def update_nets(self, transitions):
    """
    Performs one update step
    :param transitions: list of sampled transitions
    """
    # get batches
    batch = transition(*zip(*transitions))
    states = torch.stack(batch.state)
    actions = torch.stack(batch.action)
    next_states = torch.stack(batch.next_state)
    rewards = torch.stack(batch.reward)

    # zero gradients
    self._critic.zero_grad()

    # compute critic's loss
    y = rewards.view(-1, 1) + self._gamma * \
        self.critic_target(next_states, self.actor_target(next_states))

    loss_critic = F.mse_loss(y, self._critic(states, actions),
                             size_average=True)

    # backpropagte it
    loss_critic.backward()
    self._optim_critic.step()

    # zero gradients
    self._actor.zero_grad()

    # compute actor's loss
    loss_actor = ((-1.) * self._critic(states, self._actor(states))).mean()

    # backpropagate it
    loss_actor.backward()
    self._optim_actor.step()

    # do soft updates
    self.perform_soft_update(self.actor_target, self._actor)
    self.perform_soft_update(self.critic_target, self._critic)

where self._actor, self._critic, self.actor_target and self.critic_target are nets.

If I run this, I get the following error in the second iteration:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

line 221, in update_nets
    loss_critic.backward()
line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag

I don't know what is causing it.

What I know so far is that the loss_critic.backward() call causes the error. I have already debugged loss_critic, and it has valid values. If I replace the loss computation with a simple

loss_critic = torch.tensor(1., device=self._device, dtype=torch.float, requires_grad=True)

tensor containing the value 1, everything works fine. I have also checked that I am not storing some result that could cause the error. Furthermore, updating the actor with loss_actor does not cause any problems.
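For reference, the error message itself is easy to trigger with a tiny standalone snippet (a minimal sketch, unrelated to my DDPG code) that simply backpropagates through the same graph twice:

import torch
import torch.nn as nn

net = nn.Linear(4, 1)
x = torch.ones(1, 4)

out = net(x).sum()
out.backward()   # the first pass frees the graph's intermediate buffers
out.backward()   # RuntimeError: Trying to backward through the graph a second time ...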

Does anyone know what is going wrong here?

Thanks!

Update

I replaced

# zero gradients
self._critic.zero_grad()

# zero gradients
self._actor.zero_grad()

with

# zero gradients
self._critic.zero_grad()
self._actor.zero_grad()
self.critic_target.zero_grad()
self.actor_target.zero_grad()

(i.e. zeroing the gradients of all of them), but it still fails with the same error. In addition, here is the code performing the soft update at the end of an iteration:

def perform_soft_update(self, target, trained):
    """
    Preforms the soft update
    :param target: Net to be updated
    :param trained: Trained net - used for update
    """
    for param_target, param_trained in \
            zip(target.parameters(), trained.parameters()):
        param_target.data.copy_(
            param_target.data * (
                    1.0 - self._tau) + param_trained * self._tau
        )
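(For reference, this implements the standard Polyak averaging update target ← (1 − τ)·target + τ·trained.)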

2 Answers:

Answer 0 (score: 2):

I found the solution. I was saving tensors in my replay_buffer for training purposes and, in every iteration, used them via

# get batches
batch = transition(*zip(*transitions))
states = torch.stack(batch.state)
actions = torch.stack(batch.action)
next_states = torch.stack(batch.next_state)
rewards = torch.stack(batch.reward)

(code truncated). This "saving" of tensors was the cause of the problem: the stored tensors still referenced the computation graphs of earlier iterations. So I changed my code to save only the plain data (tensor.data.numpy().tolist()) and to put it back into tensors only when needed.

In more detail: in DDPG I evaluate the policy in every iteration and perform one learning step on a batch. I now save the evaluations in the replay buffer like this:

action = self.action(state)
...
self.replay_buffer.push(state.data.numpy().tolist(), action.data.numpy().tolist(), ...)

and use them like this:

batch = transition(*zip(*transitions))
states = self.to_tensor(batch.state)
actions = self.to_tensor(batch.action)
next_states = self.to_tensor(batch.next_state)
rewards = self.to_tensor(batch.reward)
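For completeness, to_tensor is not shown above; a minimal sketch of what such a helper could look like (assuming self._device is the same device attribute used earlier) is:

def to_tensor(self, data):
    # Hypothetical helper: rebuild a fresh tensor from the plain Python
    # lists stored in the replay buffer, so it carries no autograd
    # history from previous iterations.
    return torch.tensor(data, device=self._device, dtype=torch.float)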

Answer 1 (score: 0):

Didn't you call zero_grad() on self.actor_target and self.critic_target? Or is it called in self.perform_soft_update()?