Question

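I am trying to implement the policy-gradient loss from SeqGAN. If this is done in a function, I understand the arguments would be the input, the target, and the rewards. These are the dimensions of the arguments: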

input: (batch_size x seq_len)
target: (batch_size x seq_len)
rewards: (1 x batch_size)

    def batchPGLoss(self, inp: torch.Tensor, target: torch.Tensor, reward: List) -> torch.Tensor:
        batch_size, seq_len = inp.size()
        # swap the dimensions
        inp = inp.permute(1, 0)  # (seq_len x batch_size)
        # swap the dimensions
        target = target.permute(1, 0)  # (seq_len x batch_size)
        # init hidden state
        h = self.init_hidden(batch_size)

        loss = torch.zeros(1)
        for i in range(seq_len):
            # pass the first tokens for each elem in batch
            out, h = self.forward(inp[i], h)
            for j in range(batch_size):
                loss += -out[j][target.data[i][j]] * reward[j]  # log(P(y_t | Y_1, ..., Y_{t-1})) * Q
        return loss / batch_size

Running it produces the following error:

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

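I checked which part raises the error and found that -out[j][target.data[i][j]] is the problem, because it requires grad. I could use tensor.detach().numpy(), but that does not solve the problem, because I still need to keep out in the computation graph.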

Any suggestions?
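
One direction that might be worth trying, as a rough sketch rather than a confirmed fix: if the error comes from reward holding NumPy values (an assumption, the post does not say what the list contains), converting the rewards to a torch tensor keeps every operation inside autograd and leaves out in the graph. The function batch_pg_loss below is hypothetical. It assumes the per-step outputs of self.forward are log-probabilities that have been stacked into one (seq_len x batch_size x vocab_size) tensor, and it replaces the double loop with gather.

```python
import numpy as np
import torch


def batch_pg_loss(log_probs, target, reward):
    """Hypothetical vectorized form of the double loop in batchPGLoss.

    log_probs: (seq_len, batch_size, vocab_size) log-probabilities (requires grad)
    target:    (seq_len, batch_size) token ids, dtype long
    reward:    batch_size rewards as a list or NumPy array (assumed constants)
    """
    # Assumption: rewards carry no gradient, so they become a plain float tensor.
    reward_t = torch.as_tensor(np.asarray(reward, dtype=np.float32)).reshape(-1)
    # Select log P(y_t | y_1..y_{t-1}) for every time step and batch element.
    picked = log_probs.gather(2, target.unsqueeze(2)).squeeze(2)  # (seq_len, batch_size)
    # Weight each sequence by its reward, sum over steps and batch, average over batch.
    return -(picked * reward_t.unsqueeze(0)).sum() / target.size(1)


# Toy data standing in for the generator outputs (shapes are illustrative only).
torch.manual_seed(0)
seq_len, batch_size, vocab_size = 4, 3, 5
logits = torch.randn(seq_len, batch_size, vocab_size, requires_grad=True)
log_probs = torch.log_softmax(logits, dim=2)
target = torch.randint(0, vocab_size, (seq_len, batch_size))
reward = np.array([[0.2, 0.5, 0.9]])  # (1 x batch_size), as in the question

loss = batch_pg_loss(log_probs, target, reward)
loss.backward()  # gradients reach logits; nothing was detached from the graph
```

Inside the original loop, the same idea would amount to building the reward tensor once before the loop and indexing it with reward_t[j] instead of reward[j].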

Calculating the loss for policy gradient
