I'm trying to implement the loss from SeqGAN's policy gradient. If we do this in a function, I know the arguments will be `input`, `target`, and `rewards`. Here is the function, with the argument dimensions noted in the comments:
```python
from typing import List

import torch


def batchPGLoss(self, inp: torch.Tensor, target: torch.Tensor, reward: List) -> torch.Tensor:
    batch_size, seq_len = inp.size()
    # swap the dimensions
    inp = inp.permute(1, 0)        # (seq_len x batch_size)
    # swap the dimensions
    target = target.permute(1, 0)  # (seq_len x batch_size)
    # init hidden state
    h = self.init_hidden(batch_size)
    loss = torch.zeros(1)
    for i in range(seq_len):
        # pass the i-th token of each sequence in the batch
        out, h = self.forward(inp[i], h)
        for j in range(batch_size):
            # log(P(y_t | Y_1, ..., Y_{t-1})) * Q
            loss += -out[j][target.data[i][j]] * reward[j]
    return loss / batch_size
```
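(As an aside, the double Python loop can be folded into tensor ops. Below is a minimal sketch of the same accumulation, assuming the per-step outputs are log-probabilities and have been stacked into a single tensor; the function name and the stacked layout are my own illustration, not part of the original code.)

```python
import torch


def batch_pg_loss_vectorized(log_probs: torch.Tensor,
                             target: torch.Tensor,
                             reward: torch.Tensor) -> torch.Tensor:
    # log_probs: (seq_len, batch_size, vocab_size), per-step log-probabilities
    # target:    (seq_len, batch_size), token indices
    # reward:    (batch_size,), one reward per sequence
    seq_len, batch_size, _ = log_probs.size()
    # pick log P(y_t | Y_1, ..., Y_{t-1}) for the target token at every step
    picked = log_probs.gather(2, target.unsqueeze(2)).squeeze(2)  # (seq_len, batch_size)
    # weight each sequence's log-likelihood by its reward, negate, average
    return -(picked * reward.unsqueeze(0)).sum() / batch_size
```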
When I run it, I get the following error:

```
RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
```
I checked which part raises the error and found that `-out[j][target.data[i][j]]` is the culprit, since it requires grad. I could call `tensor.detach().numpy()`, but that doesn't solve the problem, because I still need `out` to stay in the computation graph. Any suggestions?
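(For reference, one possible cause of this error is that the rewards arrive as NumPy values, so the multiplication forces an implicit `tensor.numpy()` call on a tensor that is still in the graph. A minimal sketch of that idea, under the assumption that the rewards themselves need no gradient: convert them to a plain tensor up front, so the product stays a pure torch operation and `out` remains in the graph. The names `reward_np`, the batch size 4, and the vocabulary size 10 are illustrative assumptions, not from the original code.)

```python
import numpy as np
import torch

out = torch.randn(4, 10, requires_grad=True).log_softmax(dim=1)  # fake step output, in the graph
reward_np = np.random.rand(4)                                    # rewards as NumPy values
reward_t = torch.as_tensor(reward_np, dtype=out.dtype)           # plain tensor, no grad

target = torch.randint(0, 10, (4,))
loss = -(out[torch.arange(4), target] * reward_t).sum()
loss.backward()  # gradients flow through `out`; the rewards stay constant
```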