pytorch - Selecting an action for DQN with PyTorch

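I am new to DQN and am trying to understand how it is coded. I am trying to use the code below for epsilon-greedy action selection, but I am not sure how it works.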

 
    if sample > eps_threshold:
        with torch.no_grad():
            # t.max(1) will return largest column value of each row.
            # second column on max result is index of where max element was
            # found, so we pick action with the larger expected reward.
            return policy_net(state).max(1)[1].view(1, 1)
    else:
        return torch.tensor([[random.randrange(n_actions)]], device=device, dtype=torch.long)

Can you tell me what the index in max(1)[1] refers to, what view(1, 1) does and what its arguments mean? Also, why is "with torch.no_grad():" used here?
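To make the three pieces concrete, here is a minimal sketch that runs each of them on a small hand-made tensor. The tensor `q_values` and the layer `layer` are hypothetical stand-ins for the output of `policy_net(state)` and for the network itself; they are not part of the code in the question.

```python
import torch

# Hypothetical Q-values for a batch of one state and three actions
# (a stand-in for what policy_net(state) would return).
q_values = torch.tensor([[0.1, 0.7, 0.2]])  # shape (1, 3)

# max(1) reduces over dimension 1 (the action dimension) and returns a
# (values, indices) pair: [0] is the largest value in each row,
# [1] is the position (action index) where that value was found.
max_result = q_values.max(1)
best_value = max_result[0]   # shape (1,)
best_action = max_result[1]  # shape (1,), dtype torch.int64

# view(1, 1) reshapes that single index into a 1x1 tensor, the same
# shape as the tensor built in the random (else) branch.
action = best_action.view(1, 1)

# no_grad() disables gradient tracking inside the block. Choosing an
# action is inference only, so no computation graph is needed.
layer = torch.nn.Linear(4, 3)  # hypothetical stand-in for policy_net
x = torch.randn(1, 4)
with torch.no_grad():
    out_no_grad = layer(x)
out_grad = layer(x)

print(action)                     # 1x1 tensor holding the chosen action index
print(out_no_grad.requires_grad)  # tracking is off inside no_grad()
print(out_grad.requires_grad)     # tracking is on outside it
```

In short, under these assumptions: the `1` inside `max(1)` is the dimension to reduce over, the `[1]` after it selects the indices rather than the values, and the two arguments of `view(1, 1)` are the target number of rows and columns. Skipping gradient tracking in `no_grad()` saves memory and time during action selection and does not affect the later training step.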


0 answers: