Creating a specific tensor from another tensor

Time: 2020-03-28 23:48:39

Tags: python pytorch q-learning

q_pred = self.Q.forward(states) gives the following output:

tensor([[-4.4713e-02,  4.2878e-03],
        [-2.2801e-01,  2.2295e-01],
        [-9.8098e-03, -1.0766e-01],
        [-1.4654e-01,  1.2742e-01],
        [-1.6224e-01,  1.6565e-01],
        [-3.6515e-02,  3.1022e-02],
        [-4.5094e-02,  1.4848e-02],
        [-1.4157e-01,  1.3974e-01],
        [-3.0593e-03, -4.2342e-02],
        [-4.1689e-02,  2.9376e-02],
        [-9.3629e-02,  1.0297e-01],
        [-5.2163e-04, -7.4799e-02],
        [-2.8944e-02, -1.2417e-01]], grad_fn=<AddmmBackward>)

actions gives me the following output:

tensor([1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1], dtype=torch.int32)

From these two outputs, how can I get the following result:

tensor([ 4.2878e-03, -2.2801e-01, -1.0766e-01,  1.2742e-01, -1.6224e-01,  3.1022e-02,  1.4848e-02,
        -1.4157e-01, -4.2342e-02, -4.1689e-02, -9.3629e-02, -7.4799e-02, -1.2417e-01], grad_fn=<AddmmBackward>)
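Put another way, the desired result takes, for every row i of q_pred, the column named by actions[i]. A deliberately slow loop sketch of that definition, using only the first four rows and actions from above for brevity:

```python
import torch

# First four rows of q_pred and the first four actions, copied from the question
q_pred = torch.tensor([[-4.4713e-02,  4.2878e-03],
                       [-2.2801e-01,  2.2295e-01],
                       [-9.8098e-03, -1.0766e-01],
                       [-1.4654e-01,  1.2742e-01]])
actions = torch.tensor([1, 0, 1, 1], dtype=torch.int32)

# Reference definition: result[i] = q_pred[i, actions[i]]
result = torch.stack([q_pred[i, int(a)] for i, a in enumerate(actions)])
print(result)
```

The loop is only meant to pin down the semantics; a vectorized expression should be used in real training code.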

Update

self.Q is implemented as follows:

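import torch as T
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class LinearDeepQNetwork(nn.Module):
    def __init__(self, lr, n_actions, input_dims):
        super(LinearDeepQNetwork, self).__init__()

        # Two-layer MLP: input_dims -> 128 hidden units -> one Q-value per action
        self.fc1 = nn.Linear(*input_dims, 128)
        self.fc2 = nn.Linear(128, n_actions)

        self.optimizer = optim.Adam(self.parameters(), lr=lr)
        self.loss = nn.MSELoss()
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')
        self.to(self.device)

    def forward(self, state):
        layer1 = F.relu(self.fc1(state))
        # Shape (batch_size, n_actions): row i holds the Q-values of state i
        actions = self.fc2(layer1)

        return actions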
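As a sketch of what that forward pass returns, here is an equivalent stack of layers, assuming input_dims=(4,) and n_actions=2 (both hypothetical; the question does not state them) and a batch of 13 random states. The output has one row per state and one column per action, which is why row i, column actions[i] is the Q-value of the action actually taken. Calling the module directly, net(states), is the usual idiom rather than net.forward(states), because it also runs any registered hooks.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 4 state features, 2 actions, batch of 13 states
input_dims, n_actions, batch_size = 4, 2, 13

# Same layer layout as LinearDeepQNetwork: fc1 -> ReLU -> fc2
net = nn.Sequential(
    nn.Linear(input_dims, 128),
    nn.ReLU(),
    nn.Linear(128, n_actions),
)

states = torch.randn(batch_size, input_dims)
q_pred = net(states)  # invokes forward() plus any registered hooks
print(q_pred.shape)  # torch.Size([13, 2])
```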

1 answer:

Answer 0 (score: 0)

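You can use advanced indexing, pairing each row index from torch.arange with the matching entry of actions. The actions tensor is int32, so it is first converted to a LongTensor, the dtype PyTorch expects for index tensors: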

q_pred[torch.arange(q_pred.size(0)), actions.type(torch.LongTensor)]
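An equivalent alternative (not part of the original answer) is torch.gather along dimension 1, which needs the indices shaped (N, 1). A sketch using the full data from the question, checking that both approaches agree and that the result still carries a gradient back to q_pred:

```python
import torch

# q_pred and actions copied from the question; requires_grad mimics network output
q_pred = torch.tensor([[-4.4713e-02,  4.2878e-03],
                       [-2.2801e-01,  2.2295e-01],
                       [-9.8098e-03, -1.0766e-01],
                       [-1.4654e-01,  1.2742e-01],
                       [-1.6224e-01,  1.6565e-01],
                       [-3.6515e-02,  3.1022e-02],
                       [-4.5094e-02,  1.4848e-02],
                       [-1.4157e-01,  1.3974e-01],
                       [-3.0593e-03, -4.2342e-02],
                       [-4.1689e-02,  2.9376e-02],
                       [-9.3629e-02,  1.0297e-01],
                       [-5.2163e-04, -7.4799e-02],
                       [-2.8944e-02, -1.2417e-01]], requires_grad=True)
actions = torch.tensor([1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1], dtype=torch.int32)

idx = actions.long()  # gather and indexing both use int64 indices here

# Advanced indexing: pair row i with column idx[i]
by_index = q_pred[torch.arange(q_pred.size(0)), idx]

# gather along dim 1: index shape (N, 1), then drop the extra dimension
by_gather = q_pred.gather(1, idx.unsqueeze(1)).squeeze(1)

print(torch.equal(by_index, by_gather))  # True
```

Both forms select the same 13 values; gather is sometimes preferred because it avoids building the separate arange tensor.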