Question

As part of training a GRU, I want to retrieve the hidden state tensors.

I have defined a GRU with two layers:

self.lstm = nn.GRU(params.vid_embedding_dim, params.hidden_dim , 2)

The forward function is defined as follows (this is only part of the implementation):

    def forward(self, s, order, batch_size, where, anchor_is_phrase=False):
        """
        Forward prop.
        """
        # s is of shape [128, 1, 300], 128 is batch size
        output, (a, b) = self.lstm(s.cuda())
        output.data.contiguous()

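out has shape [128, 400] (128 is the number of samples, each of which is embedded as a 400-dimensional vector).

I understand that out is the output of the last hidden state, so I expect it to be equal to b. However, after checking the values I found that they are indeed equal, but b contains the tensors in a different order; for example, output[0] is b[49]. Am I missing something here?

Thanks.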

Answer 1

I understand your confusion. Take a look at the example below and its comments:

import numpy as np
import torch
from torch import nn

# [Batch size, Sequence length, Embedding size]
inputs = torch.rand(128, 5, 300)
gru = nn.GRU(input_size=300, hidden_size=400, num_layers=2, batch_first=True)

with torch.no_grad():
    # output holds the last layer's hidden state at every time step, for each element in the batch
    # a is the last hidden state of the first layer
    # b is the last hidden state of the second (last) layer
    output, (a, b) = gru(inputs)

If we print the shapes, they confirm our understanding:

print(output.shape) # torch.Size([128, 5, 400])
print(a.shape) # torch.Size([128, 400])
print(b.shape) # torch.Size([128, 400])
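
To make the unpacking above less mysterious, here is a minimal sketch (same sizes as above; the name h_n is my own choice, borrowed from the PyTorch docs) that keeps the second return value as a single tensor. It shows that the hidden state is stacked along the layer dimension first, even with batch_first=True, and that writing (a, b) simply iterates over that first dimension:

```python
import torch
from torch import nn

torch.manual_seed(0)

# [Batch size, Sequence length, Embedding size]
inputs = torch.rand(128, 5, 300)
gru = nn.GRU(input_size=300, hidden_size=400, num_layers=2, batch_first=True)

with torch.no_grad():
    # h_n is one tensor of shape [num_layers, batch, hidden_size];
    # batch_first only affects the layout of inputs and output, not h_n
    output, h_n = gru(inputs)

print(h_n.shape)  # torch.Size([2, 128, 400])

# Unpacking iterates over dim 0, i.e. over the layers
a, b = h_n
print(torch.equal(b, h_n[-1]))  # True: b is the last layer's final hidden state
```

So indexing h_n[-1] gives the last layer regardless of how many layers the GRU has.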

Furthermore, we can test that the last hidden state of the last layer, taken from output for each element in the batch, is equal to b:

np.testing.assert_almost_equal(b.numpy(), output[:,-1,:].numpy())

Finally, we can create a 3-layer RNN and run the same test:

gru = nn.GRU(input_size=300, hidden_size=400, num_layers=3, batch_first=True)
with torch.no_grad():
    output, (a, b, c) = gru(inputs)

np.testing.assert_almost_equal(c.numpy(), output[:,-1,:].numpy())

Again, the assertion passes, but only when we run it against c, which is now the last layer of the RNN. Otherwise:

np.testing.assert_almost_equal(b.numpy(), output[:,-1,:].numpy())

raises an error:

AssertionError: Arrays are not almost equal to 7 decimals
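
The same comparison can be written without hard-coding the number of layers. This is only a sketch with smaller, made-up sizes (8, 5, 16 and 32 are arbitrary) so that it runs quickly; it checks every layer's final hidden state against the last time step of output:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Small illustrative sizes: [batch, sequence length, embedding size]
inputs = torch.rand(8, 5, 16)
gru = nn.GRU(input_size=16, hidden_size=32, num_layers=3, batch_first=True)

with torch.no_grad():
    output, h_n = gru(inputs)  # h_n: [num_layers, batch, hidden_size]

# Last time step of the top layer, for every element in the batch
last_step = output[:, -1, :]

# Compare each layer's final hidden state with that slice
matches = [torch.allclose(layer, last_step) for layer in h_n]
print(matches)  # only the final entry (the last layer) is expected to be True
```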

I hope this makes things clear for you.
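
One more thing that may be worth checking in the question's code, although I can't say for sure that it explains the ordering you saw: nn.GRU defaults to batch_first=False, so an input of shape [128, 1, 300] is read as 128 time steps with a batch of 1, not as a batch of 128. The sketch below assumes hidden_dim is 400 (inferred from the shapes in the question) and runs on the CPU; it only compares the shapes returned under the two layouts:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Shape taken from the question: the asker intends 128 to be the batch size
s = torch.rand(128, 1, 300)

# hidden_size=400 is an assumption based on the shapes quoted in the question
gru_default = nn.GRU(input_size=300, hidden_size=400, num_layers=2)
gru_batch_first = nn.GRU(input_size=300, hidden_size=400, num_layers=2, batch_first=True)

with torch.no_grad():
    out_default, h_default = gru_default(s)  # s read as [seq_len=128, batch=1, 300]
    out_bf, h_bf = gru_batch_first(s)        # s read as [batch=128, seq_len=1, 300]

print(out_default.shape, h_default.shape)  # torch.Size([128, 1, 400]) torch.Size([2, 1, 400])
print(out_bf.shape, h_bf.shape)            # torch.Size([128, 1, 400]) torch.Size([2, 128, 400])
```

The output shapes are identical in both cases, which makes the mix-up easy to overlook; the hidden state shape is what tells you which dimension was treated as the batch.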
