I am training a Siamese network in PyTorch. Here is the network definition:
import torch
import torch.nn as nn
from torch.autograd import Variable


class LSTMEncoder(nn.Module):
    """ Implements the network type integrated within the Siamese RNN architecture. """
    def __init__(self, vocab_size, opt, is_train=False):
        super(LSTMEncoder, self).__init__()
        self.vocab_size = vocab_size
        self.opt = opt
        self.name = 'sim_encoder'
        # Layers
        self.embedding_table = nn.Embedding(num_embeddings=self.vocab_size, embedding_dim=self.opt.embedding_dims,
                                            padding_idx=0, max_norm=None, scale_grad_by_freq=False, sparse=False)
        self.lstm_rnn = nn.LSTM(input_size=self.opt.embedding_dims, hidden_size=self.opt.hidden_dims, num_layers=1)

    def initialize_hidden_plus_cell(self, batch_size):
        """ Re-initializes the hidden state and cell state of the network. """
        # Note: despite the names, both states are drawn from a standard normal
        # distribution rather than zeros.
        zero_hidden = Variable(torch.randn(1, batch_size, self.opt.hidden_dims))
        zero_cell = Variable(torch.randn(1, batch_size, self.opt.hidden_dims))
        return zero_hidden, zero_cell

    def forward(self, batch_size, input_data, hidden, cell):
        """ Performs a forward pass through the network. """
        # Embed one time-step of token ids and reshape to (seq_len=1, batch, embedding_dims).
        output = self.embedding_table(input_data).view(1, batch_size, -1)
        # The same single-layer LSTM module is applied num_layers times in sequence.
        for _ in range(self.opt.num_layers):
            output, (hidden, cell) = self.lstm_rnn(output, (hidden, cell))
        return output, hidden, cell
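For reference, the encoder is driven roughly like this during training (a minimal sketch; the opt fields, vocabulary size, and batch below are simplified placeholders for my actual configuration, not the real values):

from types import SimpleNamespace
import torch

# Hypothetical stand-in for the real option object; only the fields the encoder reads are set.
opt = SimpleNamespace(embedding_dims=300, hidden_dims=128, num_layers=1)
encoder = LSTMEncoder(vocab_size=10000, opt=opt)

batch_size = 16
hidden, cell = encoder.initialize_hidden_plus_cell(batch_size)
token_ids = torch.randint(1, 10000, (batch_size,))  # one time-step of token ids per sequence
output, hidden, cell = encoder(batch_size, token_ids, hidden, cell)
print(output.shape)  # torch.Size([1, 16, 128])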
After some number of training iterations, the embedding layer suddenly loses its vector values, as shown below:
Epoch: 0 | Training Batch: 169 | Average loss since batch 168: 7.6317
Epoch: 0 | Training Batch: 170 | Average loss since batch 169: 10.4514
Epoch: 0 | Training Batch: 171 | Average loss since batch 170: nan
Epoch: 0 | Training Batch: 172 | Average loss since batch 171: nan
Epoch: 0 | Training Batch: 173 | Average loss since batch 172: nan
I traced through the code and found that the nn.Embedding layer returns NaN, which in turn makes the loss NaN. I can't find what mistake I made in the code. I thought the embedding layer should be a constant matrix after initialization, right? I never copy any values into it after initializing it.
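This is roughly the check I used to confirm where the NaN first appears (a sketch; the surrounding training loop is omitted, and contains_nan is just a helper I added for debugging):

import torch

def contains_nan(tensor):
    """ Returns True if any element of the tensor is NaN. """
    return torch.isnan(tensor).any().item()

# Inserted after each optimizer step in the training loop:
if contains_nan(encoder.embedding_table.weight):
    print('NaN detected in embedding weights')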
Can anyone help me figure out what is going wrong?