I am training a Siamese network in PyTorch. Here is the network definition:
import torch
import torch.nn as nn
from torch.autograd import Variable


class LSTMEncoder(nn.Module):
    """ Implements the network type integrated within the Siamese RNN architecture. """
    def __init__(self, vocab_size, opt, is_train=False):
        super(LSTMEncoder, self).__init__()
        self.vocab_size = vocab_size
        self.opt = opt
        self.name = 'sim_encoder'
        # Layers
        self.embedding_table = nn.Embedding(num_embeddings=self.vocab_size, embedding_dim=self.opt.embedding_dims,
                                            padding_idx=0, max_norm=None, scale_grad_by_freq=False, sparse=False)
        self.lstm_rnn = nn.LSTM(input_size=self.opt.embedding_dims, hidden_size=self.opt.hidden_dims, num_layers=1)

    def initialize_hidden_plus_cell(self, batch_size):
        """ Re-initializes the hidden state and cell state of the network. """
        # Note: despite the names, both states are drawn from a standard normal
        # distribution rather than zeros.
        zero_hidden = Variable(torch.randn(1, batch_size, self.opt.hidden_dims))
        zero_cell = Variable(torch.randn(1, batch_size, self.opt.hidden_dims))
        return zero_hidden, zero_cell

    def forward(self, batch_size, input_data, hidden, cell):
        """ Performs a forward pass through the network. """
        # Embed one time-step of token ids and reshape to (seq_len=1, batch, embedding_dims).
        output = self.embedding_table(input_data).view(1, batch_size, -1)
        # The same single-layer LSTM module is applied num_layers times in sequence.
        for _ in range(self.opt.num_layers):
            output, (hidden, cell) = self.lstm_rnn(output, (hidden, cell))
        return output, hidden, cell
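For reference, the encoder is driven roughly like this during training (a minimal sketch; the opt fields, vocabulary size, and batch below are simplified placeholders for my actual configuration, not the real values):

from types import SimpleNamespace
import torch

# Hypothetical stand-in for the real option object; only the fields the encoder reads are set.
opt = SimpleNamespace(embedding_dims=300, hidden_dims=128, num_layers=1)
encoder = LSTMEncoder(vocab_size=10000, opt=opt)

batch_size = 16
hidden, cell = encoder.initialize_hidden_plus_cell(batch_size)
token_ids = torch.randint(1, 10000, (batch_size,))  # one time-step of token ids per sequence
output, hidden, cell = encoder(batch_size, token_ids, hidden, cell)
print(output.shape)  # torch.Size([1, 16, 128])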
After some number of training iterations, the embedding layer suddenly loses its vector values, as shown below:
Epoch: 0 | Training Batch: 169 | Average loss since batch 168: 7.6317
Epoch: 0 | Training Batch: 170 | Average loss since batch 169: 10.4514
Epoch: 0 | Training Batch: 171 | Average loss since batch 170: nan
Epoch: 0 | Training Batch: 172 | Average loss since batch 171: nan
Epoch: 0 | Training Batch: 173 | Average loss since batch 172: nan
I traced through the code and found that the nn.Embedding layer returns NaN, which in turn makes the loss NaN. I can't find what mistake I made in the code. I thought the embedding layer should be a constant matrix after initialization, right? I never copy any values into it after initializing it.
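This is roughly the check I used to confirm where the NaN first appears (a sketch; the surrounding training loop is omitted, and contains_nan is just a helper I added for debugging):

import torch

def contains_nan(tensor):
    """ Returns True if any element of the tensor is NaN. """
    return torch.isnan(tensor).any().item()

# Inserted after each optimizer step in the training loop:
if contains_nan(encoder.embedding_table.weight):
    print('NaN detected in embedding weights')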
Can anyone help me figure out what is going wrong?