I have a neural network for classifying reviews.
The pipeline is as follows: for each review, every word is replaced by its index into an embedding matrix. All reviews are padded so they have the same length as the longest review. For each word, its embedding is fed into the GRU together with the hidden state from the previous step. After the whole sentence has passed through the GRU, the final output is fed into a feed-forward network that returns the probabilities over the 2 classes.
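For context, here is a minimal sketch of that preprocessing step; the encode_review helper, the 'PAD' token, and max_len are hypothetical (only word2idx['UNK'] appears in the code below):

import torch

def encode_review(review, word2idx, max_len):
    # Replace each word by its index; unknown words map to the 'UNK' index.
    indices = [word2idx.get(word, word2idx['UNK']) for word in review.split()]
    # Pad (or truncate) so every review has the same length.
    indices = indices[:max_len] + [word2idx['PAD']] * max(0, max_len - len(indices))
    return torch.tensor(indices)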
The code in PyTorch:
def trainSentence(input_tensor, target_label, gru, ann, gru_optimizer, ann_optimizer, criterion):
    hidden_state = gru.initHidden()
    gru_optimizer.zero_grad()
    ann_optimizer.zero_grad()
    loss = 0
    # Pass the whole sentence through the RNN, one word at a time.
    for index in range(input_tensor.shape[0]):
        try:
            word_embedding = torch.FloatTensor(embeddings[input_tensor[index]]).to(device)
        except KeyError:
            # Unknown word: fall back to the 'UNK' embedding.
            word_embedding = torch.FloatTensor(embeddings[word2idx['UNK']]).to(device)
        output, hidden_state = gru(word_embedding, hidden_state)
    # Feed the final output tensor to the ANN.
    y_pred = ann(output)
    loss += criterion(y_pred.view(1, -1), torch.tensor(target_label, device=device).view(1,))
    loss.backward()
    gru_optimizer.step()
    ann_optimizer.step()
    return loss.item()
The network code:
class GRU(nn.Module):
    def __init__(self, vocab_size, embedding_dims, hidden_size, num_layers):
        super(GRU, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dims = embedding_dims
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.gru = nn.GRU(self.embedding_dims, self.hidden_size, num_layers=self.num_layers)

    def forward(self, input, hidden_state):
        input = input.view(1, 1, -1)
        # dims of hidden_state = (num_layers, 1, hidden_size), i.e. (2, 1, hidden_size) with 2 layers
        output, hidden_state = self.gru(input, hidden_state)
        return output, hidden_state

    def initHidden(self):
        return torch.randn(self.num_layers, 1, self.hidden_size, device=device)

class ANN(nn.Module):
    def __init__(self, input_size):
        super(ANN, self).__init__()
        self.fc1 = nn.Linear(input_size, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 2)

    def forward(self, input):
        output = F.dropout(F.relu(self.fc1(input)), p=0.5)
        output = F.dropout(F.relu(self.fc2(output)), p=0.5)
        output = self.fc3(output)
        return output
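For reference, a hedged sketch of how these pieces might be wired together; the hyperparameter values, the Adam optimizers, the epoch count, and the train_data iterable are assumptions, not taken from the original post:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
gru = GRU(vocab_size=10000, embedding_dims=300, hidden_size=256, num_layers=2).to(device)
ann = ANN(input_size=256).to(device)
gru_optimizer = torch.optim.Adam(gru.parameters(), lr=1e-3)  # assumed optimizer choice
ann_optimizer = torch.optim.Adam(ann.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    for input_tensor, target_label in train_data:  # hypothetical iterable of (indices, label)
        loss = trainSentence(input_tensor, target_label, gru, ann,
                             gru_optimizer, ann_optimizer, criterion)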
The loss always stays between 0.65 and 0.71, mostly around 0.69 (≈ ln 2, the cross-entropy of a random guess between 2 classes). What could the problem be?
Edit: I tried a few things and found that the weights are not being updated either; there may be some problem with the optimizer.step() calls.

Answer 0 (score: 0):
Have a Classifier class that extends nn.Module and holds the GRU and the ANN as layers, like below:

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()  # required before registering submodules
        self.gru = GRU(..)
        self.ann = ANN(..)

    def forward(self, input):
        x = self.gru(input)
        x = self.ann(x)
        return x

classifier = Classifier()
output = classifier(input)
loss = criterion(output, target)
loss.backward()  # compute gradients before stepping
optimizer.step()

Then have just one optimizer perform the step after the loss is computed on the output of classifier.
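To make the suggestion concrete, here is a minimal self-contained sketch of the single-module, single-optimizer setup; the nn.Embedding layer, the single linear head standing in for the ANN, the Adam optimizer, and all hyperparameter values are my assumptions, not from the answer:

import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self, vocab_size, embedding_dims, hidden_size, num_layers):
        super(Classifier, self).__init__()
        # Assumption: a trainable nn.Embedding replaces the manual embedding lookup.
        self.embedding = nn.Embedding(vocab_size, embedding_dims)
        self.gru = nn.GRU(embedding_dims, hidden_size, num_layers=num_layers)
        self.fc = nn.Linear(hidden_size, 2)  # simplified stand-in for the ANN

    def forward(self, indices):
        # indices: (seq_len,) tensor of word indices for one review.
        embedded = self.embedding(indices).unsqueeze(1)  # (seq_len, 1, embedding_dims)
        output, _ = self.gru(embedded)                   # initial hidden state defaults to zeros
        return self.fc(output[-1])                       # logits from the last time step: (1, 2)

classifier = Classifier(vocab_size=10000, embedding_dims=300, hidden_size=256, num_layers=2)
optimizer = torch.optim.Adam(classifier.parameters())  # one optimizer over all parameters
criterion = nn.CrossEntropyLoss()

# One training step on a toy review.
indices = torch.randint(0, 10000, (20,))
target = torch.tensor([1])
optimizer.zero_grad()
loss = criterion(classifier(indices), target)
loss.backward()
optimizer.step()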