我正在使用带有Pytorch库的RNN模型进行电影评论的情感分析,但是在整个训练过程中,训练损失和验证损失仍然保持不变。我查找了不同的在线资源,但仍然陷于困境。
有人可以帮忙看看我的代码吗?
某些参数由分配指定:
embedding_dim = 64
n_layers = 1
n_hidden = 128
dropout = 0.5
batch_size = 32
我的主要代码
txt_field = data.Field(tokenize=word_tokenize, lower=True, include_lengths=True, batch_first=True)
label_field = data.Field(sequential=False, use_vocab=False, batch_first=True)
train = data.TabularDataset(path=part2_filepath+"train_Copy.csv", format='csv',
fields=[('label', label_field), ('text', txt_field)], skip_header=True)
validation = data.TabularDataset(path=part2_filepath+"validation_Copy.csv", format='csv',
fields=[('label', label_field), ('text', txt_field)], skip_header=True)
txt_field.build_vocab(train, min_freq=5)
label_field.build_vocab(train, min_freq=2)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
train_iter, valid_iter, test_iter = data.BucketIterator.splits(
(train, validation, test),
batch_size=32,
sort_key=lambda x: len(x.text),
sort_within_batch=True,
device=device)
n_vocab = len(txt_field.vocab)
embedding_dim = 64
n_hidden = 128
n_layers = 1
dropout = 0.5
model = Text_RNN(n_vocab, embedding_dim, n_hidden, n_layers, dropout)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
criterion = torch.nn.BCELoss().to(device)
N_EPOCHS = 15
best_valid_loss = float('inf')
for epoch in range(N_EPOCHS):
train_loss, train_acc = RNN_train(model, train_iter, optimizer, criterion)
valid_loss, valid_acc = evaluate(model, valid_iter, criterion)
我的模特
class Text_RNN(nn.Module):
def __init__(self, n_vocab, embedding_dim, n_hidden, n_layers, dropout):
super(Text_RNN, self).__init__()
self.n_layers = n_layers
self.n_hidden = n_hidden
self.emb = nn.Embedding(n_vocab, embedding_dim)
self.rnn = nn.RNN(
input_size=embedding_dim,
hidden_size=n_hidden,
num_layers=n_layers,
dropout=dropout,
batch_first=True
)
self.sigmoid = nn.Sigmoid()
self.linear = nn.Linear(n_hidden, 2)
def forward(self, sent, sent_len):
sent_emb = self.emb(sent)
outputs, hidden = self.rnn(sent_emb)
prob = self.sigmoid(self.linear(hidden.squeeze(0)))
return prob
训练功能
def RNN_train(model, iterator, optimizer, criterion):
epoch_loss = 0
epoch_acc = 0
model.train()
for batch in iterator:
text, text_lengths = batch.text
predictions = model(text, text_lengths)
batch.label = batch.label.type(torch.FloatTensor).squeeze()
predictions = torch.max(predictions.data, 1).indices.type(torch.FloatTensor)
loss = criterion(predictions, batch.label)
loss.requires_grad = True
acc = binary_accuracy(predictions, batch.label)
optimizer.zero_grad()
loss.backward()
optimizer.step()
epoch_loss += loss.item()
epoch_acc += acc.item()
return epoch_loss / len(iterator), epoch_acc / len(iterator)
我运行10条测试评论+ 5条验证评论的输出
Epoch [1/15]: Train Loss: 15.351 | Train Acc: 44.44% Val. Loss: 11.052 | Val. Acc: 60.00%
Epoch [2/15]: Train Loss: 15.351 | Train Acc: 44.44% Val. Loss: 11.052 | Val. Acc: 60.00%
Epoch [3/15]: Train Loss: 15.351 | Train Acc: 44.44% Val. Loss: 11.052 | Val. Acc: 60.00%
Epoch [4/15]: Train Loss: 15.351 | Train Acc: 44.44% Val. Loss: 11.052 | Val. Acc: 60.00%
...
感谢有人能指出我正确的方向,我相信培训代码对我有所帮助,因为在大多数情况下,我都遵循本文: https://www.analyticsvidhya.com/blog/2020/01/first-text-classification-in-pytorch/
答案 0 :(得分:1)
在训练循环中,您使用的是max操作的索引,该索引是不可微分的,因此无法通过它跟踪梯度。因为它是不可微分的,所以以后的所有内容也不跟踪梯度。呼唤
loss.backward()
会失败。
# The indices of the max operation are not differentiable
predictions = torch.max(predictions.data, 1).indices.type(torch.FloatTensor)
loss = criterion(predictions, batch.label)
# Setting requires_grad to True to make .backward() work, although incorrectly.
loss.requires_grad = True
大概您想通过设置requires_grad
来解决此问题,但这并没有达到您的期望,因为没有梯度传播到您的模型,因为计算图中唯一的东西就是损失本身,并且无处可去。
您使用索引来获取0或1,因为模型的输出本质上是两个类,并且您想要一个具有更高概率的类。对于Binary Cross Entropy损失,您只需要一个值在0到1(连续)之间的类,即可通过应用Sigmoid函数获得。
因此您需要将最终线性层的输出通道更改为1:
self.linear = nn.Linear(n_hidden, 1)
,并且在您的训练循环中,您可以删除torch.max
通话以及requires_grad
。
# Squeeze the model's output to get rid of the single class dimension
predictions = model(text, text_lengths).squeeze()
batch.label = batch.label.type(torch.FloatTensor).squeeze()
loss = criterion(predictions, batch.label)
acc = binary_accuracy(predictions, batch.label)
optimizer.zero_grad()
loss.backward()
由于最后只有1个班级,因此实际预测将为0或1(介于两者之间),以实现您可以简单地使用0.5作为阈值,因此下面的所有内容都被视为0,而上面的所有内容都被视为0被认为是1。如果您正在使用所关注文章的binary_accuracy
功能,这将自动为您完成。他们通过用torch.round
进行四舍五入来做到这一点。