Constant training loss and validation loss

Time: 2020-04-17 16:39:56

Tags: machine-learning pytorch recurrent-neural-network sentiment-analysis

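I am doing sentiment analysis of movie reviews with an RNN model built in PyTorch, but the training loss and validation loss stay exactly the same throughout training. I have gone through various online resources and am still stuck.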

Could someone take a look at my code?

Some parameters are fixed by the assignment:

embedding_dim = 64

n_layers = 1

n_hidden = 128

dropout = 0.5

batch_size = 32

My main code

import torch
from nltk.tokenize import word_tokenize
# Field, TabularDataset and BucketIterator are the legacy torchtext API
# (torchtext.data up to 0.8, torchtext.legacy.data in 0.9-0.11)
from torchtext import data

txt_field = data.Field(tokenize=word_tokenize, lower=True, include_lengths=True, batch_first=True)
label_field = data.Field(sequential=False, use_vocab=False, batch_first=True)

train = data.TabularDataset(path=part2_filepath+"train_Copy.csv", format='csv',
                            fields=[('label', label_field), ('text', txt_field)], skip_header=True)
validation = data.TabularDataset(path=part2_filepath+"validation_Copy.csv", format='csv',
                            fields=[('label', label_field), ('text', txt_field)], skip_header=True)

txt_field.build_vocab(train, min_freq=5)
label_field.build_vocab(train, min_freq=2)

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# `test` is assumed to be a third TabularDataset defined like `validation` (not shown)
train_iter, valid_iter, test_iter = data.BucketIterator.splits(
    (train, validation, test),
    batch_size=32,
    sort_key=lambda x: len(x.text),
    sort_within_batch=True,
    device=device)

n_vocab = len(txt_field.vocab)
embedding_dim = 64
n_hidden = 128
n_layers = 1
dropout = 0.5

# Move the model to the same device as the batches
model = Text_RNN(n_vocab, embedding_dim, n_hidden, n_layers, dropout).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
criterion = torch.nn.BCELoss().to(device)

N_EPOCHS = 15
best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):
    train_loss, train_acc = RNN_train(model, train_iter, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iter, criterion)

My model

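import torch.nn as nn

class Text_RNN(nn.Module):
    def __init__(self, n_vocab, embedding_dim, n_hidden, n_layers, dropout):
        super(Text_RNN, self).__init__()
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.emb = nn.Embedding(n_vocab, embedding_dim)
        # nn.RNN applies dropout only between stacked layers, so with
        # n_layers = 1 it has no effect (PyTorch emits a warning)
        self.rnn = nn.RNN(
            input_size=embedding_dim,
            hidden_size=n_hidden,
            num_layers=n_layers,
            dropout=dropout,
            batch_first=True
        )
        self.sigmoid = nn.Sigmoid()
        # Two output units: one score per class
        self.linear = nn.Linear(n_hidden, 2)

    def forward(self, sent, sent_len):
        # sent: (batch, seq_len) token ids; sent_len is currently unused
        sent_emb = self.emb(sent)               # (batch, seq_len, embedding_dim)
        outputs, hidden = self.rnn(sent_emb)    # hidden: (n_layers, batch, n_hidden)
        prob = self.sigmoid(self.linear(hidden.squeeze(0)))  # (batch, 2)

        return prob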

Training function

def RNN_train(model, iterator, optimizer, criterion):
    # binary_accuracy (used below) is the helper from the article linked below
    epoch_loss = 0
    epoch_acc = 0
    model.train()
    for batch in iterator:
        text, text_lengths = batch.text
        predictions = model(text, text_lengths)
        batch.label = batch.label.type(torch.FloatTensor).squeeze()
        predictions = torch.max(predictions.data, 1).indices.type(torch.FloatTensor)
        loss = criterion(predictions, batch.label)
        loss.requires_grad = True
        acc = binary_accuracy(predictions, batch.label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()

    return epoch_loss / len(iterator), epoch_acc / len(iterator)

Output from running 10 test reviews + 5 validation reviews:

Epoch [1/15]:   Train Loss: 15.351 | Train Acc: 44.44%  Val. Loss: 11.052 |  Val. Acc: 60.00%
Epoch [2/15]:   Train Loss: 15.351 | Train Acc: 44.44%  Val. Loss: 11.052 |  Val. Acc: 60.00%
Epoch [3/15]:   Train Loss: 15.351 | Train Acc: 44.44%  Val. Loss: 11.052 |  Val. Acc: 60.00%
Epoch [4/15]:   Train Loss: 15.351 | Train Acc: 44.44%  Val. Loss: 11.052 |  Val. Acc: 60.00%
...

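I'd appreciate it if someone could point me in the right direction. I believe the problem is somewhere in the training code, since for the most part I followed this article: https://www.analyticsvidhya.com/blog/2020/01/first-text-classification-in-pytorch/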

1 answer:

Answer 0 (score: 1)

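In the training loop you are using the indices returned by the max operation. Those indices are not differentiable, so no gradient can be tracked through them, and as a consequence nothing computed from them tracks gradients either. (Taking predictions.data before the max also detaches the values from the graph.) Calling loss.backward() would therefore fail.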

# The indices of the max operation are not differentiable
predictions = torch.max(predictions.data, 1).indices.type(torch.FloatTensor)
loss = criterion(predictions, batch.label)
# Setting requires_grad to True to make .backward() work, although incorrectly.
loss.requires_grad = True
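To see why the indices carry no gradient, here is a framework-free sketch in plain Python (no PyTorch; the helper names and the sample scores are made up for illustration). It nudges one class score by a tiny epsilon and measures how the output changes: the argmax index is piecewise constant, so its slope is 0 almost everywhere, while a sigmoid probability responds smoothly.

```python
import math

def argmax_index(scores):
    """Index of the largest score, returned as a float (like .indices.type(FloatTensor))."""
    return float(max(range(len(scores)), key=lambda i: scores[i]))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def finite_diff(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Two class scores for one review; vary the score of class 1 around 0.8.
other_score = 0.3
slope_argmax = finite_diff(lambda s1: argmax_index([other_score, s1]), 0.8)
slope_sigmoid = finite_diff(sigmoid, 0.8)

print(slope_argmax)   # 0.0: a small change in the score never changes the index
print(slope_sigmoid)  # roughly 0.214: the probability responds to the score
```

A zero slope everywhere means there is no signal telling the weights which way to move, which is what autograd reports by refusing to build a graph through the indices.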

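Presumably you tried to fix this by setting requires_grad, but that does not do what you expect: no gradient propagates to your model, because the only thing in the computation graph is the loss itself, and there is nowhere for the gradient to go. As a result, optimizer.step() never changes the weights, which is why the loss and accuracy are identical in every epoch.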
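The effect on training can be reproduced without PyTorch. The sketch below is plain Python; the one-weight model, the four data points, the learning rate and all helper names are invented for illustration, and central finite differences stand in for autograd. It runs gradient descent on binary cross-entropy twice: once with a hard 0/1 prediction (mimicking the argmax index) and once with the sigmoid probability. The log clamping at -100 imitates what torch.nn.BCELoss documents.

```python
import math

# Tiny made-up dataset: one feature per "review", label 1 = positive.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0.0, 0.0, 1.0, 1.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    # Binary cross-entropy with logs clamped at -100, similar to torch.nn.BCELoss.
    log_p = max(math.log(p), -100.0) if p > 0 else -100.0
    log_1mp = max(math.log(1 - p), -100.0) if p < 1 else -100.0
    return -(y * log_p + (1 - y) * log_1mp)

def soft_pred(w, b, x):
    return sigmoid(w * x + b)

def hard_pred(w, b, x):
    # Mimics using the argmax index: the output is exactly 0.0 or 1.0.
    return 1.0 if w * x + b > 0 else 0.0

def mean_loss(pred_fn, w, b):
    return sum(bce(pred_fn(w, b, x), y) for x, y in zip(xs, ys)) / len(xs)

def train(pred_fn, w, b, lr=0.5, epochs=5, eps=1e-4):
    losses = []
    for _ in range(epochs):
        losses.append(mean_loss(pred_fn, w, b))
        # Central finite-difference gradients stand in for loss.backward().
        gw = (mean_loss(pred_fn, w + eps, b) - mean_loss(pred_fn, w - eps, b)) / (2 * eps)
        gb = (mean_loss(pred_fn, w, b + eps) - mean_loss(pred_fn, w, b - eps)) / (2 * eps)
        w -= lr * gw  # a zero gradient leaves w unchanged
        b -= lr * gb
    return losses

hard_losses = train(hard_pred, w=-0.5, b=0.1)
soft_losses = train(soft_pred, w=-0.5, b=0.1)
print(hard_losses)  # the same value in every epoch
print(soft_losses)  # decreasing from epoch to epoch
```

With the hard prediction, every gradient is zero, the update does nothing, and the loss repeats epoch after epoch, just like the output in the question; with the sigmoid, the loss goes down.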

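You use the indices to get a 0 or 1, because your model essentially outputs two classes and you want the one with the higher probability. Binary cross-entropy loss, however, only needs a single output per example: a continuous value between 0 and 1, which you get by applying the sigmoid function.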
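For reference, binary cross-entropy for one example with label y in {0, 1} and predicted probability p is -(y·log p + (1 - y)·log(1 - p)), so one continuous value per review is all it needs. The plain-Python sketch below (helper names are invented for illustration) computes it from a single raw score z through the sigmoid, and checks it against the algebraically equivalent form log(1 + e^z) - y·z, which is the quantity PyTorch's nn.BCEWithLogitsLoss evaluates directly from raw scores in a numerically stable way.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_prob(p, y):
    """Binary cross-entropy for one example, given a probability p in (0, 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logit(z, y):
    """The same loss written in terms of the raw score z: log(1 + e^z) - y*z.

    Uses log(1 + e^z) = max(z, 0) + log1p(e^-|z|) so that large |z| cannot overflow.
    """
    softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return softplus - y * z

for z, y in [(2.0, 1.0), (2.0, 0.0), (-0.5, 1.0)]:
    p = sigmoid(z)  # a single probability per review, no argmax needed
    print(z, y, round(bce_from_prob(p, y), 4), round(bce_from_logit(z, y), 4))
```

Both columns agree for moderate scores. For a large score such as z = 50 with y = 0, sigmoid(50) rounds to exactly 1.0 in double precision, so bce_from_prob fails on log(0), while bce_from_logit returns 50.0; this is one reason frameworks offer a loss that takes raw scores.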

So you need to change the number of output features of the final linear layer to 1:

self.linear = nn.Linear(n_hidden, 1)

Then, in your training loop, you can remove the torch.max call as well as the requires_grad line:

# Squeeze the model's output to get rid of the single class dimension
predictions = model(text, text_lengths).squeeze()
# .float() converts the labels without moving them off the model's device
batch.label = batch.label.float().squeeze()
loss = criterion(predictions, batch.label)
acc = binary_accuracy(predictions, batch.label)
optimizer.zero_grad()
loss.backward()
# Apply the update; without this step the weights never change
optimizer.step()

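Since there is only one class at the end, the actual prediction is a value between 0 and 1 rather than exactly 0 or 1. To turn it into a hard 0 or 1, you can simply use 0.5 as the threshold: everything below it is treated as 0 and everything above it as 1. If you are using the binary_accuracy function from the article you followed, this is done for you automatically: it rounds the predictions with torch.round.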
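A plain-Python sketch of that thresholding, modelled on the article's binary_accuracy but written without PyTorch (the name binary_accuracy_py and the sample numbers are hypothetical): round each probability, compare it with the label, and average. Python's round(), like torch.round, rounds an exact half to the nearest even value, so a probability of exactly 0.5 counts as class 0.

```python
def binary_accuracy_py(probs, labels):
    """Fraction of examples whose rounded probability equals the 0/1 label."""
    if len(probs) != len(labels) or not labels:
        raise ValueError("probs and labels must be non-empty and of equal length")
    rounded = [float(round(p)) for p in probs]  # p < 0.5 -> 0.0, p > 0.5 -> 1.0
    correct = sum(1 for r, y in zip(rounded, labels) if r == y)
    return correct / len(labels)

probs = [0.91, 0.12, 0.64, 0.40, 0.5]
labels = [1.0, 0.0, 0.0, 0.0, 0.0]
print(binary_accuracy_py(probs, labels))  # 0.8 (only 0.64 is misclassified)
```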