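I am training a BERT model on a review dataset, fine-tuning it with two dense layers on top. However, the training loss does not decrease as the number of epochs increases. Below is the code for the model architecture (the full training code is linked further down):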
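import torch.nn as nn
from transformers import BertModel

# label_dict is defined earlier in the notebook (maps label names to ids).
# BertModel has no classification head, so num_labels is only stored in its config.
bert = BertModel.from_pretrained("bert-base-uncased", num_labels=len(label_dict),
                                 output_attentions=False, output_hidden_states=False)

# Freeze all BERT parameters: only fc1 and fc2 below will be trained.
for param in bert.parameters():
    param.requires_grad = False

class bertModel(nn.Module):
    def __init__(self, bert):
        super(bertModel, self).__init__()
        self.bert = bert
        self.dropout1 = nn.Dropout(0.1)
        self.relu = nn.ReLU()
        self.fc1 = nn.Linear(self.bert.config.hidden_size, 512)
        self.fc2 = nn.Linear(512, 2)          # 2 outputs; must match the number of labels
        self.softmax = nn.LogSoftmax(dim=1)   # outputs log-probabilities; pair with nn.NLLLoss

    def forward(self, **inputs):
        # Index [1] is the pooled [CLS] output, whether BERT returns a tuple
        # (older transformers or return_dict=False) or a ModelOutput object.
        x = self.bert(**inputs)[1]
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout1(x)
        x = self.fc2(x)
        x = self.softmax(x)
        return x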
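Because every BERT parameter is frozen, only `fc1` and `fc2` receive gradients. As a rough sanity check, the number of trainable parameters can be worked out by hand. This sketch assumes `hidden_size` = 768 (the value for bert-base-uncased) and the 2-class output layer used above; `linear_params` is a hypothetical helper.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Number of parameters in nn.Linear(in_features, out_features) with bias."""
    return in_features * out_features + out_features

HIDDEN_SIZE = 768   # hidden size of bert-base-uncased
NUM_CLASSES = 2     # matches fc2 = nn.Linear(512, 2)

fc1 = linear_params(HIDDEN_SIZE, 512)
fc2 = linear_params(512, NUM_CLASSES)
trainable = fc1 + fc2

print(fc1, fc2, trainable)  # 393728 1026 394754
```

That is roughly 0.4M trainable parameters against the roughly 110M in BERT itself, so with the encoder frozen all of the adaptation to the review data has to happen in these two layers. Setting `requires_grad = True` on some or all encoder layers is the usual way to let BERT itself adapt.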
I am using the Adam optimizer with a batch size of 256 and lr = 0.00001 (1e-5). The full code is in this notebook: https://github.com/gprashmi/Sentiment_Analysis/blob/main/sentiment_analysis_modelling1.ipynb
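Since the head ends in `nn.LogSoftmax`, it outputs log-probabilities, and the conventional loss to pair with it is `nn.NLLLoss`. A minimal sketch of one optimization step with the settings above (Adam, lr = 1e-5, batch size 256) follows. It assumes a small hypothetical stand-in (a frozen `nn.Linear` "encoder" plus the same head layers) in place of BERT so it runs without downloading weights, and random fake inputs and labels; the optimizer is given only the parameters that still require gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for bertModel: a frozen "encoder" layer plus the
# same fc1 -> ReLU -> dropout -> fc2 -> LogSoftmax head.
HIDDEN = 768
encoder = nn.Linear(HIDDEN, HIDDEN)
for p in encoder.parameters():
    p.requires_grad = False          # mirrors freezing all BERT parameters

head = nn.Sequential(
    nn.Linear(HIDDEN, 512), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(512, 2), nn.LogSoftmax(dim=1),
)
model = nn.Sequential(encoder, head)

# Give Adam only the parameters that can actually be updated.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)
criterion = nn.NLLLoss()             # expects log-probabilities as input

x = torch.randn(256, HIDDEN)         # one fake batch of 256 pooled outputs
y = torch.randint(0, 2, (256,))      # fake labels in {0, 1}

model.train()
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```

The `requires_grad` filter is not strictly required (Adam skips parameters whose `.grad` is `None`), but it makes explicit which weights are being trained.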
Can anyone help me figure out how to reduce the loss?
Update: sample training losses from epoch 1:
Epoch: 1
Batch: 0, Training Loss: 0.046884429454803464
Batch: 25, Training Loss: 1.6009380578994752
Batch: 50, Training Loss: 1.6312992066144942
Batch: 75, Training Loss: 1.6702612936496735
Batch: 100, Training Loss: 1.614540034532547
Batch: 125, Training Loss: 1.6372735381126404
End of Epoch 1, Avg. Training Loss: 7.984550286616598, Avg. validation Loss: 15.047848328666866
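Two quick checks help interpret these numbers. For a C-class problem, a model that predicts the uniform distribution has a cross-entropy of ln C, which gives a baseline to compare the batch losses against. Also, the reported epoch average (about 7.98) is roughly five times the per-batch values (about 1.6), which is hard to reconcile with a plain mean over batches, so the averaging code may be worth checking. The pure-Python sketch below computes the baseline and a sample-weighted epoch mean; `EpochLossMeter` is a hypothetical helper, not part of the linked notebook, and it assumes the loss uses the default `reduction="mean"`.

```python
import math

def uniform_baseline(num_classes: int) -> float:
    """Cross-entropy of a model that assigns probability 1/C to every class."""
    return math.log(num_classes)

for c in (2, 3, 5):
    print(c, round(uniform_baseline(c), 4))   # 2 0.6931 / 3 1.0986 / 5 1.6094

class EpochLossMeter:
    """Hypothetical helper: mean loss per sample over one epoch."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, batch_mean_loss: float, batch_size: int) -> None:
        # Assumes the loss uses reduction="mean"; weighting by batch size
        # recovers the per-sample sum (the last batch may be smaller).
        self.total += batch_mean_loss * batch_size
        self.count += batch_size

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")

# Usage: call update(loss.item(), len(labels)) once per batch.
meter = EpochLossMeter()
for batch_loss, n in [(1.6, 256), (1.62, 256), (1.58, 100)]:
    meter.update(batch_loss, n)
print(round(meter.mean, 4))   # 1.6051
```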