I'm building a classifier for a QA bot, with a dataset of 8k questions and 149 distinct answers. I've run into a problem while training the model: the loss doesn't go down the way I expect, so I'm asking for your help...
I use word2vec to get word vectors, then a GRU-based network to get sentence vectors. The w2v model was trained on Wiki data and works well in another NLP project of mine. The GRU code comes from my senior colleague, and I believe it works correctly as well.
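For context, the word-vector lookup stage isn't shown below; a minimal sketch of what it might look like, assuming a gensim KeyedVectors model with 400-dimensional vectors (the file path, the tokenized input, and the helper name are placeholders, not the actual setup):

# Hypothetical sketch of the word2vec lookup stage, not the code actually used.
import numpy as np
from gensim.models import KeyedVectors

w2v = KeyedVectors.load_word2vec_format("wiki_w2v.bin", binary=True)  # placeholder path

def sentence_to_matrix(tokens, max_len=50, dim=400):
    # Map a token list to a (max_len, dim) matrix, zero-padded and truncated
    mat = np.zeros((max_len, dim), dtype=np.float32)
    for i, tok in enumerate(tokens[:max_len]):
        if tok in w2v:  # skip out-of-vocabulary tokens
            mat[i] = w2v[tok]
    return mat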
# Part of the code for getting sentence vectors
import torch
import torch.nn as nn

input_size = 400
hidden_dim = 400
num_layers = 1
gru = nn.GRU(input_size, hidden_dim, num_layers, batch_first=True)
# Initial hidden state; the batch dim is hardcoded to the dataset size (7187)
# because the whole training set is encoded in a single forward pass
h0 = torch.rand(num_layers, 7187, hidden_dim)  # (num_layers, batch, hidden_dim)
# shape of input: [dataset_len, max_sentence_len, input_feature]
inputSet = torch.tensor(x_train, dtype=torch.float)
sentenceVecs, hidden = gru(inputSet, h0)
sentenceVecs = sentenceVecs[:, -1, :]  # last time step's output as the sentence vector
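One thing worth flagging in the snippet above: h0 is randomly initialized and its batch dimension is hardcoded to 7187, so the entire dataset goes through the GRU at once. A common alternative (a sketch, not the original code) is to omit h0, which then defaults to zeros, and encode in mini-batches:

# Sketch: mini-batch encoding with the default zero initial hidden state.
def encode_sentences(gru, inputs, batch_size=256):
    outs = []
    with torch.no_grad():  # no gradients needed while the encoder is fixed
        for i in range(0, inputs.size(0), batch_size):
            out, _ = gru(inputs[i:i + batch_size])  # h0 defaults to zeros
            outs.append(out[:, -1, :])              # last time step per sentence
    return torch.cat(outs, dim=0)

sentenceVecs = encode_sentences(gru, inputSet)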
Here is my classifier model:
from argparse import Namespace

args = Namespace(
    dataset_file='dataset/waimai_10k_tw.pkl',
    model_save_path='torchmodel/pytorch_bce.model',
    # Training hyperparameters
    batch_size=100,
    learning_rate=0.002,
    min_learning_rate=0.002,
    num_epochs=200,
)
import torch
import torch.nn as nn
import torch.nn.functional as F

class JWP(nn.Module):
    def __init__(self,
                 n_feature,
                 n_hidden,
                 n_hidden2,
                 n_hidden3,
                 n_output):
        super(JWP, self).__init__()
        self.hidden = nn.Linear(n_feature, n_hidden)
        self.hidden2 = nn.Linear(n_hidden, n_hidden2)
        self.hidden3 = nn.Linear(n_hidden2, n_hidden3)
        self.out = nn.Linear(n_hidden3, n_output)

    def forward(self, x, apply_softmax=False):
        x = F.relu(self.hidden(x).squeeze())
        x = F.relu(self.hidden2(x).squeeze())
        x = F.relu(self.hidden3(x).squeeze())
        if apply_softmax:
            # torch.softmax needs an explicit dim (the original call was missing it)
            x = torch.softmax(self.out(x), dim=-1)
        else:
            x = self.out(x)
        return x
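One detail that matters for training: nn.CrossEntropyLoss applies log-softmax internally, so the network should return raw logits during training (apply_softmax=False, as the loop below does); softmax is only needed for inference-time probabilities. A brief usage sketch with made-up inputs:

# Usage sketch (random input, just to illustrate the two forward modes).
net = JWP(400, 325, 275, 225, 149)
x = torch.randn(32, 400)                # a batch of 32 sentence vectors
logits = net(x)                         # raw logits: feed these to CrossEntropyLoss
probs = net(x, apply_softmax=True)      # class probabilities, inference only
preds = probs.argmax(dim=1)             # predicted answer indices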
Training code:
lr = args.learning_rate
min_lr = args.min_learning_rate

def adjust_learning_rate(optimizer, epoch):
    # Decay the LR by 0.65 every 10 epochs, floored at min_lr.
    # Note: learning_rate == min_learning_rate in args, so this is effectively a no-op here.
    global lr
    if epoch % 10 == 0 and epoch != 0:
        lr = lr * 0.65
        if lr < min_lr:
            lr = min_lr
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

if __name__ == "__main__":
    EPOCH = args.num_epochs
    net = JWP(400, 325, 275, 225, 149)
    # net = JWP(400, 250, 149)
    # net = JWP(400, 149)
    print(net)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss_func = torch.nn.CrossEntropyLoss()
    for t in range(EPOCH):
        adjust_learning_rate(optimizer, t)
        """
        Train phase
        """
        net.train()
        TrainLoss = 0.0
        # Train batch
        for step, (batchData, batchTarget) in enumerate(trainDataLoader):
            optimizer.zero_grad()
            out = net(batchData)
            loss = loss_func(out, batchTarget)
            TrainLoss = TrainLoss + loss.item()  # .item() so the autograd graph isn't kept
            loss.backward()
            optimizer.step()
        TrainLoss = TrainLoss / (step + 1)  # mean loss over the epoch
        """
        Result
        """
        print(
            "epoch:", t + 1,
            "train_loss:", round(TrainLoss, 3),
            "LR:", lr,
        )
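With 149 classes the raw loss is hard to interpret on its own; here is a small sketch for also tracking training accuracy per epoch (a hypothetical helper, not part of the original loop):

# Sketch: evaluate accuracy over a DataLoader (hypothetical helper).
def epoch_accuracy(net, loader):
    net.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for batchData, batchTarget in loader:
            preds = net(batchData).argmax(dim=1)
            correct += (preds == batchTarget).sum().item()
            total += batchTarget.size(0)
    return correct / total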
Is my model too simple, or am I just using the wrong approach? The loss stays stuck around 4.6 and I can't get it any lower... (For reference, uniform random guessing over 149 classes corresponds to a cross-entropy of ln 149 ≈ 5.0, so the model is barely doing better than chance.)
epoch: 2898 train_loss: 4.643 LR: 0.002
epoch: 2899 train_loss: 4.643 LR: 0.002
epoch: 2900 train_loss: 4.643 LR: 0.002
epoch: 2901 train_loss: 4.643 LR: 0.002
Answer 0 (score: 0):
Answering my own question: my dataset contained some very long sentences, which probably introduced noise, since an RNN (GRU) struggles to retain information across them. After removing those very long sentences from the dataset, the loss started to drop and the accuracy looks good.
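For anyone hitting the same wall, the filtering step could look like this minimal sketch (assuming the dataset is a list of (tokens, label) pairs; the length threshold of 50 is an arbitrary placeholder):

# Sketch: drop overly long sentences before training (threshold is a placeholder).
MAX_LEN = 50
dataset = [(tokens, label) for tokens, label in dataset if len(tokens) <= MAX_LEN]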