Loss of an NLP multi-class classifier won't decrease

Asked: 2019-08-19 02:54:04

Tags: python nlp pytorch multilabel-classification

I'm building a classifier for a Q&A bot and have a dataset of 8k questions with 149 distinct answers.

I'm running into a problem while training the model: the loss doesn't go down the way I expect, so I'm asking for your help...

Here is my approach:

I use word2vec to get word vectors, then a GRU-based network to get sentence vectors. The w2v model was trained on Wiki data and has worked well in another NLP project of mine. The GRU code comes from my supervisor, and I believe it works well too.

# Part of the code for getting sentence vectors
import torch
import torch.nn as nn

input_size = 400   # dimension of each word vector
hidden_dim = 400
num_layers = 1
gru = nn.GRU(input_size, hidden_dim, num_layers, batch_first=True)

h0 = torch.rand(num_layers, 7187, hidden_dim)  # (num_layers, batch, hidden_dim); batch is the whole training set here
# shape of input: [dataset_len, max_sentence_len, input_feature]
inputSet = torch.tensor(x_train, dtype=torch.float)
sentenceVecs, hidden = gru(inputSet, h0)
sentenceVecs = sentenceVecs[:, -1, :]  # last time step's output as the sentence vector
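
For reference, here is a minimal sketch of how the word vectors feeding the GRU could be looked up with gensim; the file name, tokenizer, and padding length are assumptions, not the original pipeline:

# Hypothetical w2v lookup (model path, tokenization and max_len are assumed)
import numpy as np
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("wiki_w2v_400d.txt")  # pretrained Wiki vectors

def sentence_to_matrix(tokens, max_len=50, dim=400):
    # Zero-pad to max_len; out-of-vocabulary words stay as zero rows.
    mat = np.zeros((max_len, dim), dtype=np.float32)
    for i, tok in enumerate(tokens[:max_len]):
        if tok in kv:
            mat[i] = kv[tok]
    return mat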

Here is my classifier model:

from argparse import Namespace
import torch.nn.functional as F  # used by the model below
args = Namespace(
    dataset_file = 'dataset/waimai_10k_tw.pkl',
    model_save_path='torchmodel/pytorch_bce.model',
    # Training hyperparameters
    batch_size = 100,
    learning_rate = 0.002,
    min_learning_rate = 0.002,
    num_epochs=200,
)

class JWP(nn.Module):
    def __init__(self, 
                 n_feature, 
                 n_hidden,
                 n_hidden2,
                 n_hidden3,
                 n_output):

        super(JWP, self).__init__()
        self.hidden = nn.Linear(n_feature, n_hidden)
        self.hidden2 = nn.Linear(n_hidden, n_hidden2)
        self.hidden3 = nn.Linear(n_hidden2, n_hidden3)
        self.out = nn.Linear(n_hidden3, n_output)

    def forward(self, x, apply_softmax=False):
        x = F.relu(self.hidden(x).squeeze())
        x = F.relu(self.hidden2(x).squeeze())
        x = F.relu(self.hidden3(x).squeeze())
        # Softmax is only applied at inference time; training feeds the raw
        # logits to CrossEntropyLoss, which applies log-softmax internally.
        if apply_softmax:
            x = torch.softmax(self.out(x), dim=-1)  # dim is required by torch.softmax
        else:
            x = self.out(x)

        return x
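
Since torch.nn.CrossEntropyLoss applies log-softmax internally, training uses the raw logits (apply_softmax=False). A minimal sketch of inference-time usage, where apply_softmax=True turns the logits into class probabilities (this usage is not part of the original post; the variable names come from the snippets above):

# Hypothetical inference usage:
net.eval()
with torch.no_grad():
    probs = net(sentenceVecs, apply_softmax=True)  # [batch, 149] class probabilities
    predictions = probs.argmax(dim=-1)             # index of the most likely answer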

Training code:

lr = args.learning_rate
min_lr = args.min_learning_rate

def adjust_learning_rate(optimizer, epoch):
    # Decay the learning rate by a factor of 0.65 every 10 epochs, clamped at
    # min_lr. With learning_rate == min_learning_rate (both 0.002 above), the
    # decay is clamped away immediately, so the rate effectively never changes.
    global lr
    if epoch % 10 == 0 and epoch != 0:
        lr = lr * 0.65
        if lr < min_lr:
            lr = min_lr
        for param_group in optimizer.param_groups:
            param_group['lr'] = lr

if __name__ == "__main__":
    EPOCH = args.num_epochs
    net = JWP(400, 325, 275, 225, 149)  # 400-d sentence vectors in, 149 classes out
#     net = JWP(400,250,149)
#     net = JWP(400,149)
    print(net)

    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss_func = torch.nn.CrossEntropyLoss()

    for t in range(EPOCH):
        adjust_learning_rate(optimizer,t)

        """
        Train phase
        """
        net.train() 
        TrainLoss = 0.0
        # Train batch
        for step, (batchData, batchTarget) in enumerate(trainDataLoader):
            optimizer.zero_grad()
            out = net(batchData)                 # raw logits, [batch, 149]
            loss = loss_func(out, batchTarget)
            TrainLoss = TrainLoss + loss.item()  # accumulate as a float so each batch's graph is freed
            loss.backward()
            optimizer.step()
        TrainLoss = TrainLoss / (step + 1)       # average loss over the epoch

        """
        Result
        """
        print(
            "epoch:", t + 1,
            "train_loss:", round(TrainLoss, 3),
            "LR:", lr
        )

Is my model too simple, or am I just using the wrong approach? The loss always gets stuck at around 4.6 and I can't get it any lower...

epoch: 2898 train_loss: 4.643 LR: 0.002
epoch: 2899 train_loss: 4.643 LR: 0.002
epoch: 2900 train_loss: 4.643 LR: 0.002
epoch: 2901 train_loss: 4.643 LR: 0.002

1 Answer:

Answer 0 (score: 0)

Answering my own question: there were some very long sentences in my dataset, which probably introduced noise, since an RNN (GRU) approach struggles to retain information across such long sequences. After removing the very long sentences from my dataset, the loss started to drop and the accuracy looks good.
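
A minimal sketch of the kind of length filter described above, assuming the dataset is a list of (tokens, label) pairs and that the cutoff of 50 tokens is an illustrative choice rather than the value actually used:

# Hypothetical length filter (cutoff and data layout are assumptions):
MAX_LEN = 50
filtered = [(tokens, label) for tokens, label in dataset
            if len(tokens) <= MAX_LEN]

Truncating each sentence to MAX_LEN tokens instead of dropping it would be an alternative that keeps every sample, at the cost of losing sentence tails.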