Training of my PyTorch model stops

Time: 2021-04-19 07:34:39

Tags: python deep-learning pytorch

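I am building an RNN (LSTM) model to recognize item names.

I load the data, transform it, create the batches, build the model, and define the training function. All of that runs correctly, but training stops here (it does not work):

Here is my code: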

        for e in range(epochs):
        
            # initialize hidden state
            h = net.init_hidden(batch_size)
            
            for x, y in get_batches(data, batch_size, seq_length):
                print("entered the get_batches loop successfully")
                counter += 1
                
                # One-hot encode our data and make them Torch tensors
                x = one_hot_encode(x, n_chars)
                inputs, targets = torch.from_numpy(x), torch.from_numpy(y)
                
                if(train_on_gpu):
                    inputs, targets = inputs.cuda(), targets.cuda()

                # Creating new variables for the hidden state, otherwise
                # we'd backprop through the entire training history
                h = tuple([each.data for each in h])

                # zero accumulated gradients
                net.zero_grad()
                
                # get the output from the model
                output, h = net(inputs, h)
                
                # calculate the loss and perform backprop
                loss = criterion(output, targets.view(batch_size*seq_length))
                loss.backward()
                # `clip_grad_norm_` helps prevent the exploding gradient problem in RNNs / LSTMs.
                nn.utils.clip_grad_norm_(net.parameters(), clip)
                opt.step()
                
                # loss stats
                if counter % print_every == 0:
                    # Get validation loss
                    val_h = net.init_hidden(batch_size)
                    val_losses = []
                    net.eval()
                    for x, y in get_batches(val_data, batch_size, seq_length):
                        # One-hot encode our data and make them Torch tensors
                        x = one_hot_encode(x, n_chars)
                        x, y = torch.from_numpy(x), torch.from_numpy(y)
                        
                        # Creating new variables for the hidden state, otherwise
                        # we'd backprop through the entire training history
                        val_h = tuple([each.data for each in val_h])
                        
                        inputs, targets = x, y
                        if(train_on_gpu):
                            inputs, targets = inputs.cuda(), targets.cuda()

                        output, val_h = net(inputs, val_h)
                        val_loss = criterion(output, targets.view(batch_size*seq_length))
                    
                        val_losses.append(val_loss.item())
                    
                    net.train()  # reset to train mode after iterating through validation data
                    
                    print("Epoch: {}/{}...".format(e+1, epochs),
                        "Step: {}...".format(counter),
                        "Loss: {:.4f}...".format(loss.item()),
                        "Val Loss: {:.4f}".format(np.mean(val_losses)))

I don't know why.
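The loop above calls `get_batches` and `one_hot_encode`, which the post does not show. For readers who want to reproduce the setup, here is a minimal sketch of what such helpers might look like. Both bodies are assumptions (the notebook's real versions may differ), and the sketch assumes `data` is a 1-D NumPy array of integer character indices. Note that in this version, if `data` holds fewer than `batch_size * seq_length` items, no batch is yielded, so the inner loop never runs and nothing is printed. Whether that is what happens in the original notebook is not known.

```python
import numpy as np


def one_hot_encode(arr, n_labels):
    """Hypothetical helper: one-hot encode an integer array.

    Returns a float32 array of shape (*arr.shape, n_labels).
    """
    one_hot = np.zeros((arr.size, n_labels), dtype=np.float32)
    one_hot[np.arange(arr.size), arr.flatten()] = 1.0
    return one_hot.reshape((*arr.shape, n_labels))


def get_batches(arr, batch_size, seq_length):
    """Hypothetical helper: yield (x, y) arrays of shape (batch_size, seq_length).

    y is x shifted one step ahead, so y[i, t] is the item that follows x[i, t].
    """
    chars_per_batch = batch_size * seq_length
    n_batches = len(arr) // chars_per_batch
    # Drop the tail that does not fill a whole batch, then lay the data out in rows.
    arr = arr[: n_batches * chars_per_batch]
    arr = arr.reshape((batch_size, n_batches * seq_length))
    for n in range(0, arr.shape[1], seq_length):
        x = arr[:, n : n + seq_length]
        y = np.zeros_like(x)
        y[:, :-1] = x[:, 1:]
        # The last target is the next item in the row; wrap around at the row end.
        if n + seq_length < arr.shape[1]:
            y[:, -1] = arr[:, n + seq_length]
        else:
            y[:, -1] = arr[:, 0]
        yield x, y


# Quick check: count the batches before starting a long training run.
demo_data = np.arange(50) % 5
n_demo_batches = sum(1 for _ in get_batches(demo_data, batch_size=2, seq_length=5))
print("batches per epoch:", n_demo_batches)
```

Counting the batches this way before training is a cheap way to confirm that the inner loop will execute at all for the chosen `batch_size` and `seq_length`.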

Sometimes, after several attempts, I get this error: (screenshot, not preserved)

If you can, please help me. You can find the notebook file here.

1 Answer:

Answer 0 (score: 1)

You are repeating this step: (screenshot, not preserved)

Remove it and try again.