pytorch model.cuda() runtime error

Date: 2018-01-23 11:08:53

Tags: pytorch

I'm building a text classifier with pytorch and ran into a problem with the .cuda() method. I understand that .cuda() moves all parameters to the GPU so that training runs faster. However, calling .cuda() raises an error, as shown below:

start_time = time.time()

for model_type in ('lstm',):

    hyperparam_combinations = score_util.all_combination(hyperparam_dict[model_type].values())
    # for selecting best scoring model

    for test_idx, setting in enumerate(hyperparam_combinations):
        args = custom_dataset.list_to_args(setting,model_type=model_type)
        print(args)
        tsv = "test %d\ttrain_loss\ttrain_acc\ttrain_auc\tval_loss\tval_acc\tval_auc\n"%(test_idx) # tsv record
        avg_score = [] # cv_mean score

        ### 4 fold cross validation
        for cv_num,(train_iter,val_iter) in enumerate(cv_splits):

            ### model initiation
            model = model_dict[model_type](args)

            if args.emb_type is not None: # word embedding init
                emb = emb_dict[args.emb_type]
                emb = score_util.embedding_init(emb,tr_text_field,args.emb_type)
                model.embed.weight.data.copy_(emb)

            model.cuda()

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-20-ff6cfce73c10> in <module>()
     23                 model.embed.weight.data.copy_(emb)
     24 
---> 25             model.cuda()
     26 
     27             optimizer= torch.optim.Adam(model.parameters(),lr=args.lr)

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in cuda(self, device_id)
    145                 copied to that device
    146         """
--> 147         return self._apply(lambda t: t.cuda(device_id))
    148 
    149     def cpu(self, device_id=None):

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    116     def _apply(self, fn):
    117         for module in self.children():
--> 118             module._apply(fn)
    119 
    120         for param in self._parameters.values():

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    122                 # Variables stored in modules are graph leaves, and we don't
    123                 # want to create copy nodes, so we have to unpack the data.
--> 124                 param.data = fn(param.data)
    125                 if param._grad is not None:
    126                     param._grad.data = fn(param._grad.data)

RuntimeError: Variable data has to be a tensor, but got torch.cuda.FloatTensor

This is the error traceback, and I don't understand why it happens. The code ran fine until I set the epoch argument to 1 to run some tests; I have since set epochs back to 1000, but the problem persists. Aren't torch.cuda.FloatTensor objects also Tensors? Any help would be appreciated.

My model looks like this:

class TR_LSTM(nn.Module):
    def __init__(self,args,
                 use_hidden_average=False,
                 pretrained_emb = None):

        super(TR_LSTM,self).__init__()
        # arguments
        self.emb_dim = args.embed_dim
        self.emb_num = args.embed_num
        self.num_hidden_unit = args.hidden_state_dim
        self.num_lstm_layer = args.num_lstm_layer
        self.use_hidden_average = use_hidden_average
        self.batch_size = args.batch_size

        # layers
        self.embed = nn.Embedding(self.emb_num, self.emb_dim)
        if pretrained_emb is not None:
            self.embed.weight.data.copy_(pretrained_emb)

        self.lstm_layer = nn.LSTM(self.emb_dim, self.num_hidden_unit, self.num_lstm_layer, batch_first = True)
        self.fc_layer = nn.Sequential(nn.Linear(self.num_hidden_unit,self.num_hidden_unit),
                                      nn.Linear(self.num_hidden_unit,2))

    def forward(self,x):
        x = self.embed(x) # batch * max_seq_len * emb_dim
        h_0,c_0 = self.init_hidden(x.size(0))
        x, (_, _) = self.lstm_layer(x, (h_0,c_0)) # batch * seq_len * hidden_unit_num

        if not self.use_hidden_average:
            x = x[:,x.size(1)-1,:]
            x = x.squeeze(1)
        else:
            x = x.mean(1).squeeze(1)
        x = self.fc_layer(x)

        return x


    def init_hidden(self,batch_size):
        h_0, c_0 = torch.zeros(self.num_lstm_layer, batch_size, self.num_hidden_unit),\
                   torch.zeros(self.num_lstm_layer, batch_size, self.num_hidden_unit)
        h_0, c_0 = h_0.cuda(), c_0.cuda()
        h_0_param, c_0_param = torch.nn.Parameter(h_0), torch.nn.Parameter(c_0)
        return h_0_param, c_0_param

1 Answer:

Answer 0 (score: 3)

You call model.cuda() inside the train/test loop, and that is the problem. As the error message indicates, you are repeatedly converting the parameters (tensors) of your model to cuda, which is not the right way to convert a model into a cuda tensor.

The model object should be created and cuda-ized outside the loop; only the train/test instances should be converted to cuda tensors each time you feed the model. I would also suggest reading the examples code on the pytorch documentation site.
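
A minimal sketch of that restructuring, based on the loop in the question (model_dict, args, model_type, and train_iter are the question's own names; the batch.text/batch.label fields, the args.epochs count, and the cross-entropy loss are assumptions for illustration):

import torch
import torch.nn.functional as F

### model initiation: build and cuda-ize once per fold, before any epoch runs
model = model_dict[model_type](args)
model.cuda()  # called exactly once, right after construction
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)

for epoch in range(args.epochs):  # assumed epoch-count argument
    for batch in train_iter:
        # only the data moves to the GPU inside the loop, never the model
        text, label = batch.text.cuda(), batch.label.cuda()
        optimizer.zero_grad()
        logits = model(text)  # forward pass runs on the GPU
        loss = F.cross_entropy(logits, label)
        loss.backward()
        optimizer.step()

Because the model is converted once, .cuda() never gets re-applied to parameters that are already torch.cuda.FloatTensor objects, which is exactly what the traceback complains about.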