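I am building a text classifier with PyTorch and have run into a problem with the .cuda() method. I know that .cuda() moves all parameters to the GPU so that training runs faster. However, the call to .cuda() raises an error, as shown below: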
start_time = time.time()
for model_type in ('lstm',):
    hyperparam_combinations = score_util.all_combination(hyperparam_dict[model_type].values())
    # for selecting best scoring model
    for test_idx, setting in enumerate(hyperparam_combinations):
        args = custom_dataset.list_to_args(setting,model_type=model_type)
        print(args)
        tsv = "test %d\ttrain_loss\ttrain_acc\ttrain_auc\tval_loss\tval_acc\tval_auc\n"%(test_idx) # tsv record
        avg_score = [] # cv_mean score
        ### 4 fold cross validation
        for cv_num,(train_iter,val_iter) in enumerate(cv_splits):
            ### model initiation
            model = model_dict[model_type](args)
            if args.emb_type is not None: # word embedding init
                emb = emb_dict[args.emb_type]
                emb = score_util.embedding_init(emb,tr_text_field,args.emb_type)
                model.embed.weight.data.copy_(emb)
            model.cuda()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-20-ff6cfce73c10> in <module>()
     23                 model.embed.weight.data.copy_(emb)
     24
---> 25             model.cuda()
     26
     27             optimizer= torch.optim.Adam(model.parameters(),lr=args.lr)

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in cuda(self, device_id)
    145             copied to that device
    146         """
--> 147         return self._apply(lambda t: t.cuda(device_id))
    148
    149     def cpu(self, device_id=None):

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    116     def _apply(self, fn):
    117         for module in self.children():
--> 118             module._apply(fn)
    119
    120         for param in self._parameters.values():

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    122                 # Variables stored in modules are graph leaves, and we don't
    123                 # want to create copy nodes, so we have to unpack the data.
--> 124                 param.data = fn(param.data)
    125                 if param._grad is not None:
    126                     param._grad.data = fn(param._grad.data)

RuntimeError: Variable data has to be a tensor, but got torch.cuda.FloatTensor
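That is the full traceback, and I do not understand why it happens. This code ran fine until I set the epoch parameter to 1 to run some tests. I have since set the number of epochs back to 1000, but the problem persists. Aren't torch.cuda.FloatTensor objects also Tensors? Any help would be much appreciated.
My model looks like this: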
class TR_LSTM(nn.Module):
    def __init__(self,args,
                 use_hidden_average=False,
                 pretrained_emb = None):
        super(TR_LSTM,self).__init__()
        # arguments
        self.emb_dim = args.embed_dim
        self.emb_num = args.embed_num
        self.num_hidden_unit = args.hidden_state_dim
        self.num_lstm_layer = args.num_lstm_layer
        self.use_hidden_average = use_hidden_average
        self.batch_size = args.batch_size
        # layers
        self.embed = nn.Embedding(self.emb_num, self.emb_dim)
        if pretrained_emb is not None:
            self.embed.weight.data.copy_(pretrained_emb)
        self.lstm_layer = nn.LSTM(self.emb_dim, self.num_hidden_unit, self.num_lstm_layer, batch_first = True)
        self.fc_layer = nn.Sequential(nn.Linear(self.num_hidden_unit,self.num_hidden_unit),
                                      nn.Linear(self.num_hidden_unit,2))

    def forward(self,x):
        x = self.embed(x) # batch * max_seq_len * emb_dim
        h_0,c_0 = self.init_hidden(x.size(0))
        x, (_, _) = self.lstm_layer(x, (h_0,c_0)) # batch * seq_len * hidden_unit_num
        if not self.use_hidden_average:
            x = x[:,x.size(1)-1,:]
            x = x.squeeze(1)
        else:
            x = x.mean(1).squeeze(1)
        x = self.fc_layer(x)
        return x

    def init_hidden(self,batch_size):
        h_0, c_0 = torch.zeros(self.num_lstm_layer,batch_size , self.num_hidden_unit),\
                   torch.zeros(self.num_lstm_layer,batch_size , self.num_hidden_unit)
        h_0, c_0 = h_0.cuda(), c_0.cuda()
        h_0_param, c_0_param = torch.nn.Parameter(h_0), torch.nn.Parameter(c_0)
        return h_0_param, c_0_param
Answer 0 (score: 3)
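You are calling model.cuda() inside your training/test loop, and that is the problem. As the error message suggests, you are repeatedly converting the parameters (tensors) of your model to CUDA, which is not the right way to move a model to the GPU.
The model object should be created and moved to CUDA outside the loop. Only the training/test instances should be converted to CUDA tensors each time you feed them to the model. I also suggest reading the examples code on the PyTorch documentation site.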
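A minimal sketch of the pattern described above: the model is built and moved to the device once, and only the batches are moved inside the loop. Several things here are assumptions rather than the asker's code: `TinyClassifier` is a hypothetical stand-in for `TR_LSTM`, the data is random, and the sketch uses `.to(device)` (available in PyTorch 0.4 and later) instead of `.cuda()` so that it also runs on a machine without a GPU.

```python
import torch
import torch.nn as nn

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for the TR_LSTM model in the question."""

    def __init__(self, vocab_size=100, emb_dim=8, hidden_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 2)

    def forward(self, x):
        # When no initial state is passed, nn.LSTM starts from zeros.
        out, _ = self.lstm(self.embed(x))
        return self.fc(out[:, -1, :])  # classify from the last time step


# Build the model and move it to the device once, before any loop.
model = TinyClassifier().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Made-up data: 4 batches of 5 sequences, each 7 token ids long.
batches = [
    (torch.randint(0, 100, (5, 7)), torch.randint(0, 2, (5,)))
    for _ in range(4)
]

losses = []
for epoch in range(2):
    for tokens, labels in batches:
        # Only the batch is moved to the device inside the loop.
        tokens, labels = tokens.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(tokens), labels)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
```

If a fresh model is needed for each cross-validation fold, as in the question, the same idea would apply per fold: construct the model, move it to the device once, then run the epoch and batch loops without touching the model's device again.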