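I am using BERT for named entity recognition. Initially I had only 18 labels, so I trained the model with those 18 labels and saved it. I have now added 2 new labels, and when I try to update the previously saved model I get the following error: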
C:/w/1/s/windows/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
CUDA error: device-side assert triggered
Traceback (most recent call last):
  File "C:\Users\jk2446\Desktop\jeril\repos\jk2446-phoenix\apps\utilities\utils.py", line 48, in catch_errors
    return func(*args, **kwargs)
  File "C:\Users\jk2446\Desktop\jeril\repos\jk2446-phoenix\apps\utilities\bert_utils.py", line 414, in start_training
    loss.backward()
  File "C:\Users\jk2446\AppData\Roaming\Python\Python36\site-packages\torch\tensor.py", line 166, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\jk2446\AppData\Roaming\Python\Python36\site-packages\torch\autograd\__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered
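The assertion `t >= 0 && t < n_classes` appears to be a range check on the target label ids: every id `t` has to be smaller than the number of classes the model outputs. With a head trained for 18 classes, ids 18 and 19 for the two new labels would fall outside that range. The sketch below is only a plain-Python sanity check that could be run on the flattened label ids of a batch before they reach the GPU; the helper name `find_invalid_labels` is made up for illustration, and the `ignore_index` default of -100 is an assumption that the loss uses PyTorch's default ignore index.

```python
def find_invalid_labels(label_ids, n_classes, ignore_index=-100):
    """Return the sorted, distinct label ids that lie outside [0, n_classes).

    Ids equal to ignore_index are skipped, on the assumption that the
    loss function ignores them (hypothetical helper, not a library API).
    """
    return sorted({
        t for t in label_ids
        if t != ignore_index and not (0 <= t < n_classes)
    })


# Example: 20 labels in the data, but a model head with only 18 outputs.
batch_label_ids = [0, 5, 17, 18, 19, -100]
print(find_invalid_labels(batch_label_ids, n_classes=18))  # [18, 19]
print(find_invalid_labels(batch_label_ids, n_classes=20))  # []
```

In the training loop further down, the same check could presumably be applied to `b_labels.view(-1).tolist()` while the tensor is still on the CPU, where an out-of-range id is much easier to trace than a device-side assert.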
Here is my code:
model = BertForTokenClassification.from_pretrained(model_dir)
# initializing the model to use GPU
if torch.cuda.is_available():
    __ = model.cuda()
    torch.cuda.empty_cache()
# finetuning the model
FULL_FINETUNING = True
if FULL_FINETUNING:
    param_optimizer = list(model.named_parameters())
    no_decay = ['bias', 'gamma', 'beta']
    optimizer_grouped_parameters = [
        {'params': [p for n, p in param_optimizer if not any(
            nd in n for nd in no_decay)],
         'weight_decay_rate': 0.01},
        {'params': [p for n, p in param_optimizer if any(
            nd in n for nd in no_decay)],
         'weight_decay_rate': 0.0}
    ]
else:
    param_optimizer = list(model.classifier.named_parameters())
    optimizer_grouped_parameters = [
        {"params": [p for n, p in param_optimizer]}]
optimizer = Adam(optimizer_grouped_parameters, lr=3e-5)
model.train()
tr_loss = 0
nb_tr_examples, nb_tr_steps = 0, 0
for step, batch in enumerate(train_dataloader):
    # add batch to gpu
    batch = tuple(t.to(device) for t in batch)
    b_input_ids, b_input_mask, b_labels = batch
    b_input_ids, b_input_mask, b_labels = b_input_ids.long(
    ), b_input_mask.long(), b_labels.long()
    # forward pass
    loss, scores = model(b_input_ids, token_type_ids=None,
                         attention_mask=b_input_mask, labels=b_labels)
    # backward pass
    loss.backward()
    # track train loss
    tr_loss += loss.item()
    nb_tr_examples += b_input_ids.size(0)
    nb_tr_steps += 1
    # gradient clipping
    torch.nn.utils.clip_grad_norm_(
        parameters=model.parameters(), max_norm=max_grad_norm)
    # update parameters
    optimizer.step()
    model.zero_grad()
# print train loss per epoch
train_loss = tr_loss / nb_tr_steps
print("Train loss: {}".format(train_loss))
Is there a way to update an already trained BERT model with new labels? Please help.
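One direction that might work, shown here only as a sketch and not as a confirmed solution, is to widen the final classification layer from 18 to 20 outputs while keeping the 18 trained rows, so that only the 2 new rows start from random initialization. The code below uses nothing but `torch.nn.Linear`; the helper name `expand_classifier` and the hidden size of 768 are assumptions made for illustration. It also assumes the new labels are appended as ids 18 and 19, so the existing ids keep their meaning.

```python
import torch
from torch import nn


def expand_classifier(old_head: nn.Linear, new_num_labels: int) -> nn.Linear:
    """Return a wider Linear layer that reuses the trained rows of old_head.

    Hypothetical helper: rows 0..old_num_labels-1 are copied from old_head,
    the remaining rows keep nn.Linear's default random initialization.
    """
    old_num_labels, hidden_size = old_head.weight.shape
    if new_num_labels < old_num_labels:
        raise ValueError("new_num_labels must be >= the current label count")

    has_bias = old_head.bias is not None
    new_head = nn.Linear(hidden_size, new_num_labels, bias=has_bias)
    new_head = new_head.to(device=old_head.weight.device,
                           dtype=old_head.weight.dtype)

    # Copy the trained parameters without recording the copy in autograd.
    with torch.no_grad():
        new_head.weight[:old_num_labels].copy_(old_head.weight)
        if has_bias:
            new_head.bias[:old_num_labels].copy_(old_head.bias)
    return new_head


# Stand-in for the trained 18-label head (hidden size 768 is assumed).
old_head = nn.Linear(768, 18)
new_head = expand_classifier(old_head, 20)
print(new_head.weight.shape)
```

Applied to the model above, this would presumably mean replacing `model.classifier` with the widened layer and also updating the label count stored on the model (`model.num_labels`, and the config if the library version keeps one), since the loss reshapes the logits using that number. Those attribute names should be checked against the installed library version.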