Adding new labels to an already trained BERT model

Asked: 2020-06-12 03:36:05

Tags: huggingface-transformers

I am using BERT for named entity recognition. Initially I had 18 labels; I trained the model with those 18 labels and saved it. Now I have added 2 new labels, and when I try to continue training the previously saved model, I get the following error:

C:/w/1/s/windows/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
CUDA error: device-side assert triggered
Traceback (most recent call last):
  File "C:\Users\jk2446\Desktop\jeril\repos\jk2446-phoenix\apps\utilities\utils.py", line 48, in catch_errors
    return func(*args, **kwargs)
  File "C:\Users\jk2446\Desktop\jeril\repos\jk2446-phoenix\apps\utilities\bert_utils.py", line 414, in start_training
    loss.backward()
  File "C:\Users\jk2446\AppData\Roaming\Python\Python36\site-packages\torch\tensor.py", line 166, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\jk2446\AppData\Roaming\Python\Python36\site-packages\torch\autograd\__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered
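
From what I understand, this assertion fires when a label id in a batch is greater than or equal to the number of output classes in the loaded model's classification head. My saved model still has an 18-way classifier, while the new data contains label ids 18 and 19. A quick check along these lines (the path is a placeholder) should show the mismatch:

from transformers import BertForTokenClassification

model_dir = "path/to/saved/model"   # placeholder for my saved model directory
model = BertForTokenClassification.from_pretrained(model_dir)
print(model.config.num_labels)      # prints 18, but my data now has 20 tags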

Here is my code:

import torch
from torch.optim import Adam  # assuming Adam is torch.optim.Adam
from transformers import BertForTokenClassification

# model_dir, device, max_grad_norm and train_dataloader are defined
# elsewhere in my script
model = BertForTokenClassification.from_pretrained(model_dir)
# initializing the model to use GPU
if torch.cuda.is_available():
    __ = model.cuda()
    torch.cuda.empty_cache()

# finetuning the model
FULL_FINETUNING = True
if FULL_FINETUNING:
    param_optimizer = list(model.named_parameters())
    no_decay = ['bias', 'gamma', 'beta']
    # note: torch.optim.Adam expects the key 'weight_decay';
    # 'weight_decay_rate' would be silently ignored
    optimizer_grouped_parameters = [
        {'params': [p for n, p in param_optimizer if not any(
            nd in n for nd in no_decay)],
         'weight_decay': 0.01},
        {'params': [p for n, p in param_optimizer if any(
            nd in n for nd in no_decay)],
         'weight_decay': 0.0}
    ]
else:
    # only finetune the classification head
    param_optimizer = list(model.classifier.named_parameters())
    optimizer_grouped_parameters = [
        {"params": [p for n, p in param_optimizer]}]
optimizer = Adam(optimizer_grouped_parameters, lr=3e-5)

model.train()
tr_loss = 0
nb_tr_examples, nb_tr_steps = 0, 0
for step, batch in enumerate(train_dataloader):
    # add batch to gpu
    batch = tuple(t.to(device) for t in batch)
    b_input_ids, b_input_mask, b_labels = batch
    b_input_ids = b_input_ids.long()
    b_input_mask = b_input_mask.long()
    b_labels = b_labels.long()
    # forward pass
    loss, scores = model(b_input_ids, token_type_ids=None,
                         attention_mask=b_input_mask, labels=b_labels)
    # backward pass
    loss.backward()
    # track train loss
    tr_loss += loss.item()
    nb_tr_examples += b_input_ids.size(0)
    nb_tr_steps += 1
    # gradient clipping
    torch.nn.utils.clip_grad_norm_(
        parameters=model.parameters(), max_norm=max_grad_norm)
    # update parameters
    optimizer.step()
    model.zero_grad()
# print train loss per epoch
train_loss = tr_loss / nb_tr_steps
print("Train loss: {}".format(train_loss))

Is there any way to update an already trained BERT model with new labels? Please help.
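
For reference, the direction I was considering (just a rough sketch, not verified) is to swap in a larger classification head and copy the trained weights across for the first 18 labels, so that only the 2 new rows start from random initialization; model_dir is a placeholder:

import torch
from transformers import BertForTokenClassification

OLD_NUM_LABELS = 18
NEW_NUM_LABELS = 20
model_dir = "path/to/saved/model"   # placeholder for my saved model directory

# Load the saved 18-label model.
model = BertForTokenClassification.from_pretrained(model_dir)

# Build a 20-way head and copy the trained rows across, so only the
# 2 new label rows start from random initialization.
old_classifier = model.classifier
new_classifier = torch.nn.Linear(old_classifier.in_features, NEW_NUM_LABELS)
with torch.no_grad():
    new_classifier.weight[:OLD_NUM_LABELS] = old_classifier.weight
    new_classifier.bias[:OLD_NUM_LABELS] = old_classifier.bias
model.classifier = new_classifier

# Keep the label count in sync so the loss and save_pretrained() match.
model.config.num_labels = NEW_NUM_LABELS
model.num_labels = NEW_NUM_LABELS

After that, training would continue with the loop above, just with label ids going up to 19.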

0 Answers:

No answers