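I'm currently working on an NER project and wanted to improve NER performance by trying the new spaCy model `en_trf_bertbaseuncased_lg`, but it gives me the error `KeyError: "[E001] No component 'trf_tok2vec' found in pipeline. Available names: ['ner']"`. Does spaCy currently not support NER with this language model? Thanks!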
```python
import random
from pathlib import Path

from spacy.util import minibatch, compounding
from tqdm import tqdm

# nlp, optimizer, train_data_list, n_iter and output_dir are defined earlier in the script

# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
with nlp.disable_pipes(*other_pipes):  # only train NER
    for itn in tqdm(range(n_iter)):
        random.shuffle(train_data_list)
        losses = {}
        # batch up the examples using spaCy's minibatch
        batches = minibatch(train_data_list, size=compounding(8., 64., 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, drop=0.35,
                       losses=losses)
        tqdm.write('Iter: ' + str(itn + 1) + ' Losses: ' + str(losses['ner']))
        # save a checkpoint when itn is 30 or 40
        if itn == 30 or itn == 40:
            output_dir = Path(output_dir)
            if not output_dir.exists():
                output_dir.mkdir()
            nlp.to_disk(output_dir)
```
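For reference, the batch sizes in the loop above come from `compounding(8., 64., 1.001)`, which in spaCy v2 yields an endless series that starts at 8, is multiplied by 1.001 for each batch and is capped at 64; `minibatch` draws one size per batch. The sketch below re-implements that behaviour in plain Python so it runs without spaCy. It is modelled on spaCy v2's `spacy.util.minibatch` and `spacy.util.compounding` rather than copied from them, and the names `compounding_sketch` and `minibatch_sketch` are hypothetical.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar, Union

T = TypeVar("T")


def compounding_sketch(start: float, stop: float, compound: float) -> Iterator[float]:
    """Yield start, start*compound, start*compound**2, ... clipped at stop."""
    def clip(value: float) -> float:
        # Works for both growing (start < stop) and shrinking (start > stop) series.
        return max(value, stop) if start > stop else min(value, stop)

    current = float(start)
    while True:
        yield clip(current)
        current *= compound


def minibatch_sketch(items: Iterable[T],
                     size: Union[int, Iterator[float]] = 8) -> Iterator[List[T]]:
    """Split items into lists whose lengths are drawn one at a time from `size`."""
    sizes = itertools.repeat(size) if isinstance(size, int) else size
    it = iter(items)
    while True:
        # Fractional sizes are truncated, so 8.008 still gives a batch of 8.
        batch = list(itertools.islice(it, int(next(sizes))))
        if not batch:
            break
        yield batch


sizes = compounding_sketch(8.0, 64.0, 1.001)
print([round(next(sizes), 3) for _ in range(3)])  # [8.0, 8.008, 8.016]

batches = minibatch_sketch(range(20), size=compounding_sketch(8.0, 64.0, 1.001))
print([len(b) for b in batches])  # [8, 8, 4]
```

With a factor of 1.001 the size grows slowly: it takes roughly 2,080 batches to go from 8 to the cap of 64, so on a small dataset nearly every batch holds about 8 examples.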
The error is raised on the line `nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)`.
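The error message lists `['ner']` as the only available component, which suggests that `other_pipes` also contained `trf_tok2vec`, so `nlp.disable_pipes(*other_pipes)` switched off a component that the transformer pipeline still looks up inside `nlp.update`. A minimal sketch, in plain Python, of building the disable list from an explicit set of components to keep; the helper `pipes_to_disable` and the example pipeline (the transformer components of `en_trf_bertbaseuncased_lg` plus an added `ner`) are hypothetical.

```python
from typing import Iterable, List


def pipes_to_disable(pipe_names: Iterable[str], keep: Iterable[str]) -> List[str]:
    """Return every component not in `keep`, preserving pipeline order."""
    keep_set = set(keep)
    return [name for name in pipe_names if name not in keep_set]


# Hypothetical pipeline: the transformer components of the model plus a user-added 'ner'.
pipe_names = ["sentencizer", "trf_wordpiecer", "trf_tok2vec", "ner"]

# Keeping only 'ner' (as in the original loop) also disables trf_tok2vec.
print(pipes_to_disable(pipe_names, keep={"ner"}))
# ['sentencizer', 'trf_wordpiecer', 'trf_tok2vec']

# Keeping the transformer components leaves them available during nlp.update.
print(pipes_to_disable(pipe_names, keep={"ner", "sentencizer", "trf_wordpiecer", "trf_tok2vec"}))
# []
```

The result would be passed to `nlp.disable_pipes(*...)` just as in the original loop. Keeping the transformer components enabled only targets this particular `KeyError`; it should not be read as making spaCy's built-in `ner` component train on the BERT features, which is consistent with the answer's point that this model does not support NER yet.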
Answer:
According to the documentation for this model on spaCy here, it does not support named entity recognition yet. It only supports:
- sentencizer
- trf_wordpiecer
- trf_tok2vec
You can list the available pipeline components for a given model like this:
```python
>>> import spacy
>>> nlp = spacy.load("en_trf_bertbaseuncased_lg")
>>> nlp.pipe_names
['sentencizer', 'trf_wordpiecer', 'trf_tok2vec']
```
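Building on the `nlp.pipe_names` check above, a small sketch that summarises a list of component names before training: which ones look like transformer components and whether an `ner` component is present. It takes the names as plain strings so it runs without spaCy; the `trf_` prefix is inferred from the names shown above, and the function `describe_pipeline` is hypothetical.

```python
from typing import Dict, List, Sequence, Union


def describe_pipeline(pipe_names: Sequence[str],
                      trf_prefix: str = "trf_") -> Dict[str, Union[List[str], bool]]:
    """Summarise pipe names, e.g. the value of nlp.pipe_names."""
    transformer = [name for name in pipe_names if name.startswith(trf_prefix)]
    other = [name for name in pipe_names if not name.startswith(trf_prefix)]
    return {
        "transformer_components": transformer,
        "other_components": other,
        "has_ner": "ner" in pipe_names,
    }


# Names as listed above for en_trf_bertbaseuncased_lg.
summary = describe_pipeline(["sentencizer", "trf_wordpiecer", "trf_tok2vec"])
print(summary["transformer_components"])  # ['trf_wordpiecer', 'trf_tok2vec']
print(summary["has_ner"])  # False
```

In a real script the argument would be `nlp.pipe_names`; a `has_ner` value of `False` means an `ner` component would have to be added before an NER training loop like the one in the question could run.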