I'm trying to add a new named entity to spaCy, but I can't work out how to build proper Example objects for NER training, and I get a ValueError. Here is my code:
import random
import spacy
from spacy.util import minibatch, compounding
from pathlib import Path
from spacy.training import Example

nlp = spacy.load('en_core_web_lg')
ner = nlp.get_pipe("ner")
TRAIN_DATA = [('ABC is a worldwide organization', {'entities': [0, 2, 'CRORG']}),
              ('we stand with ABC', {'entities': [24, 26, 'CRORG']}),
              ('we supports ABC', {'entities': [15, 17, 'CRORG']})]
ner.add_label('CRORG')
# Disable pipeline components that don't need to change
pipe_exceptions = ["ner"]
unaffected_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
with nlp.disable_pipes(*unaffected_pipes):
    for iteration in range(30):
        random.shuffle(TRAIN_DATA)
        for raw_text, entity_offsets in TRAIN_DATA:
            doc = nlp.make_doc(raw_text)
            nlp.update([Example.from_dict(doc, entity_offsets)])
Answer 0 (score: 2)
The 'entities' value in TRAIN_DATA should be a list of tuples, one (start, end, label) tuple per entity. It has to be 2D, not just 1D.
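To make the 1D versus 2D distinction concrete, here is a minimal plain-Python sketch of a shape check. `is_entity_list` is a hypothetical helper written for this answer, not part of spaCy, and it only approximates what spaCy expects.

```python
from typing import Any


def is_entity_list(entities: Any) -> bool:
    """Return True if `entities` is a list of (start, end, label) triples.

    Hypothetical helper for illustration; this is not spaCy's own validation.
    """
    if not isinstance(entities, list):
        return False
    for item in entities:
        # Each element must itself be a 3-item tuple or list.
        if not (isinstance(item, (list, tuple)) and len(item) == 3):
            return False
        start, end, label = item
        if not (isinstance(start, int) and isinstance(end, int)
                and isinstance(label, str)):
            return False
    return True


flat = [0, 2, 'CRORG']       # 1D: three loose values
nested = [(0, 2, 'CRORG')]   # 2D: a list holding one triple

print(is_entity_list(flat))    # False
print(is_entity_list(nested))  # True
```

Running a check like this over TRAIN_DATA before training can catch the problem earlier than the ValueError does.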
So instead of:
TRAIN_DATA = [('ABC is a worldwide organization', {'entities': [0, 2, 'CRORG']}),
              ('we stand with ABC', {'entities': [24, 26, 'CRORG']}),
              ('we supports ABC', {'entities': [15, 17, 'CRORG']})]
use:
TRAIN_DATA = [('ABC is a worldwide organization', {'entities': [(0, 2, 'CRORG')]}),
              ('we stand with ABC', {'entities': [(24, 26, 'CRORG')]}),
              ('we supports ABC', {'entities': [(15, 17, 'CRORG')]})]
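One more thing worth checking: as far as I know, spaCy reads each triple as character offsets with an exclusive end, like a Python slice, so 'ABC' at the start of a string would be (0, 3) rather than (0, 2). The offsets in the data above do not seem to line up with where 'ABC' sits in each sentence, and spans that cannot be aligned to tokens are usually skipped with a warning. Below is a rough sketch that computes the offsets with `str.find` instead of counting by hand. `find_entity` is a hypothetical helper, not a spaCy function, and it assumes the entity text occurs in the sentence (it uses the first occurrence).

```python
from typing import Tuple


def find_entity(text: str, surface: str, label: str) -> Tuple[int, int, str]:
    """Locate the first occurrence of `surface` in `text`.

    Returns (start, end, label) with an exclusive end offset.
    Hypothetical helper for illustration only.
    """
    start = text.find(surface)
    if start == -1:
        raise ValueError(f"{surface!r} not found in {text!r}")
    return (start, start + len(surface), label)


texts = [
    'ABC is a worldwide organization',
    'we stand with ABC',
    'we supports ABC',
]

# Build the training data with computed offsets, keeping the 2D shape.
TRAIN_DATA = [(t, {'entities': [find_entity(t, 'ABC', 'CRORG')]}) for t in texts]

for text, annotations in TRAIN_DATA:
    start, end, label = annotations['entities'][0]
    # Slicing with the offsets should give back the entity text.
    print(text[start:end], (start, end, label))
```

Slicing each sentence with its computed offsets gives back 'ABC', which is a quick sanity check that the spans point at the right characters.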