Unexpected type of NER data when trying to train a spaCy NER pipeline to add a new named entity

Time: 2021-02-24 23:55:45

Tags: nlp spacy ner

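I'm trying to add a new named entity to spaCy, but I can't find a good example of how to provide Example objects for NER training, and I'm getting a ValueError. Here is my code: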

import random
import spacy
from spacy.util import minibatch, compounding
from pathlib import Path
from spacy.training import Example

nlp=spacy.load('en_core_web_lg')

ner=nlp.get_pipe("ner")
TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[0,2,'CRORG']}),
           ('we stand with ABC',{'entities':[24,26,'CRORG']}),
           ('we supports ABC',{'entities':[15,17,'CRORG']})]
ner.add_label('CRORG')
# Disable pipeline components that don't need to change
pipe_exceptions = ["ner"]
unaffected_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]

with nlp.disable_pipes(*unaffected_pipes):
    for iteration in range(30):
        random.shuffle(TRAIN_DATA)
        for raw_text,entity_offsets in TRAIN_DATA:
            doc=nlp.make_doc(raw_text)
            nlp.update([Example.from_dict(doc,entity_offsets)])

Here is the error message I'm getting

1 Answer

Answer 0 (score: 2)

The 'entities' value in your TRAIN_DATA should be a list of tuples, one (start, end, label) tuple per entity. It has to be 2D, not just 1D.
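Why a flat list fails: as far as I can tell from spaCy v3's source, `Example.from_dict` inspects the type of the first item in the `entities` list to decide which format it was given (character-offset tuples, BILUO tag strings, or `Span` objects), and raises the "Unexpected type for NER data" ValueError when that item is none of these. With `[0, 2, 'CRORG']` the first item is the integer `0`. The sketch below is a simplified, hypothetical mirror of that check, not spaCy's actual code: the name `classify_ner_data` is made up, and the `Span` case is left out so it runs without spaCy installed.

```python
# Simplified, hypothetical mirror of how spaCy v3's Example.from_dict
# decides which "entities" format it received (NOT spaCy's real code).
def classify_ner_data(ner_data):
    """Return the entity format ner_data looks like, or raise ValueError."""
    if not isinstance(ner_data, (list, tuple)):
        raise ValueError("Unexpected type for NER data")
    if len(ner_data) == 0:
        return "empty"
    first = ner_data[0]
    if isinstance(first, (list, tuple)):
        # 2D: each item is a (start_char, end_char, label) triple
        return "offsets"
    if isinstance(first, str) or first is None:
        # 1D list of BILUO tags, one per token (e.g. ["U-CRORG", "O"])
        return "biluo"
    # A flat [0, 2, "CRORG"] lands here: its first item is an int
    raise ValueError("Unexpected type for NER data")


print(classify_ner_data([(0, 3, "CRORG")]))
try:
    classify_ner_data([0, 2, "CRORG"])
except ValueError as err:
    print("flat list rejected:", err)
```

Note that a flat list of strings would not be rejected this way; it would be read as BILUO tags, which is a different format from the one the question intends.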

So instead of:

TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[0,2,'CRORG']}),
           ('we stand with ABC',{'entities':[24,26,'CRORG']}),
           ('we supports ABC',{'entities':[15,17,'CRORG']})]

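use the nested form. The character offsets also need fixing: spaCy offsets are end-exclusive and must point at "ABC" in each particular sentence, so for example "ABC" in "we stand with ABC" spans 14 to 17, not 24 to 26:

TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[(0,3,'CRORG')]}),
           ('we stand with ABC',{'entities':[(14,17,'CRORG')]}),
           ('we supports ABC',{'entities':[(12,15,'CRORG')]})]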
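Hard-coding character offsets is error-prone. spaCy expects end-exclusive offsets, so `text[start:end]` must equal the entity text: "ABC" at the start of a sentence is `(0, 3)`, not `(0, 2)`. A minimal sketch, assuming each sentence contains the entity string exactly once, computes the offsets with `str.find` instead. `entity_offsets` is a hypothetical helper for illustration, not a spaCy function.

```python
def entity_offsets(text, entity_text, label):
    """Hypothetical helper: (start, end, label) for the first occurrence
    of entity_text in text, using end-exclusive character offsets."""
    start = text.find(entity_text)
    if start == -1:
        raise ValueError(f"{entity_text!r} not found in {text!r}")
    return (start, start + len(entity_text), label)


samples = [
    "ABC is a worldwide organization",
    "we stand with ABC",
    "we supports ABC",
]

# Each "entities" value is a list of tuples (2D), as spaCy expects
TRAIN_DATA = [
    (text, {"entities": [entity_offsets(text, "ABC", "CRORG")]})
    for text in samples
]

for text, annotations in TRAIN_DATA:
    start, end, label = annotations["entities"][0]
    # The slice should reproduce the entity text exactly
    print(annotations["entities"], repr(text[start:end]))
```

Each printed slice should be `'ABC'`. The resulting TRAIN_DATA has the same shape as the corrected list and can be passed through `Example.from_dict(doc, annotations)` in the training loop from the question.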