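I'm trying to train a new model in spaCy with custom entities, and I'm running into a problem when I run it.
I have only one pipeline component (ner), and I add all of the entity types to it as labels.
I found that adding a lot of different labels to the ner pipe (~219 labels) makes it crash on the first nlp.update call:
Process finished with exit code -1073740791 (0xC0000409)
I'm running spaCy version 2.0.12 with Python 3.7 on a Windows 10 laptop with 16 GB RAM. Any idea why it crashes on the first nlp.update call when I add more labels, and how can I prevent this? When I tried with only about 100 labels, it worked fine.
Here is my code: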
import random

import spacy

def __train_model(self, spacy_model, entity_types):
    # Start from a blank English pipeline that contains only an NER component
    nlp = spacy.blank("en")
    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner)
    # Register every custom entity type as an NER label
    for entity_type in list(entity_types):
        ner.add_label(entity_type)
    optimizer = nlp.begin_training()
    # Start training
    for i in range(20):
        losses = {}
        index = 0
        random.shuffle(spacy_model)
        for statement, entities in spacy_model:
            nlp.update([statement], [entities], sgd=optimizer, losses=losses, drop=0.5)
    return nlp
spacy_model:
[
('Simply put I see no other conclusion than Comcast has actively blocked our Smart TVs from accessing Netflix on purpose.', {'entities': [(42, 49, 'ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS'), (75, 80, 'DEVICE:COMMUNICATIONS:TV:FEATURE'), (100, 107, 'ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS')]})
...
]
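Since the crash seems to depend on how many labels are registered, here is a minimal sketch for counting the distinct labels in training data of the format shown above. `count_labels` is a hypothetical helper name of my own, not part of spaCy, and the sample list only repeats the single example from this post.

```python
from collections import Counter


def count_labels(spacy_model):
    """Count how often each entity label occurs in the training data.

    Assumes each item is a (text, {"entities": [(start, end, label), ...]})
    tuple, as in the sample above.
    """
    counts = Counter()
    for _text, annotations in spacy_model:
        for _start, _end, label in annotations.get("entities", []):
            counts[label] += 1
    return counts


# The single training example quoted in this post
sample = [
    (
        "Simply put I see no other conclusion than Comcast has actively "
        "blocked our Smart TVs from accessing Netflix on purpose.",
        {
            "entities": [
                (42, 49, "ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS"),
                (75, 80, "DEVICE:COMMUNICATIONS:TV:FEATURE"),
                (100, 107, "ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS"),
            ]
        },
    ),
]

label_counts = count_labels(sample)
print(len(label_counts), "distinct labels")
for label, n in label_counts.most_common():
    print(n, label)
```

Running this on the full data set before calling ner.add_label shows how many labels will actually be added, which makes it easier to compare the ~100-label case that works with the ~219-label case that crashes.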
Edit: I also tried this on an Ubuntu 18.04 VM with 24 GB RAM and 2 cores, and got the following error:
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
Edit 2: This has been fixed here: https://github.com/explosion/spaCy/issues/2800#issuecomment-425057478