Training custom entities in spaCy crashes with many NER labels *FIXED*

Asked: 2018-09-19 19:28:04

Tags: spacy

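I'm trying to train a new model in spaCy with custom entities, and I'm running into a problem when I run it.

I have only one pipeline component (ner), and I add all of my entity types to it as labels.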

I've found that adding a large number of different labels to the ner pipe (~219 labels) makes it crash on the first nlp.update call (Process finished with exit code -1073740791 (0xC0000409)).

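I'm running spaCy version 2.0.12 with Python 3.7 on a Windows 10 laptop with 16 GB of RAM. Any idea why it crashes on the first nlp.update call once I add more labels, and how I can prevent it? With only about 100 labels it works fine.

Here is my code: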

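import random

import spacy


def __train_model(self, spacy_model, entity_types):
    # Start from a blank English pipeline that contains only an NER component
    nlp = spacy.blank("en")

    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner)

    # Register every custom entity type as a label (~219 in the failing case)
    for entity_type in list(entity_types):
        ner.add_label(entity_type)

    optimizer = nlp.begin_training()

    # Start training: 20 passes over the shuffled data, one example per update
    for i in range(20):
        losses = {}
        random.shuffle(spacy_model)

        for statement, entities in spacy_model:
            # The crash happens on the first call to nlp.update
            nlp.update([statement], [entities], sgd=optimizer, losses=losses, drop=0.5)

    return nlp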

spacy_model:

[
    ('Simply put I see no other conclusion than Comcast has actively blocked our Smart TVs from accessing Netflix on purpose.', {'entities': [(42, 49, 'ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS'), (75, 80, 'DEVICE:COMMUNICATIONS:TV:FEATURE'), (100, 107, 'ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS')]})
    ...
]
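Because the crash seems to depend on how many labels are added, it can help to count the distinct labels in the training data and confirm that each offset pair selects the intended text before training. The following is a minimal sketch in plain Python, assuming the data has the shape shown above. `summarize_annotations` is a hypothetical helper, not part of spaCy.

```python
def summarize_annotations(train_data):
    """Return (sorted distinct labels, list of (span text, label) pairs)."""
    labels = set()
    spans = []
    for text, annotations in train_data:
        for start, end, label in annotations["entities"]:
            # Reject offsets that fall outside the text or are empty/reversed
            if not 0 <= start < end <= len(text):
                raise ValueError(
                    f"bad offsets ({start}, {end}) for text of length {len(text)}"
                )
            labels.add(label)
            spans.append((text[start:end], label))
    return sorted(labels), spans


# The single example row from the question
sample = [
    (
        "Simply put I see no other conclusion than Comcast has actively blocked "
        "our Smart TVs from accessing Netflix on purpose.",
        {"entities": [
            (42, 49, "ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS"),
            (75, 80, "DEVICE:COMMUNICATIONS:TV:FEATURE"),
            (100, 107, "ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS"),
        ]},
    )
]

labels, spans = summarize_annotations(sample)
print(len(labels), "distinct labels")
for span_text, label in spans:
    print(repr(span_text), label)
```

For the sample row this reports 2 distinct labels and the spans 'Comcast', 'Smart' and 'Netflix'. Note that the second span covers only 'Smart', not 'Smart TVs'.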


Edit: I tried this on an Ubuntu 18.04 VM with 24 GB of RAM and 2 cores and got the following error:

*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
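The Windows exit code and the Linux message appear to describe the same kind of failure. Read as an unsigned 32-bit value, -1073740791 is 0xC0000409, which Windows documents as STATUS_STACK_BUFFER_OVERRUN, and "stack smashing detected" is what glibc's stack protector prints when it finds a corrupted stack canary. That suggests a buffer overrun in native code rather than the machine running out of RAM, although the output alone does not prove it. A small sketch of the conversion follows; `to_ntstatus` is a hypothetical helper name.

```python
def to_ntstatus(exit_code):
    """Format a signed 32-bit process exit code as an unsigned hex string."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as unsigned 32-bit
    return "0x{:08X}".format(exit_code & 0xFFFFFFFF)


print(to_ntstatus(-1073740791))  # 0xC0000409
```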


Edit 2: Fixed here: https://github.com/explosion/spaCy/issues/2800#issuecomment-425057478

0 answers:

No answers yet.