Training spaCy v2.0.11's "en_core_web_sm" model fails

Time: 2019-04-01 13:54:53

Tags: nlp spacy ner

I am trying to train the en_core_web_sm model to add a new entity, EMAIL, using the following code:

import random

import spacy

LABEL = "EMAIL"
TRAIN_DATA = [
    (
        "My email address is XXXX@gmail.com",
        {"entities": [(20, 37, LABEL)]},
    ),
    ("you can email me @ XXXXX@ai.xXx.com?", {"entities": [(19, 36, LABEL)]}),
    (
        "contact me @ XXXX@ai.xXX.com",
        {"entities": [(13, 31, LABEL)]},
    ),
    ("you can contact me at xxXX@xxXXX.com", {"entities": [(22, 56, LABEL)]}),
]

def main(model="en_core_web_sm", new_model_name="en_core_web_sm", output_dir="D:/Train_ai", n_iter=8):
    random.seed(0)
    if model is not None:
        nlp = spacy.load(model)  # load the existing pretrained model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank("en")  # start from a blank English pipeline
        print("Created blank 'en' model")
    # Reuse the existing NER component, or create one if the pipeline has none.
    if "ner" not in nlp.pipe_names:
        ner = nlp.create_pipe("ner")
        nlp.add_pipe(ner)
    else:
        ner = nlp.get_pipe("ner")

    ner.add_label(LABEL)  # register the new entity type
    ner.add_label("VEGETABLE")
    if model is None:
        optimizer = nlp.begin_training()
    else:
        optimizer = nlp.resume_training()  # the AttributeError is raised here on v2.0.11

The error I get is:

AttributeError: 'English' object has no attribute 'resume_training', raised on the line optimizer = nlp.resume_training()

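2 answers:

Answer 0 (score: 1)

As mentioned here, resume_training was only added in spaCy v2.1.x. You appear to be running an older version, v2.0.11. So you either have to upgrade your spaCy installation or rewrite the code so that it does not use resume_training. To see the code examples for a given version, you can navigate to the corresponding tag on GitHub. For example, see here for the code examples for the latest v2.0.x.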
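
If upgrading is not possible, one way to keep a single script working on both release lines is to check for the method before calling it. The following is only a sketch: get_optimizer is a hypothetical helper name, not part of spaCy, and the fallback branch assumes that the v2.0.x pipeline exposes its NER component as nlp.entity with a create_optimizer() method, as the v2.0.x example scripts do.

```python
def get_optimizer(nlp):
    """Return an optimizer for continuing training on an already-loaded pipeline.

    Hypothetical helper for illustration; it is not part of spaCy.
    """
    if hasattr(nlp, "resume_training"):
        # spaCy v2.1 and later: the method exists, so use it directly.
        return nlp.resume_training()
    # Assumed v2.0.x fallback: create the optimizer from the existing NER
    # component instead of calling the missing method.
    return nlp.entity.create_optimizer()
```

In the question's script this would only replace the branch for a loaded pretrained model; a blank pipeline would still need nlp.begin_training().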

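Answer 1 (score: 1)

As @Ines pointed out, spaCy 2.0.x does not support resume_training. So, to fine-tune, add new entities, or resume training, just replace the following line:

optimizer = nlp.resume_training()

with this line:

optimizer = nlp.entity.create_optimizer()

Then, further down where training actually starts, pass it to nlp.update() through the sgd parameter, like this: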

nlp.update(
    texts,  # batch of texts
    annotations,  # batch of annotations
    sgd=optimizer,  # the optimizer created above
    drop=0,  # dropout rate; 0 disables dropout
    losses=losses,
)
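
For context, the surrounding training loop might look roughly like the sketch below. It is an illustration under assumptions rather than the exact code from the spaCy examples: train_ner is a hypothetical helper name, the data is assumed to be a list of (text, annotations) pairs shaped like TRAIN_DATA in the question, and each update uses a batch of one example for simplicity.

```python
import random


def train_ner(nlp, train_data, optimizer, n_iter=8, drop=0.0):
    """Update only the NER component, one example per update call.

    Hypothetical helper for illustration; it is not part of spaCy.
    Returns the losses dict from the final iteration.
    """
    examples = list(train_data)  # copy so the caller's list is not reordered
    # Disable every other component so that only the NER weights change.
    other_pipes = [name for name in nlp.pipe_names if name != "ner"]
    losses = {}
    with nlp.disable_pipes(*other_pipes):
        for _ in range(n_iter):
            random.shuffle(examples)
            losses = {}  # reset so each iteration reports its own loss
            for text, annotations in examples:
                nlp.update(
                    [text],  # batch of texts
                    [annotations],  # batch of annotations
                    sgd=optimizer,
                    drop=drop,
                    losses=losses,
                )
    return losses
```

With the optimizer from nlp.entity.create_optimizer(), the call would be something like train_ner(nlp, TRAIN_DATA, optimizer, n_iter=n_iter).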