I am trying to train the en_core_web_sm model to recognize a new entity type, EMAIL, using the following code:
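import random

import spacy

LABEL = "EMAIL"

# Each example is (text, {"entities": [(start_char, end_char, label)]}).
# The addresses are redacted here, so the end offsets may not line up
# with the text as shown.
TRAIN_DATA = [
    (
        "My email address is XXXX@gmail.com",
        {"entities": [(20, 37, LABEL)]},
    ),
    ("you can email me @ XXXXX@ai.xXx.com?", {"entities": [(19, 36, LABEL)]}),
    (
        "contact me @ XXXX@ai.xXX.com",
        {"entities": [(13, 31, LABEL)]},
    ),
    ("you can contact me at xxXX@xxXXX.com", {"entities": [(22, 56, LABEL)]}),
]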
def main(model="en_core_web_sm", new_model_name="en_core_web_sm", output_dir="D:/Train_ai", n_iter=8):
    random.seed(0)
    if model is not None:
        nlp = spacy.load(model)  # load the existing model named by `model`
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank("en")  # start from a blank English pipeline
        print("Created blank 'en' model")
    # Add an NER component if the pipeline lacks one; otherwise reuse it.
    if "ner" not in nlp.pipe_names:
        ner = nlp.create_pipe("ner")
        nlp.add_pipe(ner)
    else:
        ner = nlp.get_pipe("ner")
    ner.add_label(LABEL)  # register the new entity label
    ner.add_label("VEGETABLE")
    if model is None:
        optimizer = nlp.begin_training()
    else:
        optimizer = nlp.resume_training()  # this line raises the AttributeError
This is the error I get, raised on the line optimizer = nlp.resume_training():
AttributeError: 'English' object has no attribute 'resume_training'
Answer 0 (score: 1)
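As mentioned here, the resume_training method was only added in spaCy v2.1.x. It looks like you are running an older version, v2.0.11. So you will either have to upgrade your spaCy installation or rewrite your code so that it does not use resume_training. To see the code examples for a given version, you can navigate to the corresponding tag on GitHub. For example, see here for the code example for the latest v2.0.x.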
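To tell which side of that boundary an installation is on, you could compare the version string. A rough sketch, assuming plain major.minor.patch-style version strings: has_resume_training is a made-up helper name (not part of spaCy), and the 2.1 cutoff is simply the one stated above.

```python
def has_resume_training(version):
    """Return True if a spaCy version string is 2.1 or newer.

    Hypothetical helper, not part of spaCy. Only the major and minor
    components are inspected, so suffixes on the patch part are ignored.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 1)


# Demo with literal version strings.
print(has_resume_training("2.0.11"))  # False
print(has_resume_training("2.1.0"))   # True
```

In a real script you would pass spacy.__version__ instead of a literal string.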
Answer 1 (score: 1)
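As @Ines pointed out, spaCy 2.0.x does not support resume_training. So, to fine-tune a model, add new entities, or resume training, just replace this line:
optimizer = nlp.resume_training()
with this one:
optimizer = nlp.entity.create_optimizer()
Then, at the very end, where training actually starts, pass it to nlp.update() through the sgd parameter, like this: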
nlp.update(
    texts,  # batch of texts
    annotations,  # batch of annotations
    sgd=optimizer,  # the optimizer created above
    drop=0,  # dropout rate; 0 disables dropout
    losses=losses,
)
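If the same script has to run on both 2.0.x and 2.1+, one option is to check for the method at runtime instead of hard-coding either call. This is only a sketch under that assumption: get_optimizer and its from_scratch flag are hypothetical names, not part of spaCy, and the fallback is the nlp.entity.create_optimizer() call suggested above.

```python
def get_optimizer(nlp, from_scratch=False):
    """Pick an optimizer for training or fine-tuning a pipeline.

    Hypothetical helper, not part of spaCy. `nlp` is expected to be a
    loaded or blank pipeline object.
    """
    if from_scratch:
        # Blank model: initialise the weights and get a fresh optimizer.
        return nlp.begin_training()
    if hasattr(nlp, "resume_training"):
        # Newer versions: keep the existing weights.
        return nlp.resume_training()
    # Older versions without resume_training: use the replacement from above.
    return nlp.entity.create_optimizer()
```

You would then call optimizer = get_optimizer(nlp, from_scratch=(model is None)) and pass the result to nlp.update() as sgd=optimizer, exactly as in the snippet above.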