import random
import spacy

# Create a blank 'en' model
nlp = spacy.blank("en")
# Create a new entity recognizer and add it to the pipeline
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
# Add a new label
ner.add_label('LABEL')
# Start the training
optimizer = nlp.begin_training()
# Loop for 10 iterations
for itn in range(10):
    # Shuffle the training data
    random.shuffle(spacy_train)
    losses = {}
    # Batch the examples and iterate over them
    for batch in spacy.util.minibatch(spacy_train, size=spacy.util.compounding(4.0, 32.0, 1.001)):
        texts = [text for text, entities in batch]
        annotations = [entities for text, entities in batch]
        print("epoch: {} Losses: {}".format(itn, str(losses)))
        # Update the model
        nlp.update(texts, annotations, drop=0.5, losses=losses, sgd=optimizer)
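
For reference, spacy_train is assumed to hold the training examples in spaCy v2's (text, annotations) format with character-offset entity spans; the texts, offsets, and label below are just placeholders:

# Hypothetical illustration of the assumed training-data format:
# a list of (text, annotations) tuples with character-offset entity spans
spacy_train = [
    ("Acme Corp opened a new office.", {"entities": [(0, 9, "LABEL")]}),
    ("She joined Acme Corp last year.", {"entities": [(11, 20, "LABEL")]}),
]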
I am trying to train a spaCy NER model, following the spaCy training guide. The losses should decrease over time, but instead they keep increasing with every epoch. I have tried adjusting the batch size and the number of iterations, to no avail.
Example:
epoch: 0 Losses: {}
epoch: 0 Losses: {'ner': 37.49999785423279}
epoch: 0 Losses: {'ner': 72.21390223503113}
epoch: 0 Losses: {'ner': 93.70724439620972}
epoch: 0 Losses: {'ner': 124.94790315628052}
epoch: 0 Losses: {'ner': 164.6911883354187}
epoch: 0 Losses: {'ner': 182.06093049049377}
epoch: 0 Losses: {'ner': 200.32691740989685}
epoch: 0 Losses: {'ner': 210.71145126968622}
epoch: 0 Losses: {'ner': 222.89578241482377}
epoch: 0 Losses: {'ner': 233.59122055233456}
epoch: 0 Losses: {'ner': 245.26212133839726}
epoch: 0 Losses: {'ner': 258.0684297736734}
By the last batch of the epoch, the loss is around 11,000. Any help is appreciated.