Custom NER model training with spaCy does not train

Asked: 2020-07-06 03:36:19

Tags: spacy named-entity-recognition

I followed this tutorial on YouTube.

This is the complete code in my Jupyter notebook:

import spacy
import fitz
import pickle
import pandas as pd
import random

train_data = pickle.load(open('train_data.pkl', 'rb'))
train_data[0]

(The output of train_data[0] is shown here.)

nlp = spacy.blank('en')

def train_model(train_data):
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last = True)
        
    for _, annotation in train_data:
        for ent in annotation['entities']:
            ner.add_label(ent[2])
            
            
    
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):
        optimizer = nlp.begin_training()
        for itn in range(10):
            print('Starting iteration' + str(itn))
            random.shuffle(train_data)
            losses = {}
            index = 0
            # batch up the examples using spaCy's minibatch
            #batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
            for text, annotations in train_data:
                try:
                
                    nlp.update(
                        [texts],  # batch of texts
                        [annotations],# batch of annotations
                        sgd=optimizer,
                        drop=0.5,  # dropout - make it harder to memorise data
                        losses=losses)
                except Exception as e:
                    pass
            print("Losses", losses)

train_model(train_data)

Strangely, this is what the function prints:

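Starting iteration0
Losses {}
Starting iteration1
Losses {}
Starting iteration2
Losses {}
Starting iteration3
Losses {}
Starting iteration4
Losses {}
Starting iteration5
Losses {}
Starting iteration6
Losses {}
Starting iteration7
Losses {}
Starting iteration8
Losses {}
Starting iteration9
Losses {}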

Even though I can evaluate train_data and see its contents, it looks like no data is reaching the model at all!

spaCy version: 2.3.0
Python version: 3.7.3

1 Answer:

Answer 0 (score: 0):

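The problem is in the inner training loop:

for text, annotations in train_data:
    try:
        nlp.update(
            [texts],  # batch of texts  <-- should be [text]
            [annotations],  # batch of annotations
            sgd=optimizer,
            drop=0.5,  # dropout - make it harder to memorise data
            losses=losses)
    except Exception as e:
        pass

Replace texts with text. The loop variable is named text, so unless texts is defined somewhere else in the notebook, [texts] raises a NameError on every iteration. The except Exception: pass block swallows that error silently, so nlp.update never runs and losses stays empty.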
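To see why the question's loop prints an empty losses dict without any visible error, here is a minimal sketch that reproduces the same pattern in plain Python, with no spaCy required. The function fake_update and the two sample sentences are hypothetical stand-ins invented for illustration; fake_update only mimics the way nlp.update fills the losses dict and is not part of spaCy.

```python
def fake_update(texts, annotations, losses):
    """Hypothetical stand-in for nlp.update: adds 1.0 to losses['ner'] per call."""
    losses["ner"] = losses.get("ner", 0.0) + 1.0


# Made-up training data in the same (text, {"entities": [...]}) shape as the question.
train_data = [
    ("Alice works at Acme", {"entities": [(0, 5, "PERSON")]}),
    ("Bob lives in Paris", {"entities": [(13, 18, "GPE")]}),
]


def run_epoch(hide_errors):
    """One pass over train_data that repeats the typo from the question."""
    losses = {}
    errors = []
    for text, annotations in train_data:
        try:
            # `texts` is never defined, so this line raises NameError
            # before fake_update is even called.
            fake_update([texts], [annotations], losses)
        except Exception as e:
            if not hide_errors:
                # Recording (or logging, or re-raising) the error makes the bug visible.
                errors.append(repr(e))
            # With hide_errors=True this behaves like `except Exception: pass`.
    return losses, errors


def run_epoch_fixed():
    """Same pass with the loop variable spelled correctly and no blanket except."""
    losses = {}
    for text, annotations in train_data:
        fake_update([text], [annotations], losses)
    return losses
```

Calling run_epoch(hide_errors=True) returns an empty losses dict and no recorded errors, which matches the symptom in the question. With hide_errors=False the NameError shows up in the errors list, and run_epoch_fixed() accumulates a loss for each example. While debugging a real training loop, it is usually safer to drop the blanket except, or at least print the exception, so that mistakes like this surface immediately.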