I followed this tutorial on YouTube. This is all of the code in my Jupyter notebook:
import spacy
import fitz
import pickle
import pandas as pd
import random
train_data = pickle.load(open('train_data.pkl', 'rb'))
train_data[0]
(The output of train_data[0] was shown here.)
nlp = spacy.blank('en')
def train_model(train_data):
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last = True)
    for _, annotation in train_data:
        for ent in annotation['entities']:
            ner.add_label(ent[2])
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):
        optimizer = nlp.begin_training()
        for itn in range(10):
            print('Starting iteration' + str(itn))
            random.shuffle(train_data)
            losses = {}
            index = 0
            # batch up the examples using spaCy's minibatch
            #batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
            for text, annotations in train_data:
                try:
                    nlp.update(
                        [texts], # batch of texts
                        [annotations],# batch of annotations
                        sgd=optimizer,
                        drop=0.5, # dropout - make it harder to memorise data
                        losses=losses)
                except Exception as e:
                    pass
            print("Losses", losses)
train_model(train_data)
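The strange thing is the function's output:
Starting iteration0
Losses {}
Starting iteration1
Losses {}
Starting iteration2
Losses {}
Starting iteration3
Losses {}
Starting iteration4
Losses {}
Starting iteration5
Losses {}
Starting iteration6
Losses {}
Starting iteration7
Losses {}
Starting iteration8
Losses {}
Starting iteration9
Losses {}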
Even though I can inspect train_data and get output, it looks like no data is being fed into the model at all!
spaCy version: 2.3.0
Python version: 3.7.3
Answer 0: (score: 0)
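In the inner loop, the loop variable is named text, but texts is what gets passed to nlp.update:
for text, annotations in train_data:
    try:
        nlp.update(
            [texts], # batch of texts  <-- undefined name
            [annotations],# batch of annotations
            sgd=optimizer,
            drop=0.5, # dropout - make it harder to memorise data
            losses=losses)
Replace texts with text. Assuming texts is not defined anywhere else in the notebook, the call raises a NameError for every example before nlp.update even runs, and the except block with pass silently discards it, so losses is never filled in.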
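To see why the loop prints an empty dict without any error message, here is a minimal sketch that needs no spaCy at all. The function fake_update is a hypothetical stand-in for nlp.update (it only records a dummy loss), and run_epoch and run_epoch_fixed are illustrative names, not part of the original notebook. The first function reproduces the typo on purpose; the second uses the loop variable text.

```python
def fake_update(batch_texts, batch_annotations, losses):
    """Hypothetical stand-in for nlp.update: records a dummy loss of 1.0 per call."""
    losses["ner"] = losses.get("ner", 0.0) + 1.0


def run_epoch(train_data, swallow_errors=True):
    """One pass over the data with the typo from the question left in."""
    losses = {}
    errors = []
    for text, annotations in train_data:
        try:
            # Bug reproduced on purpose: only `text` exists, `texts` is undefined,
            # so a NameError is raised before fake_update is ever called.
            fake_update([texts], [annotations], losses)  # noqa: F821
        except Exception as e:
            if not swallow_errors:
                # Recording (or printing) the exception makes the problem visible.
                errors.append(type(e).__name__)
    return losses, errors


def run_epoch_fixed(train_data):
    """Same pass with the loop variable `text` used, and no blanket except."""
    losses = {}
    for text, annotations in train_data:
        fake_update([text], [annotations], losses)
    return losses


if __name__ == "__main__":
    data = [
        ("John works at Acme", {"entities": [(0, 4, "PERSON")]}),
        ("Mary lives in Paris", {"entities": [(0, 4, "PERSON")]}),
    ]
    print(run_epoch(data, swallow_errors=True))
    print(run_epoch(data, swallow_errors=False))
    print(run_epoch_fixed(data))
```

In the real training loop, printing e inside the except block (or removing the try/except while debugging) would have surfaced the NameError immediately.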