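I am a beginner with spaCy. Recently I have been building an entity recognition model with spaCy on a small dataset. I created a CSV file containing information about Canadian cities (country, city, province, postal address, and so on). I used the free NER labeling service at https://dataturks.com to label the elements of my rows. They provide a convertDataturkSpacy() method that produces spaCy-compatible JSON. Everything worked up to that point, but now I am getting
TypeError: 'NoneType' object is not iterable
Here is my snippet: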
import json
import logging
import spacy
import random
from spacy.util import minibatch, compounding

trainingfilename="C:/Users/codemen/Desktop/Timeseries Analytics/Canadianinfo.json"
logging.basicConfig(level=logging.INFO)

def ConvertDataturkToSpacy(trainingfilename):
    try:
        trainingData=[]
        lines=[]
        # read the file: one JSON record per line
        with open(trainingfilename,'r') as f:
            lines=f.readlines()
        for line in lines:
            data=json.loads(line)
            text=data['content']
            entities=[]
            print('entties',entities)
            for annotation in data['annotation']:
                #print("Here is the thing")
                point=annotation['points'][0] # each annotation has a single point
                #print(point)
                labels=annotation['label']
                print("isintance",labels)
                if not isinstance(labels,list): # handle both a list of labels and a single label
                    labels=[labels]
                print(labels)
                for label in labels:
                    # Dataturks end indices are inclusive but spaCy's are exclusive, hence the +1
                    #print("Test here")
                    entities.append((point['start'],point['end']+1,label))
            trainingData.append((text,{"entities":entities}))
        return trainingData
    except Exception as e:
        logging.exception("Unable to process item" + trainingfilename +"\n"+ "errror ="+str(e))
        return None

TrainingData=ConvertDataturkToSpacy(trainingfilename)
So far, I have found that the empty list I initialize, entities = [], is never appended to or updated during the iteration.
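One possible cause, offered as a guess rather than a confirmed diagnosis: if some rows in the exported file have no labels and are written with "annotation": null, then data['annotation'] is None and the inner for loop raises exactly this TypeError. The except block would then return None, so any later loop over TrainingData would fail the same way. The sketch below guards against that case. The helper name convert_record and the two sample records are hypothetical and only there for illustration; they are not taken from my real file.

```python
import json

def convert_record(line):
    """Convert one Dataturks JSON line into a (text, {"entities": [...]}) tuple."""
    data = json.loads(line)
    text = data["content"]
    entities = []
    # Assumption: unlabeled rows may carry "annotation": null (or no key at all),
    # so fall back to an empty list instead of iterating over None.
    for annotation in data.get("annotation") or []:
        point = annotation["points"][0]  # single point per annotation
        labels = annotation["label"]
        if not isinstance(labels, list):  # accept a single label or a list
            labels = [labels]
        for label in labels:
            # Dataturks end offsets are inclusive; spaCy expects exclusive ends.
            entities.append((point["start"], point["end"] + 1, label))
    return text, {"entities": entities}

# Hypothetical sample records, one labeled and one unlabeled.
labeled = json.dumps({
    "content": "Toronto, Ontario",
    "annotation": [
        {"label": ["City"], "points": [{"start": 0, "end": 6, "text": "Toronto"}]}
    ],
})
unlabeled = json.dumps({"content": "no labels here", "annotation": None})

labeled_result = convert_record(labeled)
unlabeled_result = convert_record(unlabeled)
print(labeled_result)
print(unlabeled_result)
```

If this is the cause, the unlabeled rows end up with an empty entities list rather than aborting the whole conversion. Skipping such rows entirely would be another option.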