I am trying to train a spaCy model for custom entities, but the model does not seem to learn. It guesses entities at random instead of inferring them from the training data.
Data (two example records, offsets as produced by my Dataturks converter):

```python
[('0,Alerts,"Production server named VW-PNS-0193-02 at IP address 10.96.205.6 is currently down. Attempting to bring it back up. If the server cannot be found, or it cannot be brought back up, please contact at . If contact information is missing, use the default contact groups below. Default contact groups: Servers and VM hosts - Systems Engineering (706)580-6871 Routers and switches - Network Engineering (706)580-6862 Security appliances - IT Information Security (762)207-3677 UPS - Network Operations (706)641-6766",2-High,Production node down: The production server named VW-PNS-0193-02 is currently down.',
  {'entities': [(67, 78, 'ip_address'), (38, 52, 'address'), (14, 31, 'production'), (2, 8, 'category')]}),
 ('1,Alerts,"Production server named VW-PNS-0193-04 at IP address 10.96.205.7 is currently down. Attempting to bring it back up. If the server cannot be found, or it cannot be brought back up, please contact at . If contact information is missing, use the default contact groups below. Default contact groups: Servers and VM hosts - Systems Engineering (706)580-6871 Routers and switches - Network Engineering (706)580-6862 Security appliances - IT Information Security (762)207-3677 UPS - Network Operations (706)641-6766",2-High,Production node down: The production server named VW-PNS-0193-04 is currently down.',
  {'entities': [(67, 78, 'ip_address'), (38, 52, 'device_name'), (14, 31, 'production')]})]
```
The dataset is fairly large, but every record is similar to the ones above.
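A frequent cause of apparently random NER predictions is character offsets that do not line up with the annotated text: spaCy silently drops or misaligns such spans during training. Below is a minimal sketch of an offset sanity check; `check_entity_offsets` is a hypothetical helper written for illustration, not part of spaCy, and the sample record only mimics the shape of the tuples above:

```python
def check_entity_offsets(record):
    """Verify each (start, end, label) span aligns with the annotated text.

    `record` is a (text, {'entities': [(start, end, label), ...]}) pair,
    the same shape as the training tuples above. Returns a list of
    (label, offending_slice) problems; an empty list means the offsets are clean.
    """
    text, annotations = record
    problems = []
    for start, end, label in annotations.get('entities', []):
        span = text[start:end]
        # An empty slice, or one with leading/trailing whitespace (i.e. the
        # offsets cut into a neighbouring token), will confuse training.
        if not span or span != span.strip():
            problems.append((label, repr(span)))
    return problems

sample = ("Production server VW-PNS-0193-02 is down",
          {'entities': [(18, 32, 'device_name')]})
print(check_entity_offsets(sample))  # -> [] when the offsets line up
```

Running this over every record before training quickly shows whether the converter's offsets actually point at the entities you think they do.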
```python
################### Train spaCy NER ###################
import random

import spacy


def train_spacy():
    TRAIN_DATA = convert_dataturks_to_spacy("/content/tickets_final.json")
    nlp = spacy.blank('en')  # create a blank Language class

    # create the built-in pipeline components and add them to the pipeline;
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)

    # add labels
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])

    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        optimizer = nlp.begin_training()
        for itn in range(1):
            print("Starting iteration " + str(itn))
            random.shuffle(TRAIN_DATA)
            losses = {}
            for text, annotations in TRAIN_DATA:
                nlp.update(
                    [text],          # batch of texts
                    [annotations],   # batch of annotations
                    drop=0.2,        # dropout - make it harder to memorise data
                    sgd=optimizer,   # callable to update weights
                    losses=losses)
            print(losses)

    # prediction
    doc = nlp("Production Server named VW-PNS-0193-02 at IP Address 10.96.205.6 is currently Down")
    print("Entities= " + str([str(ent.text) + "_" + str(ent.label_) for ent in doc.ents]))
```
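Note that `range(1)` gives the optimizer a single pass over the data, which is rarely enough for an NER model to converge; spaCy's v2 training examples typically run on the order of 100 iterations and feed the data in growing minibatches via `spacy.util.minibatch` and `spacy.util.compounding`. The sketch below reimplements that batching behaviour in pure Python to show the schedule (it mirrors, but is not, the library code):

```python
from itertools import islice

def compounding(start, stop, compound):
    # Yield a geometrically growing value, capped at `stop`
    # (a sketch of what spacy.util.compounding produces).
    curr = float(start)
    while True:
        yield min(curr, stop)
        curr *= compound

def minibatch(items, size):
    # Split `items` into batches whose sizes are drawn from the
    # `size` generator (a sketch of spacy.util.minibatch).
    items = iter(items)
    for batch_size in size:
        batch = list(islice(items, int(batch_size)))
        if not batch:
            return
        yield batch

sizes = compounding(1.0, 4.0, 2.0)
batches = list(minibatch(list(range(10)), sizes))
print([len(b) for b in batches])  # -> [1, 2, 4, 3]
```

In the real training loop you would replace the per-example `nlp.update` call with something like `for batch in minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001)): texts, anns = zip(*batch); nlp.update(texts, anns, drop=0.2, sgd=optimizer, losses=losses)` and raise `range(1)` to a few dozen iterations (exact numbers are a judgment call, not a fixed recipe).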
I am expecting outputs such as the ticket category, IP address, or machine name, but the model predicts essentially random spans instead.