Unable to train custom entities

Time: 2019-06-05 05:19:34

Tags: nlp spacy

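I am trying to write code for custom entities using spaCy, but the model is not getting trained. It guesses entities at random instead of inferring them from the training data.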

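Data:
[('0,Alerts,"Production Server named VW-PNS-0193-02 at IP Address 10.96.205.6 is currently Down. Attempt to bring it back up. If you are unable to locate the server, or unable to bring it back up, please contact at If contact information is missing, please use the default contact groups below: Default Contact Groups: Servers and VM Hosts – Systems Engineering (706) 580-6871 Routers and Switches – Network Engineering (706) 580-6862 Security Appliances – IT Information Security (762) 207-3677 UPS – Network Operations (706) 641-6766",2-High,Production Node Down: Production Server named VW-PNS-0193-02 is currently Down.', {'entities': [(67, 78, 'ip_address'), (38, 52, 'address'), (14, 31, 'production'), (2, 8, 'category')]}),
 ('1,Alerts,"Production Server named VW-PNS-0193-04 at IP Address 10.96.205.7 is currently Down. Attempt to bring it back up. If you are unable to locate the server, or unable to bring it back up, please contact, If contact information is missing, please use the default contact groups below. Default Contact Groups: Servers and VM Hosts – Systems Engineering (706) 580-6871 Routers and Switches – Network Engineering (706) 580-6862 Security Appliances – IT Information Security (762) 207-3677 UPS – Network Operations (706) 641-6766",2-High,Production Node Down: Production Server named VW-PNS-0193-04 is currently Down.', {'entities': [(67, 78, 'ip_address'), (38, 52, 'device_name'), (14, 31, 'production')]})]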

There is a fair amount of data, all of it similar to the examples above.
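Each training example pairs a text with `(start, end, label)` character offsets, so one thing worth checking is whether every offset pair slices out the substring it is meant to label. The sketch below is only an illustration: the helper names `slice_entities` and `find_overlaps` are made up, and the offsets were computed for the short test sentence used in the prediction step, not taken from the real data file. As far as I know, spaCy's NER also expects entity spans that do not overlap, which the second helper checks.

```python
from itertools import combinations


def slice_entities(text, annotations):
    """Return (label, substring) pairs so each span can be eyeballed."""
    return [(label, text[start:end])
            for start, end, label in annotations["entities"]]


def find_overlaps(annotations):
    """Return every pair of spans that share at least one character."""
    spans = sorted(annotations["entities"])  # sorted by start offset
    return [(a, b) for a, b in combinations(spans, 2) if b[0] < a[1]]


# Hypothetical example: offsets computed by hand for this sentence only.
text = ("Production Server named VW-PNS-0193-02 at IP Address "
        "10.96.205.6 is currently Down")
annotations = {"entities": [(53, 64, "ip_address"),
                            (24, 38, "device_name"),
                            (0, 17, "production")]}

for label, substring in slice_entities(text, annotations):
    print(label, "->", repr(substring))
print("overlapping spans:", find_overlaps(annotations))
```

If a printed substring is cut off or shifted by a few characters, the offsets for that example do not line up with the text that is actually passed to training.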

```python
################### Train spaCy NER ###################
import random

import spacy

# convert_dataturks_to_spacy is defined elsewhere (not shown in this question)


def train_spacy():
    TRAIN_DATA = convert_dataturks_to_spacy("/content/tickets_final.json")
    nlp = spacy.blank('en')  # create blank Language class
    # create the built-in pipeline components and add them to the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)
    else:
        ner = nlp.get_pipe('ner')  # make sure `ner` is always defined

    # add labels
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])

    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        optimizer = nlp.begin_training()
        for itn in range(1):  # a single pass over the training data
            print("Starting iteration " + str(itn))
            random.shuffle(TRAIN_DATA)
            losses = {}
            for text, annotations in TRAIN_DATA:
                nlp.update(
                    [text],  # batch of texts
                    [annotations],  # batch of annotations
                    drop=0.2,  # dropout - make it harder to memorise data
                    sgd=optimizer,  # callable to update weights
                    losses=losses)
            print(losses)

    # do prediction
    doc = nlp("Production Server named VW-PNS-0193-02 at IP Address 10.96.205.6 is currently Down")
    print("Entities= " + str([ent.text + "_" + ent.label_ for ent in doc.ents]))
```

I expect the output to contain entities such as the ticket category, the IP address, or the machine name.
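To make "guessing at random" measurable, the predicted entities can be compared against the expected ones. The following is a minimal sketch, not part of the original code: `score_entities` is a hypothetical helper, and the `predicted` list is made up for illustration. With a real model it could be built as `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
def score_entities(predicted, expected):
    """Exact-match precision and recall over (text, label) pairs."""
    predicted, expected = set(predicted), set(expected)
    correct = predicted & expected
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(expected) if expected else 0.0
    return precision, recall


# Entities one would hope to get for the test sentence.
expected = [("Production Server", "production"),
            ("VW-PNS-0193-02", "device_name"),
            ("10.96.205.6", "ip_address")]

# Made-up model output, for illustration only.
predicted = [("10.96.205.6", "ip_address"),
             ("named", "category")]

precision, recall = score_entities(predicted, expected)
print("precision=%.2f recall=%.2f" % (precision, recall))
```

Tracking these two numbers across training runs gives a clearer picture than reading the printed entity list by eye.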

0 answers:

No answers