How can I tune NeuralCoref to get better coreference results?

Time: 2019-06-21 20:55:28

Tags: nlp python-3.7 spacy

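I am using NeuralCoref, a coreference resolution module built on the spaCy parser. Git: https://github.com/huggingface/neuralcoref

However, the results I get could be better. The online visualizer provided by Hugging Face, the developers of NeuralCoref, gives me more accurate results.

The text I am analyzing: "London is the capital and most populous city of England and the United Kingdom. Standing on the River Thames in the south east of the island of Great Britain, it has been a major settlement for two millennia."

I get this result: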

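doc._.coref_resolved

London is the capital and most populous city of England and the United Kingdom. Standing on the River Thames in the south east of the island of Great Britain, the River Thames has been a major settlement for two millennia.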

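So "it" is wrongly linked to the River Thames instead of London (it -> the River Thames).

The NeuralCoref online visualizer returns the correct link (it -> London):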

https://huggingface.co/coref/?text=London%20is%20the%20capital%20and%20most%20populous%20city%20of%20England%20and%20the%20United%20Kingdom.%20Standing%20on%20the%20River%20Thames%20in%20the%20south%20east%20of%20the%20island%20of%20Great%20Britain%2C%20it%20has%20been%20a%20major%20settlement%20for%20two%20millennia.%20It%20was%20founded%20by%20the%20Romans%2C%20who%20named%20it%20Londinium

I have already tried adjusting the parameters mentioned on the project's GitHub page https://github.com/huggingface/neuralcoref, such as greedyness and max_dist:

import spacy
import neuralcoref

# Load the large English model and add NeuralCoref to the spaCy pipeline.
nlp = spacy.load('en_core_web_lg')
# store_scores=True keeps the coreference scores available on doc._.coref_scores.
neuralcoref.add_to_pipe(nlp, greedyness=0.5, store_scores=True)

text = "London is the capital and most populous city of England and   the United Kingdom. Standing on the River Thames in the south east of the island of Great Britain, it has been a major settlement for two millennia."  # It was founded by the Romans, who named it Londinium."

doc = nlp(text)
print(doc._.coref_resolved)
print(doc._.coref_scores)
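
To see why "it" ends up linked to the River Thames, it may help to rank the candidate antecedents by score. The following is only a minimal sketch: it assumes that doc._.coref_scores is a nested mapping of the form {mention: {candidate: score}} whose keys are spaCy spans (this is how the README describes it, but it is worth verifying on your version). The helper names scores_as_text and rank_antecedents are hypothetical and not part of NeuralCoref.

```python
def scores_as_text(coref_scores):
    """Convert {Span: {Span: score}} into {str: {str: float}} for easy printing.

    Assumption: every key exposes a `.text` attribute, as spaCy Span objects do.
    Mentions with identical text overwrite each other in the result; key on
    (text, start) instead if the document repeats the same mention.
    """
    return {
        mention.text: {cand.text: float(score) for cand, score in candidates.items()}
        for mention, candidates in coref_scores.items()
    }


def rank_antecedents(text_scores, mention):
    """Return (candidate, score) pairs for `mention`, highest score first."""
    candidates = text_scores.get(mention, {})
    return sorted(candidates.items(), key=lambda pair: pair[1], reverse=True)


# Usage with the `doc` from the snippet above (requires store_scores=True):
#   ranked = rank_antecedents(scores_as_text(doc._.coref_scores), "it")
#   for candidate, score in ranked:
#       print(f"{candidate!r}: {score:.3f}")
```

If "London" and "the River Thames" turn out to have close scores for "it", the wrong link is a near miss, and small changes to the settings or the input text could plausibly flip it.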

Is there a way to tune it so that the results match those of the visualizer?

Thanks!
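
One way to explore this more systematically is to sweep greedyness over several values and compare the resolved text for each. As far as the README states, 0.5 is already the default for greedyness, so passing greedyness=0.5 should not change the behavior. The sketch below assumes spaCy 2.x and that add_to_pipe registers the component under the pipe name "neuralcoref"; sweep_greedyness and neuralcoref_resolver are hypothetical helper names, not part of the library.

```python
def sweep_greedyness(resolve, text, values):
    """Call `resolve(text, greedyness)` for each value and collect the outputs.

    `resolve` is any callable that returns the resolved text, so the sweep
    logic itself does not depend on spaCy or NeuralCoref.
    """
    return {g: resolve(text, g) for g in values}


def neuralcoref_resolver(model="en_core_web_lg", max_dist=50):
    """Build a resolver backed by NeuralCoref (assumes spaCy 2.x)."""
    import spacy
    import neuralcoref

    nlp = spacy.load(model)

    def resolve(text, greedyness):
        # Remove the previous component, then re-add it with the new settings.
        if "neuralcoref" in nlp.pipe_names:
            nlp.remove_pipe("neuralcoref")
        neuralcoref.add_to_pipe(nlp, greedyness=greedyness, max_dist=max_dist)
        return nlp(text)._.coref_resolved

    return resolve


# Usage (loads the model once, then tries each setting):
#   results = sweep_greedyness(neuralcoref_resolver(), text, [0.4, 0.45, 0.5, 0.55, 0.6])
#   for g, resolved in results.items():
#       print(g, resolved)
```

Passing the resolver in as a callable keeps the comparison loop separate from the pipeline setup, so the same loop could be reused to sweep max_dist or another setting.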

0 answers:

No answers yet