Why is the accuracy of root matching with dependency parsing so low?

Date: 2018-08-22 00:13:53

标签: tree nlp nltk spacy

I have two text datasets. Each contains passages with questions about them; in the first dataset every question has an answer, while the second sometimes contains unanswerable questions. For each question, I try to find its root via dependency parsing, using SpaCy's en_nlp:

For example, 'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?' after parsing:

>>> [to_nltk_tree(sent.root).pretty_print() for sent in en_nlp(predicted.loc[0, "question"]).sents]

                  appear                             
  __________________|____________________________     
 |      |      |    |         |           |      in  
 |      |      |    |         |           |      |    
 |      |      |    To       Mary         in   France
 |      |      |    |      ___|_____      |      |    
did allegedly  ?   whom  the      Virgin 1858 Lourdes
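`to_nltk_tree` is not part of spaCy or NLTK; a commonly used helper that produces trees like the one above might look like this (a sketch, assuming the standard spaCy `Token` attributes `orth_`, `children`, `n_lefts`, `n_rights`):

```python
from nltk import Tree

def to_nltk_tree(node):
    # Recursively convert a spaCy token and its dependents into an
    # nltk.Tree; leaves are plain token strings.
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    return node.orth_
```

Any object exposing those four attributes works, so the helper can be tested without loading a spaCy model.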

Then I try to get the roots of the text:

# st is assumed to be an NLTK stemmer, e.g. SnowballStemmer("english")
for sent in doc.sents:
    roots = [st.stem(chunk.root.head.text.lower()) for chunk in sent.noun_chunks]
    print(roots)

['has', 'has']
['atop', 'is', 'of']
['in', 'of', 'fac', 'is', 'of', 'with', 'with', 'legend']
['to', 'is', 'of']
['behind', 'is', 'grotto', 'of', 'pray']
['is', 'is', 'of', 'at', 'lourd', 'appear', 'to']
['at', 'of', 'in', 'through', 'statu', 'is', 'of']

Finally, I try to find the answer by matching the question's roots against the roots of each sentence.
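The matching step is not shown in the question; a minimal sketch of it (the function name and overlap scoring are assumptions, not the asker's actual code) could be:

```python
def best_matching_sentence(question_roots, sentence_roots_per_sent):
    # Score each sentence by how many of its stemmed roots overlap
    # with the stemmed roots of the question; return the index of the
    # best-scoring sentence (ties go to the earliest one).
    q = set(question_roots)
    scores = [len(q & set(roots)) for roots in sentence_roots_per_sent]
    return max(range(len(scores)), key=scores.__getitem__)
```

With the roots printed above, the question roots `['appear', 'lourd', ...]` would select the sentence whose root list contains `'lourd'` and `'appear'`.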

As you can see from this attempt on the first dataset, the idea reaches about 40% accuracy. But accuracy drops to 29% on the new dataset.

Why is the accuracy of root matching with dependency parsing so low?

0 Answers