I have two datasets of texts. Each text comes with questions about it; in the first dataset every question has an answer, while the second one sometimes contains questions without answers. For each question, I try to find the root of the question using dependency parsing with SpaCy's en_nlp:
For example, 'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?'
After parsing:
>>> [to_nltk_tree(sent.root).pretty_print() for sent in en_nlp(predicted.loc[0, "question"]).sents]
appear
__________________|____________________________
| | | | | | in
| | | | | | |
| | | To Mary in France
| | | | ___|_____ | |
did allegedly ? whom the Virgin 1858 Lourdes
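For reference, `to_nltk_tree` is not a SpaCy built-in; a common helper for this (a sketch of the usual definition, assuming `nltk` is installed) recursively converts a SpaCy token and its dependency children into an `nltk.Tree` so it can be pretty-printed:

```python
from nltk import Tree

def to_nltk_tree(node):
    """Convert a SpaCy token (anything with .orth_, .n_lefts,
    .n_rights and .children) into an nltk.Tree, recursively."""
    if node.n_lefts + node.n_rights > 0:
        # Token has dependents: make it an inner node with its children below.
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    # Token has no dependents: it becomes a leaf.
    return node.orth_
```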
Then I try to get the roots from the text:
for sent in doc.sents:
    # Stem the head of each noun chunk's root with the stemmer st
    roots = [st.stem(chunk.root.head.text.lower()) for chunk in sent.noun_chunks]
    print(roots)
['has', 'has']
['atop', 'is', 'of']
['in', 'of', 'fac', 'is', 'of', 'with', 'with', 'legend']
['to', 'is', 'of']
['behind', 'is', 'grotto', 'of', 'pray']
['is', 'is', 'of', 'at', 'lourd', 'appear', 'to']
['at', 'of', 'in', 'through', 'statu', 'is', 'of']
Finally, I try to find an answer by matching the roots of one against the other.
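The matching step itself is not shown; a minimal sketch of the idea (pure Python, hypothetical names; the real code would presumably stem the question root with the same stemmer) is to pick the sentence whose stemmed root list contains the stemmed root of the question:

```python
def best_sentence(question_root, sentence_roots):
    """Return the index of the first sentence whose stemmed roots
    contain the question's stemmed root, or None if none matches."""
    for i, roots in enumerate(sentence_roots):
        if question_root in roots:
            return i
    return None

# Roots per sentence, as printed above (abbreviated).
sentence_roots = [
    ['has', 'has'],
    ['atop', 'is', 'of'],
    ['is', 'is', 'of', 'at', 'lourd', 'appear', 'to'],
]

best_sentence('appear', sentence_roots)  # -> 2, the Lourdes sentence
```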
As you can see from this attempt, the idea reaches about 40% accuracy on the first dataset, but accuracy drops to 29% on the new dataset.
Why is the accuracy of root matching based on dependency parsing so low?