How to retrieve subtrees when parsing in NLP

Asked: 2017-12-14 12:24:57

Tags: python parsing nlp nltk stanford-nlp

I want to retrieve the subtrees while parsing a sentence, as shown below:

from nltk.parse import stanford

sentence = "All new medications must undergo testing before they can be prescribed"
parser = stanford.StanfordParser()
# raw_parse may return an iterator depending on the NLTK version,
# so materialize it as a list before indexing into it
tree_parse = list(parser.raw_parse(sentence))
for sub_tree in tree_parse[0].subtrees():
    if sub_tree.label() == "S":
        print(sub_tree)

What I expect is to access the subtrees labeled "S" individually, as shown below:

First subtree

(S
  (NP (DT All) (JJ new) (NNS medications))
  (VP
    (MD must)
    (VP
      (VB undergo)

Second subtree

(S
    (VP
      (VBG testing)
      (SBAR
        (IN before)

Third subtree

(S
          (NP (PRP they))
          (VP (MD can) (VP (VB be) (VP (VBN prescribed)))))))))))

But the actual output is as follows:

 (NP (DT All) (JJ new) (NNS medications))
  (VP
  (MD must)
  (VP
    (VB undergo)
    (S
      (VP
        (VBG testing)
        (SBAR
          (IN before)
          (S
            (NP (PRP they))
            (VP (MD can) (VP (VB be) (VP (VBN prescribed))))))))))
How can I access the subtrees individually, like accessing items in a list?

1 answer:

Answer 0: (score: 1)

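You are already getting the subtrees: a subtree contains everything under its root, so the output you show is correctly retrieved as the subtree under the top-level S. Your loop will then print the S that dominates "testing before they can be prescribed", and finally the lowest S, which dominates "they can be prescribed".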
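To see the nesting without running the Stanford parser, here is a minimal sketch that rebuilds the tree from a bracketed string with nltk's `Tree.fromstring`. The bracketed string is hand-written from the output in the question (with a `ROOT` node added as an assumption about what the parser returns), and `clause_texts` is just an illustrative name.

```python
from nltk import Tree

# Hand-written stand-in for the parser output (assumption: ROOT wrapper on top).
PARSE = """
(ROOT
  (S
    (NP (DT All) (JJ new) (NNS medications))
    (VP
      (MD must)
      (VP
        (VB undergo)
        (S
          (VP
            (VBG testing)
            (SBAR
              (IN before)
              (S
                (NP (PRP they))
                (VP (MD can) (VP (VB be) (VP (VBN prescribed))))))))))))
"""

tree = Tree.fromstring(PARSE)

# Each S subtree covers every word below it, including words of nested S nodes.
clause_texts = [
    " ".join(sub_tree.leaves())
    for sub_tree in tree.subtrees()
    if sub_tree.label() == "S"
]

for text in clause_texts:
    print(text)
```

The three printed lines get progressively shorter, because each lower S is contained in the one above it.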

Incidentally, you can get the S subtrees directly by passing a filter:

for sub_tree in tree_parse[0].subtrees(lambda t: t.label() == "S"):
    print(sub_tree)
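If you want to index the S subtrees like list items, one option is to wrap the filtered generator in `list()`. This is only a sketch: the tree is again built from a hand-written bracketed string rather than real parser output, and `s_subtrees` is a name made up for illustration.

```python
from nltk import Tree

# Hand-written stand-in for tree_parse[0] (assumption, not real parser output).
PARSE = (
    "(ROOT (S (NP (DT All) (JJ new) (NNS medications)) "
    "(VP (MD must) (VP (VB undergo) "
    "(S (VP (VBG testing) (SBAR (IN before) "
    "(S (NP (PRP they)) (VP (MD can) (VP (VB be) (VP (VBN prescribed))))))))))))"
)
tree = Tree.fromstring(PARSE)

# subtrees() yields lazily; list() makes the matches indexable.
s_subtrees = list(tree.subtrees(lambda t: t.label() == "S"))

print(len(s_subtrees))         # number of S nodes found
print(s_subtrees[0].leaves())  # outermost S: the whole sentence
print(s_subtrees[-1])          # innermost S
```

The list is in top-down order, so index 0 is the outermost S and the last index is the most deeply embedded one.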