How to get phrase tags in Stanford CoreNLP?

Posted: 2013-01-17 06:48:14

Tags: nlp stanford-nlp phrase parse-tree

If I want the phrase tag that each word belongs to, how can I get it?

For example, take this sentence:

My dog also likes eating sausage.

I can get a parse tree from Stanford NLP, such as

(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) (VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .)))

In the example above, I would like to get the phrase tag corresponding to each word, like

(My - NP), (dog - NP), (also - ADVP), (likes - VP), ...

Is there a simple way to extract the phrase tags?

Please help.
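The mapping asked for can be read straight off the bracketed tree: each word is a leaf, its parent is a POS tag (e.g. PRP$), and the POS node's parent is the phrase (e.g. NP). Below is a minimal, self-contained Java sketch of that rule. So that it runs without CoreNLP on the classpath, it parses the bracket string with its own tiny parser; the class WordPhraseTags, its Node type, and the methods parse and wordPhrasePairs are hypothetical names for illustration, not CoreNLP API. With CoreNLP the same walk should be possible on its own Tree class (Tree.valueOf(String) parses a bracketed string, and leaf.parent(root) returns a node's parent), but treat that as an untested assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class WordPhraseTags {
    // Minimal parse-tree node (hypothetical, not CoreNLP's Tree): words are leaves with no children.
    static final class Node {
        final String label;
        final List<Node> kids = new ArrayList<>();

        Node(String label) { this.label = label; }

        boolean isLeaf() { return kids.isEmpty(); }

        // A POS node such as (NN dog): exactly one child, and that child is a word.
        boolean isPreTerminal() { return kids.size() == 1 && kids.get(0).isLeaf(); }
    }

    // Parses Penn Treebank bracket notation, e.g. "(ROOT (S (NP (PRP$ My) (NN dog)) ...))".
    static Node parse(String s) {
        String[] tokens = s.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return parseNode(tokens, new int[] {0});
    }

    private static Node parseNode(String[] tokens, int[] pos) {
        if (!tokens[pos[0]].equals("(")) {
            return new Node(tokens[pos[0]++]); // a bare word
        }
        pos[0]++; // skip "("
        Node node = new Node(tokens[pos[0]++]); // phrase or POS label
        while (!tokens[pos[0]].equals(")")) {
            node.kids.add(parseNode(tokens, pos));
        }
        pos[0]++; // skip ")"
        return node;
    }

    // Pairs every word with the label of the phrase directly above its POS tag.
    static List<String> wordPhrasePairs(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node phrase, List<String> out) {
        for (Node kid : phrase.kids) {
            if (kid.isPreTerminal()) {
                out.add("(" + kid.kids.get(0).label + " - " + phrase.label + ")");
            } else if (!kid.isLeaf()) {
                collect(kid, out); // descend into nested phrases
            }
        }
    }

    public static void main(String[] args) {
        Node root = parse("(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) "
                + "(VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .)))");
        System.out.println(String.join(", ", wordPhrasePairs(root)));
    }
}
```

Running main prints (My - NP), (dog - NP), (also - ADVP), (likes - VP), (eating - NP), (sausage - NP), (. - S). Each word gets only its innermost phrase, so "eating" maps to the NP nested inside the VP rather than to VP, and the final period is attached to S.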

1 Answer

Answer (score: 2)

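import java.util.List;

import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;

public class PhraseTags {
    // sentAnno is one sentence (a CoreMap) from the annotated document
    static void printNpWords(CoreMap sentAnno) {
        // I guess this is how you get your parse tree.
        Tree tree = sentAnno.get(TreeAnnotation.class);

        // ROOT usually has a single child (S here); its children are the top-level phrases.
        Tree parent = tree.firstChild();
        Tree[] children = parent.children();

        // Check the label of each subtree to see whether it is what you want (a phrase)
        for (Tree child : children) {
            if (child.value().equals("NP")) { // set your rule for defining a phrase here
                List<Tree> leaves = child.getLeaves(); // leaves correspond to the tokens
                for (Tree leaf : leaves) {
                    System.out.print(String.format("(%s - NP), ", leaf.value()));
                }
            }
        }
    }
}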

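The code has not been fully tested, but I think it does roughly what you need. Note also that I have not written anything to visit subtrees recursively, but I'm sure you can manage that part.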
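The recursive part left to the reader could look roughly like the sketch below. It is self-contained so it runs without CoreNLP: it parses the bracketed tree with its own small parser and visits every subtree depth-first, so the NP nested inside the VP ("eating sausage") is found as well. RecursivePhraseVisitor, Node, parse and findPhrases are hypothetical names, not CoreNLP API. In CoreNLP itself, Tree implements Iterable<Tree> and iterating over it visits every subtree in preorder, which should let a plain for-loop over the tree replace the hand-written recursion; that is an assumption, not tested here.

```java
import java.util.ArrayList;
import java.util.List;

public class RecursivePhraseVisitor {
    // Minimal parse-tree node (hypothetical); words are leaves with no children.
    static final class Node {
        final String label;
        final List<Node> kids = new ArrayList<>();

        Node(String label) { this.label = label; }
    }

    // Parses bracket notation like "(ROOT (S (NP (PRP$ My) (NN dog)) ...))".
    static Node parse(String s) {
        String[] tokens = s.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return parseNode(tokens, new int[] {0});
    }

    private static Node parseNode(String[] tokens, int[] pos) {
        if (!tokens[pos[0]].equals("(")) {
            return new Node(tokens[pos[0]++]); // a bare word
        }
        pos[0]++; // skip "("
        Node node = new Node(tokens[pos[0]++]); // phrase or POS label
        while (!tokens[pos[0]].equals(")")) {
            node.kids.add(parseNode(tokens, pos));
        }
        pos[0]++; // skip ")"
        return node;
    }

    // Appends the words under a node, left to right.
    private static void addLeaves(Node node, List<String> words) {
        if (node.kids.isEmpty()) {
            words.add(node.label);
        }
        for (Node kid : node.kids) {
            addLeaves(kid, words);
        }
    }

    // Depth-first (preorder) visit of every subtree: collects the words of each phrase
    // with the given label, including phrases nested inside other phrases.
    static List<List<String>> findPhrases(Node node, String label, List<List<String>> found) {
        if (!node.kids.isEmpty() && node.label.equals(label)) {
            List<String> words = new ArrayList<>();
            addLeaves(node, words);
            found.add(words);
        }
        for (Node kid : node.kids) {
            findPhrases(kid, label, found);
        }
        return found;
    }

    public static void main(String[] args) {
        Node root = parse("(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) "
                + "(VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .)))");
        for (List<String> phrase : findPhrases(root, "NP", new ArrayList<>())) {
            for (String word : phrase) {
                System.out.print(String.format("(%s - NP), ", word));
            }
        }
        System.out.println();
    }
}
```

On the example sentence, main prints (My - NP), (dog - NP), (eating - NP), (sausage - NP), in the same format as the snippet above. One consequence of visiting every subtree: when an NP contains another NP, the inner words are printed once per enclosing NP, so if each word should appear only once, map each word to its nearest enclosing phrase instead.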