How to parse a Penn Treebank tree with Stanford NLP and get all of its subtrees?

Date: 2016-05-26 00:32:52

Tags: java parsing nlp stanford-nlp

Is there a way to parse the PTB tree below and get all of its subtrees? For example:

Text   :  Today is a nice day.
PTB : (3 (2 Today) (3 (3 (2 is) (3 (2 a) (3 (3 nice) (2 day)))) (2 .)))

I need all of the subtrees:

Output  : 
(3 (2 Today) (3 (3 (2 is) (3 (2 a) (3 (3 nice) (2 day)))) (2 .)))
(2 Today)
(3 (3 (2 is) (3 (2 a) (3 (3 nice) (2 day)))) (2 .))
(3 (2 is) (3 (2 a) (3 (3 nice) (2 day))))
(2 is)
(3 (2 a) (3 (3 nice) (2 day)))
(2 a)
(3 (3 nice) (2 day))
(3 nice)
(2 day)
(2 .)

1 Answer:

Answer 0 (score: 1)

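The input file for this demo should contain one tree per line, each written in its bracketed string form. The example prints all subtrees of the first tree in the file.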
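Because each line holds one complete bracketed tree, the subtree strings can also be pulled straight out of a line by matching parentheses, without building a tree object at all. The sketch below is a plain-Java alternative, not part of CoreNLP; the class name `SubtreeSpans` is hypothetical. It pushes the index of every `(` on a stack, and each `)` closes the span that starts at the most recent unmatched `(`. Ordering the spans by their opening index yields the same preorder as the desired output in the question.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: extracts every bracketed subtree from one PTB line.
public class SubtreeSpans {

    // Returns all "( ... )" spans in preorder (sorted by opening parenthesis).
    // Assumes tokens never contain literal parentheses.
    public static List<String> subTrees(String ptb) {
        int n = ptb.length();
        String[] byStart = new String[n];          // span text keyed by its '(' index
        Deque<Integer> open = new ArrayDeque<>();  // indices of unmatched '('
        for (int i = 0; i < n; i++) {
            char c = ptb.charAt(i);
            if (c == '(') {
                open.push(i);
            } else if (c == ')') {
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("unbalanced ')' at index " + i);
                }
                int start = open.pop();
                byStart[start] = ptb.substring(start, i + 1);
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("unbalanced '(' in input");
        }
        // Walking the array in index order gives preorder.
        List<String> result = new ArrayList<>();
        for (String s : byStart) {
            if (s != null) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String ptb = "(3 (2 Today) (3 (3 (2 is) (3 (2 a) (3 (3 nice) (2 day)))) (2 .)))";
        subTrees(ptb).forEach(System.out::println);
    }
}
```

This only works if no word contains a literal `(` or `)`; the Penn Treebank normally escapes brackets as `-LRB-` and `-RRB-`, but if your data does not, use a real tree reader instead.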

The Stanford CoreNLP class you need is Tree (in the edu.stanford.nlp.trees package).

import edu.stanford.nlp.trees.*;

import java.io.*;

public class TreeLoadExample {

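    // Recursively print every non-leaf node in preorder: each phrase and
    // preterminal such as (2 Today). Leaves (the bare words) are skipped.
    public static void printSubTrees(Tree t) {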
        if (t.isLeaf())
            return;
        System.out.println(t);
        for (Tree subTree : t.children()) {
            printSubTrees(subTree);
        }
    }


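    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("Usage: java TreeLoadExample <tree-file>");
            return;
        }
        TreeFactory tf = new LabeledScoredTreeFactory();
        // Input: one bracketed PTB tree per line, UTF-8 encoded.
        // try-with-resources closes the file when we are done.
        try (Reader r = new BufferedReader(new InputStreamReader(new FileInputStream(args[0]), "UTF-8"))) {
            TreeReader tr = new PennTreeReader(r, tf);
            Tree t = tr.readTree();   // reads only the first tree
            if (t != null) {          // readTree() returns null at end of input
                printSubTrees(t);
            }
        }
    }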
}
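To see what the recursion in printSubTrees does without putting CoreNLP on the classpath, here is a minimal, dependency-free sketch. `MiniTree` is a hypothetical stand-in for CoreNLP's Tree, not its real API: it parses one well-formed bracketed tree into nodes and then applies the same rule (skip leaves, emit the node, recurse into children). Run on the tree from the question, it produces the 11 subtrees listed there, in the same order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CoreNLP's Tree, just enough to show the traversal.
public class MiniTree {
    final String label;                          // e.g. "3", or the word itself for a leaf
    final List<MiniTree> children = new ArrayList<>();

    MiniTree(String label) { this.label = label; }

    boolean isLeaf() { return children.isEmpty(); }

    // Render in the input's bracketed form: "(label child child ...)".
    @Override
    public String toString() {
        if (isLeaf()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (MiniTree c : children) sb.append(' ').append(c);
        return sb.append(')').toString();
    }

    // Parse one well-formed bracketed tree such as "(3 (2 Today) (2 .))".
    public static MiniTree parse(String s) {
        int[] pos = {0};                         // shared cursor into s
        return parseNode(s, pos);
    }

    private static MiniTree parseNode(String s, int[] pos) {
        skipSpaces(s, pos);
        if (s.charAt(pos[0]) != '(') {
            return new MiniTree(readToken(s, pos)); // a leaf word
        }
        pos[0]++;                                // consume '('
        skipSpaces(s, pos);
        MiniTree node = new MiniTree(readToken(s, pos));
        skipSpaces(s, pos);
        while (s.charAt(pos[0]) != ')') {
            node.children.add(parseNode(s, pos));
            skipSpaces(s, pos);
        }
        pos[0]++;                                // consume ')'
        return node;
    }

    private static void skipSpaces(String s, int[] pos) {
        while (pos[0] < s.length() && Character.isWhitespace(s.charAt(pos[0]))) pos[0]++;
    }

    // Read a label or word up to whitespace or a parenthesis.
    private static String readToken(String s, int[] pos) {
        int start = pos[0];
        while (pos[0] < s.length()) {
            char c = s.charAt(pos[0]);
            if (Character.isWhitespace(c) || c == '(' || c == ')') break;
            pos[0]++;
        }
        return s.substring(start, pos[0]);
    }

    // Same rule as printSubTrees, but collects into a list instead of printing.
    public static void collectSubTrees(MiniTree t, List<String> out) {
        if (t.isLeaf()) return;
        out.add(t.toString());
        for (MiniTree c : t.children) collectSubTrees(c, out);
    }

    public static void main(String[] args) {
        MiniTree t = parse("(3 (2 Today) (3 (3 (2 is) (3 (2 a) (3 (3 nice) (2 day)))) (2 .)))");
        List<String> subTrees = new ArrayList<>();
        collectSubTrees(t, subTrees);
        subTrees.forEach(System.out::println);
    }
}
```

Collecting into a list rather than printing makes the subtrees easy to reuse, for example to pair each phrase with its sentiment label.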