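I have created a class that uses Tregex to extract subtrees. I reused some code snippets from "TregexPattern.java" because I don't want the program to depend on console commands.
In general, given the tree for a sentence, I want to extract certain subtrees (without user interaction).
This is what I have so far: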
package edu.stanford.nlp.trees.tregex;

import edu.stanford.nlp.ling.StringLabelFactory;
import edu.stanford.nlp.trees.*;

import java.io.*;
import java.util.*;

public abstract class Test {

    abstract TregexMatcher matcher(Tree root, Tree tree, Map<String, Tree> namesToNodes, VariableStrings variableStrings);

    public TregexMatcher matcher(Tree t) {
        return matcher(t, t, new HashMap<String, Tree>(), new VariableStrings());
    }

    public static void main(String[] args) throws ParseException, IOException {
        String encoding = "UTF-8";
        TregexPattern p = TregexPattern.compile("NP < NN & <<DT"); //"/^MWV/" or "NP < (NP=np < NNS)"
        TreeReader r = new PennTreeReader(new StringReader("(VP (VP (VBZ Try) (NP (NP (DT this) (NN wine)) (CC and) (NP (DT these) (NNS snails)))) (PUNCT .))"), new LabeledScoredTreeFactory(new StringLabelFactory()));
        Tree t = r.readTree();
        treebank = new MemoryTreebank();
        treebank.add(t);
        TRegexTreeVisitor vis = new TRegexTreeVisitor(p, encoding);
        treebank.apply(vis); //line 26
        if (TRegexTreeVisitor.printMatches) {
            System.out.println("There were " + vis.numMatches() + " matches in total.");
        }
    }

    private static Treebank treebank; // used by main method, must be accessible

    static class TRegexTreeVisitor implements TreeVisitor {
        private static boolean printNumMatchesToStdOut = false;
        static boolean printNonMatchingTrees = false;
        static boolean printSubtreeCode = false;
        static boolean printTree = false;
        static boolean printWholeTree = false;
        static boolean printMatches = true;
        static boolean printFilename = false;
        static boolean oneMatchPerRootNode = false;
        static boolean reportTreeNumbers = false;
        static TreePrint tp;
        PrintWriter pw;
        int treeNumber = 0;
        TregexPattern p;
        //String[] handles;
        int numMatches;

        TRegexTreeVisitor(TregexPattern p, String encoding) {
            this.p = p;
            //this.handles = handles;
            try {
                pw = new PrintWriter(new OutputStreamWriter(System.out, encoding), true);
            } catch (UnsupportedEncodingException e) {
                System.err.println("Error -- encoding " + encoding + " is unsupported. Using ASCII print writer instead.");
                pw = new PrintWriter(System.out, true);
            }
            // tp.setPrintWriter(pw);
        }

        public void visitTree(Tree t) {
            treeNumber++;
            if (printTree) {
                pw.print(treeNumber + ":");
                pw.println("Next tree read:");
                tp.printTree(t, pw);
            }
            TregexMatcher match = p.matcher(t);
            if (printNonMatchingTrees) {
                if (match.find()) {
                    numMatches++;
                } else {
                    tp.printTree(t, pw);
                }
                return;
            }
            Tree lastMatchingRootNode = null;
            while (match.find()) {
                if (oneMatchPerRootNode) {
                    if (lastMatchingRootNode == match.getMatch()) {
                        continue;
                    } else {
                        lastMatchingRootNode = match.getMatch();
                    }
                }
                numMatches++;
                if (printFilename && treebank instanceof DiskTreebank) {
                    DiskTreebank dtb = (DiskTreebank) treebank;
                    pw.print("# ");
                    pw.println(dtb.getCurrentFile());
                }
                if (printSubtreeCode) {
                    pw.println(treeNumber + ":" + match.getMatch().nodeNumber(t));
                }
                if (printMatches) {
                    if (reportTreeNumbers) {
                        pw.print(treeNumber + ": ");
                    }
                    if (printTree) {
                        pw.println("Found a full match:");
                    }
                    if (printWholeTree) {
                        tp.printTree(t, pw);
                    } else {
                        tp.printTree(match.getMatch(), pw); //line 108
                    }
                    // pw.println(); // TreePrint already puts a blank line in
                } // end if (printMatches)
            } // end while match.find()
        } // end visitTree

        public int numMatches() {
            return numMatches;
        }
    } // end class TRegexTreeVisitor
}
But it fails with the following error:
Exception in thread "main" java.lang.NullPointerException
at edu.stanford.nlp.trees.tregex.Test$TRegexTreeVisitor.visitTree(Test.java:108)
at edu.stanford.nlp.trees.MemoryTreebank.apply(MemoryTreebank.java:376)
at edu.stanford.nlp.trees.tregex.Test.main(Test.java:26)
Java Result: 1
Any fixes or ideas?
Answer 0 (score: 1)
A NullPointerException usually indicates a bug in the software.
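For the code in the question, the stack trace points at the `tp.printTree(...)` call inside `visitTree`, and the snippet declares `static TreePrint tp;` without ever assigning it, so that field is the likely null reference. The sketch below reproduces the pattern with a hypothetical stand-in class `Printer` (not part of Stanford NLP) so that it runs on a plain JDK. In the real code the analogous fix would presumably be an assignment such as `TRegexTreeVisitor.tp = new TreePrint("penn");` before `treebank.apply(vis)`, assuming that constructor and format name are available in your version of the library.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class NullFieldDemo {

    /** Hypothetical stand-in for a printer class such as TreePrint. */
    static class Printer {
        void print(String s, PrintWriter pw) {
            pw.println(s);
        }
    }

    /** Mirrors the question's visitor: a static printer field that nobody assigns. */
    static class Visitor {
        static Printer tp; // stays null until someone assigns it
        final StringWriter out = new StringWriter();
        final PrintWriter pw = new PrintWriter(out, true);

        void visit(String match) {
            tp.print(match, pw); // throws NullPointerException while tp is null
        }
    }

    /** Returns true if visiting throws a NullPointerException. */
    static boolean throwsNpe(Visitor v, String match) {
        try {
            v.visit(match);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Visitor v = new Visitor();
        System.out.println("before fix, NPE: " + throwsNpe(v, "(NP (DT this) (NN wine))"));

        Visitor.tp = new Printer(); // the fix: initialise the field before it is used
        System.out.println("after fix, NPE: " + throwsNpe(v, "(NP (DT this) (NN wine))"));
        System.out.print(v.out);
    }
}
```

The general habit this illustrates: when a stack trace ends in your own code, check every reference dereferenced on that line and confirm each one is assigned on every path that reaches it.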
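I had the same task in the past: parsing sentences with a dependency parser. I decided to put the resulting parse tree into XML (DOM) and run XPath queries against it.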
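A minimal sketch of that idea, using only the JDK's DOM and XPath APIs. The XML layout is an assumption made for illustration (the answer does not say which schema was used): every constituent becomes a `node` element with a `label` attribute, and a preterminal carries its token in a `word` attribute. Labels go into attributes rather than element names because tags such as `.` or `PRP$` are not valid XML names. The class name `PennToDom` and its helper methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PennToDom {

    /** Converts a bracketed tree into a DOM; assumes every "(" is followed by a label. */
    static Document toDom(String penn) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Deque<Element> stack = new ArrayDeque<>();
        Matcher m = Pattern.compile("\\(|\\)|[^\\s()]+").matcher(penn);
        boolean expectLabel = false;
        while (m.find()) {
            String tok = m.group();
            if (tok.equals("(")) {
                expectLabel = true;
            } else if (tok.equals(")")) {
                stack.pop();
            } else if (expectLabel) {
                Element e = doc.createElement("node");
                e.setAttribute("label", tok);
                if (stack.isEmpty()) doc.appendChild(e); else stack.peek().appendChild(e);
                stack.push(e);
                expectLabel = false;
            } else {
                stack.peek().setAttribute("word", tok); // token under a preterminal
            }
        }
        return doc;
    }

    /** Joins the words below (and on) an element in document order. */
    static String words(Element e) {
        StringBuilder sb = new StringBuilder(e.getAttribute("word"));
        NodeList below = e.getElementsByTagName("node");
        for (int i = 0; i < below.getLength(); i++) {
            String w = ((Element) below.item(i)).getAttribute("word");
            if (!w.isEmpty()) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(w);
            }
        }
        return sb.toString();
    }

    /** Evaluates an XPath expression and returns the words of each matching subtree. */
    static List<String> query(Document doc, String expr) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            out.add(words((Element) hits.item(i)));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Document doc = toDom("(VP (VP (VBZ Try) (NP (NP (DT this) (NN wine)) (CC and) "
                + "(NP (DT these) (NNS snails)))) (PUNCT .))");
        // Rough XPath counterpart of the Tregex pattern "NP < NN & << DT":
        // an NP with an NN child and a DT somewhere below it.
        String expr = "//node[@label='NP'][node[@label='NN']][.//node[@label='DT']]";
        System.out.println(query(doc, expr)); // prints [this wine]
    }
}
```

Only the inner noun phrase "this wine" matches here: the outer NP has no direct NN child, and "these snails" contains NNS rather than NN.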
For better performance, you don't need to keep the XML in a String; just keep the whole XML structure in memory as a DOM (see, for example, http://www.ibm.com/developerworks/xml/library/x-domjava/).
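One way to read that advice, sketched with the JDK only: build each tree directly as a `Document`, never serialise it, and compile the XPath expression once so it can be reused across all in-memory trees. The element name `node`, the attribute `label`, and the class `InMemoryTrees` are assumptions for illustration, not something the answer specifies.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class InMemoryTrees {

    /** Builds a tiny (NP (DT) (<nounTag>)) tree directly as a DOM, with no XML string involved. */
    static Document nounPhrase(String nounTag) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element np = doc.createElement("node");
        np.setAttribute("label", "NP");
        Element dt = doc.createElement("node");
        dt.setAttribute("label", "DT");
        Element noun = doc.createElement("node");
        noun.setAttribute("label", nounTag);
        np.appendChild(dt);
        np.appendChild(noun);
        doc.appendChild(np);
        return doc;
    }

    /** Compiles the expression once and evaluates it against every in-memory tree. */
    static int countMatches(List<Document> trees, String expr) throws XPathExpressionException {
        XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(expr);
        int total = 0;
        for (Document tree : trees) {
            NodeList hits = (NodeList) compiled.evaluate(tree, XPathConstants.NODESET);
            total += hits.getLength();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        List<Document> trees = new ArrayList<>();
        trees.add(nounPhrase("NN"));
        trees.add(nounPhrase("NNS"));
        trees.add(nounPhrase("NN"));
        // NPs with a direct NN child: the first and third tree qualify.
        System.out.println(countMatches(trees, "//node[@label='NP'][node[@label='NN']]")); // prints 2
    }
}
```

Skipping the String round trip avoids re-parsing XML for every query; whether that matters in practice depends on how many trees and queries are involved.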
Querying the tree-shaped data structure with XPath gave me the following benefits: