加载第二个词典后,Java Stanford NLP:ArrayIndexOutOfBounds

时间:2009-12-08 00:47:34

标签: java nlp stanford-nlp

我正在使用斯坦福自然语言处理工具包。我一直试图用Lexicon的{​​{1}}方法找到拼写错误,但它会产生很多误报。所以我以为我会加载第二个词典,并检查一下。但是,这会导致问题。

isKnown

生成以下失败跟踪:

private static LexicalizedParser lp = new LexicalizedParser(Constants.stdLexFile);
private static LexicalizedParser wsjLexParse = new LexicalizedParser(Constants.wsjLexFile);

    static {
        lp.setOptionFlags(Constants.lexOptionFlags);        
        wsjLexParse.setOptionFlags(Constants.lexOptionFlags);       
    }

public ParseTree(String input) throws IllegalArgumentException, IllegalAccessException, InvocationTargetException {
    initialInput = input;
    DocumentPreprocessor process = new DocumentPreprocessor();
    sentences = process.getSentencesFromText(new StringReader(input));

    for (List<? extends HasWord> sent : sentences) {
        if(lp.parse(sent)) { // line 65
            forest.add(lp.getBestParse()); //non determinism?
        }
    }

    partsOfSpeech = pos();
    runAnalysis();
}

如果我注释掉这一行:(以及对wsjLexParse的其他引用)

java.lang.ArrayIndexOutOfBoundsException: 45547
    at edu.stanford.nlp.parser.lexparser.BaseLexicon.initRulesWithWord(BaseLexicon.java:300)
    at edu.stanford.nlp.parser.lexparser.BaseLexicon.isKnown(BaseLexicon.java:160)
    at edu.stanford.nlp.parser.lexparser.BaseLexicon.ruleIteratorByWord(BaseLexicon.java:212)
    at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.initializeChart(ExhaustivePCFGParser.java:1299)
    at edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.parse(ExhaustivePCFGParser.java:388)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.parse(LexicalizedParser.java:234)
    at nth.compling.ParseTree.<init>(ParseTree.java:65)
    at nth.compling.ParseTreeTest.constructor(ParseTreeTest.java:33)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.junit.internal.runners.BeforeAndAfterRunner.invokeMethod(BeforeAndAfterRunner.java:74)
    at org.junit.internal.runners.BeforeAndAfterRunner.runBefores(BeforeAndAfterRunner.java:50)
    at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:33)
    at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)

然后一切正常。我在这里做错了什么?

1 个答案:

答案 0 :(得分:0)

看起来像斯坦福图书馆的一个错误。你应该向他们报告。

第二个词典只在你加载(而不是另一个词典)时是否有效? 以不同顺序加载两个lexica时是否会出现相同的错误?