Errors when following the Stanford CoreNLP tutorial in Eclipse

Date: 2018-03-06 11:07:11

Tags: java machine-learning nlp artificial-intelligence

Question:

When I paste this sample code from the Stanford CoreNLP tutorial into Eclipse, I get a pile of errors. The code comes from the beginner Stanford CoreNLP tutorial (https://stanfordnlp.github.io/CoreNLP/api.html). I'm not sure what the problem is: I imported the external JAR files mentioned in other tutorials, but I still get errors:

import edu.stanford.nlp.coref.data.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.ie.util.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.*;
import java.util.*;
public class BasicPipelineExample {

    public static String text = "Joe Smith was born in California. " +
            "In 2017, he went to Paris, France in the summer. " +
            "His flight left at 3:00pm on July 10th, 2017. " +
            "After eating some escargot for the first time, Joe said, \"That was delicious!\" " +
            "He sent a postcard to his sister Jane Smith. " +
            "After hearing about Joe's trip, Jane decided she might go to France one day.";

    public static void main(String[] args) throws InterruptedException {
        // set up pipeline properties
        Properties props = new Properties();

        // set the list of annotators to run
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref,kbp,quote");

        // set a property for an annotator, in this case the coref annotator is being set to use the neural algorithm
        props.setProperty("coref.algorithm", "neural");

        // build pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // create a document object
        CoreDocument document = new CoreDocument(text);

        // annotate the document
        pipeline.annotate(document);

        // text of the first sentence
        String sentenceText = document.sentences().get(0).text();
        System.out.println("Example: sentence");
        System.out.println(sentenceText);
        System.out.println();
    }
}

Errors shown:

Adding annotator tokenize
No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [2.2 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [5.6 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [3.1 sec].
TokensRegexNERAnnotator ner.fine.regexner: Read 580641 unique entries out of 581790 from edu/stanford/nlp/models/kbp/regexner_caseless.tab, 0 TokensRegex patterns.
TokensRegexNERAnnotator ner.fine.regexner: Read 4857 unique entries out of 4868 from edu/stanford/nlp/models/kbp/regexner_cased.tab, 0 TokensRegex patterns.
TokensRegexNERAnnotator ner.fine.regexner: Read 585498 unique entries from 2 files
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at edu.stanford.nlp.ling.tokensregex.parser.TokenSequenceParser.<init>(TokenSequenceParser.java:3446)
    at edu.stanford.nlp.ling.tokensregex.TokenSequencePattern.getNewEnv(TokenSequencePattern.java:158)
    at edu.stanford.nlp.pipeline.TokensRegexNERAnnotator.createPatternMatcher(TokensRegexNERAnnotator.java:343)
    at edu.stanford.nlp.pipeline.TokensRegexNERAnnotator.<init>(TokensRegexNERAnnotator.java:295)
    at edu.stanford.nlp.pipeline.NERCombinerAnnotator.setUpFineGrainedNER(NERCombinerAnnotator.java:209)
    at edu.stanford.nlp.pipeline.NERCombinerAnnotator.<init>(NERCombinerAnnotator.java:152)
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:68)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$44(StanfordCoreNLP.java:546)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$$Lambda$14/501263526.apply(Unknown Source)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$69(StanfordCoreNLP.java:625)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$$Lambda$36/277630005.get(Unknown Source)
    at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126)
    at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:495)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:201)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:194)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:181)
    at BasicPipelineExample.main(BasicPipelineExample.java:29)

1 answer:

Answer 0: (score: 0)

Increase the JVM heap size to 3 or 4 GB to run the full pipeline (pos, depparse, ner). After that it should work. Note that the setting that matters is the heap of the launched program, not of the Eclipse IDE itself: in Eclipse, add `-Xmx4g` to the VM arguments of the program's run configuration. https://wiki.eclipse.org/FAQ_How_do_I_increase_the_heap_size_available_to_Eclipse%3F
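To confirm the `-Xmx` setting actually reached the program (a common mistake is editing `eclipse.ini` instead of the run configuration), a small standalone sketch like the following can print the maximum heap the JVM will use. This is illustrative only and not part of CoreNLP; the class name `HeapCheck` is made up for the example:

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Runtime.maxMemory() reports the maximum heap the JVM will attempt
        // to use, i.e. roughly the value passed via -Xmx.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
    }
}
```

If this prints far less than 3000 MB when launched the same way as `BasicPipelineExample`, the `-Xmx` argument is not being applied to the right run configuration.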