StanfordCoreNLP error in NetBeans: Unrecoverable error while loading a tagger model

Time: 2014-06-17 13:21:38

Tags: java netbeans stanford-nlp

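I am trying to use StanfordCoreNLP to distinguish singular from plural nouns in a sentence. I am using the code from http://nlp.stanford.edu/software/corenlp.shtml as a starting point. In NetBeans 8.0 I created a new Java project. I downloaded stanford-corenlp-full-2014-06-16 and added the jar files (including the models jar) to my project.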


The class SingularORPlural:

import java.util.LinkedList;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;


/**
 *
 * @author ha
 */
public class SingularORPlural {

    protected StanfordCoreNLP pipeline;

    public SingularORPlural() {
        // Create StanfordCoreNLP object properties, with POS tagging
        // (required for lemmatization), and lemmatization
        Properties props;
        props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma");

        /*
         * This is a pipeline that takes in a string and returns various analyzed linguistic forms. 
         * The String is tokenized via a tokenizer (such as PTBTokenizerAnnotator), 
         * and then other sequence model style annotation can be used to add things like lemmas, 
         * POS tags, and named entities. These are returned as a list of CoreLabels. 
         * Other analysis components build and store parse trees, dependency graphs, etc. 
         * 
         * This class is designed to apply multiple Annotators to an Annotation. 
         * The idea is that you first build up the pipeline by adding Annotators, 
         * and then you take the objects you wish to annotate and pass them in and 
         * get in return a fully annotated object.
         * 
         *  StanfordCoreNLP loads a lot of models, so you probably
         *  only want to do this once per execution
         */
        this.pipeline = new StanfordCoreNLP(props);
    }

    public List<String> lemmatize(String documentText)
    {
        List<String> lemmas = new LinkedList<String>();
        // Create an empty Annotation just with the given text
        Annotation document = new Annotation(documentText);
        // run all Annotators on this text
        this.pipeline.annotate(document);
        // Iterate over all of the sentences found
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        for(CoreMap sentence: sentences) {
            // Iterate over all tokens in a sentence
            for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
                // Retrieve and add the lemma for each word into the
                // list of lemmas
                lemmas.add(token.get(LemmaAnnotation.class));
            }
        }
        return lemmas;
    }


}

Then in main:

System.out.println("Starting Stanford Lemmatizer");
String text = "How could you be seeing into my eyes like open doors? \n";
SingularORPlural slem = new SingularORPlural();
System.out.println( slem.lemmatize(text) );

I get this error:

run:
Starting Stanford Lemmatizer
Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... Exception in thread "main" java.lang.RuntimeException: edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$4.create(StanfordCoreNLP.java:558)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:85)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:267)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:129)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:125)
    at stanfordposcode.SingularORPlural.<init>(SingularORPlural.java:51)
    at stanfordposcode.StanfordPOSCode.main(StanfordPOSCode.java:74)
Caused by: edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:857)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:755)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:289)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:253)
    at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:97)
    at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:77)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$4.create(StanfordCoreNLP.java:556)
    ... 6 more
Caused by: java.io.InvalidClassException: edu.stanford.nlp.tagger.maxent.ExtractorDistsim; local class incompatible: stream classdesc serialVersionUID = 2, local class serialVersionUID = 1
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1707)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readExtractors(MaxentTagger.java:582)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:808)
    ... 12 more
Java Result: 1
BUILD SUCCESSFUL (total time: 3 seconds)

How can I fix this error?

1 Answer:

Answer 0 (score: 6)

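I ran into the same error.

The failure happens because the tagger file you are using ("english-left3words-distsim.tagger") was built for a different version of StanfordCoreNLP than the source/binary/bytecode you are running: the trace shows ExtractorDistsim serialized with serialVersionUID 2 while the local class declares 1. Everything has to be consistent and compatible, that is, come from the same package/build.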
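
To see which side of that mismatch your classpath holds, you can ask the JVM for the serialVersionUID of the class it actually loads. The following is only a minimal sketch: the class name SerialUidCheck and the helper localSerialUid are made up for illustration, and it assumes the CoreNLP jar is on the classpath when you run it.

```java
import java.io.ObjectStreamClass;

public class SerialUidCheck {

    /** Returns the serialVersionUID of the locally loaded class, or a short status text. */
    static String localSerialUid(String className) {
        try {
            // initialize=false: only the class descriptor is needed, no static init
            Class<?> cls = Class.forName(className, false,
                    SerialUidCheck.class.getClassLoader());
            // lookup() returns null for classes that are not serializable
            ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
            if (desc == null) {
                return "not serializable";
            }
            return Long.toString(desc.getSerialVersionUID());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        // Class name taken from the InvalidClassException in the question
        String name = args.length > 0
                ? args[0]
                : "edu.stanford.nlp.tagger.maxent.ExtractorDistsim";
        System.out.println(name + " -> local serialVersionUID: " + localSerialUid(name));
    }
}
```

If the printed value differs from the "stream classdesc serialVersionUID" in the exception, the model file and the code jar do not belong together.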

The short answer: make sure you pick the right tagger file.

These simple steps will help:
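
One way to check which tagger file and which classes are actually picked up is to list every classpath location that provides them. This is a rough sketch, not part of CoreNLP: ClasspathCheck and locations are hypothetical names, and the two resource paths are taken from the log and exception in the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ClasspathCheck {

    /** Lists every classpath location that provides the given resource. */
    static List<String> locations(String resource) {
        List<String> found = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        try {
            for (URL url : Collections.list(loader.getResources(resource))) {
                found.add(url.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        // Paths taken from the log output and the exception in the question
        String[] resources = {
            "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger",
            "edu/stanford/nlp/tagger/maxent/ExtractorDistsim.class"
        };
        for (String resource : resources) {
            System.out.println(resource);
            for (String location : locations(resource)) {
                System.out.println("    " + location);
            }
        }
    }
}
```

If a resource shows up in more than one jar, or the model and the class come from jars of different releases, remove the stray jar from the project libraries.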

  1. Download this: http://nlp.stanford.edu/software/stanford-corenlp-full-2014-06-16.zip
  2. Add this to your pom.xml (if you use Maven):

    <dependencies>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.4</version>
    </dependency>
    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.4</version>
        <classifier>models</classifier>
    </dependency>
    </dependencies>

  3. Then make sure it works:

    import java.io.IOException;

    import edu.stanford.nlp.tagger.maxent.MaxentTagger;

    public class TagText {
        public static void main(String[] args) throws IOException,
                ClassNotFoundException {
    
            // Initialize the tagger; the model file must come from the same
            // release as the stanford-corenlp jar on the classpath
            final MaxentTagger tagger = new MaxentTagger("taggers/english-left3words-distsim.tagger");
    
            // The sample strings
            final String sample1 = "This is a sample text.";
            final String sample2 = "The sailor dogs the hatch.";
    
            // The tagged strings
            final String tagged1 = tagger.tagString(sample1);
            final String tagged2 = tagger.tagString(sample2);
    
            // Output the result
            System.out.println(tagged1);
            System.out.println(tagged2);
        }
    }
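
Once the tagger loads, the tags can be used for the original goal of telling singular nouns from plural ones. The sketch below assumes the tagged string uses the word_TAG form with Penn Treebank tags (NN and NNP for singular or mass nouns, NNS and NNPS for plural ones). NounNumber and its methods are hypothetical names, and the sample input is hand-written, not real tagger output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NounNumber {

    /** Maps a Penn Treebank tag to "singular", "plural", or "not a noun". */
    static String numberOf(String tag) {
        switch (tag) {
            case "NN":
            case "NNP":
                return "singular";
            case "NNS":
            case "NNPS":
                return "plural";
            default:
                return "not a noun";
        }
    }

    /** Collects the nouns of a word_TAG string; a repeated word keeps its last label. */
    static Map<String, String> nouns(String tagged) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String token : tagged.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            if (sep <= 0) {
                continue; // no word_TAG separator, skip the token
            }
            String number = numberOf(token.substring(sep + 1));
            if (!number.equals("not a noun")) {
                result.put(token.substring(0, sep), number);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written sample in the assumed word_TAG format
        System.out.println(nouns("The_DT dogs_NNS chase_VBP a_DT cat_NN ._."));
    }
}
```

In TagText you could pass tagged1 and tagged2 to nouns instead of printing them directly.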