Error while creating a StanfordCoreNLP object

Asked: 2014-03-05 18:25:46

Tags: java maven jar nlp stanford-nlp

I have downloaded and installed the required jar files from http://nlp.stanford.edu/software/corenlp.shtml#Download.

I have included five jar files:

stanford-postagger.jar

stanford-postagger-3.3.1.jar

stanford-postagger-3.3.1-javadoc.jar

stanford-postagger-3.3.1-sources.jar

stanford-corenlp-3.3.1.jar

and the code is:

import java.util.Properties;

import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class lemmafirst {

    protected StanfordCoreNLP pipeline;

    public lemmafirst() {
        // Create StanfordCoreNLP object properties, with POS tagging
        // (required for lemmatization), and lemmatization
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma");

        /*
         * This is a pipeline that takes in a string and returns various analyzed linguistic forms. 
         * The String is tokenized via a tokenizer (such as PTBTokenizerAnnotator), 
         * and then other sequence model style annotation can be used to add things like lemmas, 
         * POS tags, and named entities. These are returned as a list of CoreLabels. 
         * Other analysis components build and store parse trees, dependency graphs, etc. 
         * 
         * This class is designed to apply multiple Annotators to an Annotation. 
         * The idea is that you first build up the pipeline by adding Annotators, 
         * and then you take the objects you wish to annotate and pass them in and 
         * get in return a fully annotated object.
         * 
         *  StanfordCoreNLP loads a lot of models, so you probably
         *  only want to do this once per execution
         */
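        // The exception is thrown on this line.
        this.pipeline = new StanfordCoreNLP(props);
    }

    // ... rest of the class (including main) is not shown in the question
}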

My problem is in creating the pipeline.

The error I get is:

Exception in thread "main" java.lang.RuntimeException: edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$4.create(StanfordCoreNLP.java:563)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:81)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:262)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:129)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:125)
    at lemmafirst.<init>(lemmafirst.java:39)
    at lemmafirst.main(lemmafirst.java:83)
Caused by: edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:758)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:289)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:253)
    at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:88)
    at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:76)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$4.create(StanfordCoreNLP.java:561)
    ... 6 more
Caused by: java.io.IOException: Unable to resolve "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as either class path, filename or URL
    at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:434)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:753)
    ... 11 more

Can anyone please help me fix this error? Thanks.

3 Answers:

Answer 0: (score: 14)

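The exception is thrown because the POS tagger model is missing. The downloads come in two variants, with and without the model files, and the jars you added do not contain the models.

Either add stanford-postagger-full-3.3.1.jar, which can be found on the following page (stanford-postagger-full-2014-01-04.zip): http://nlp.stanford.edu/software/tagger.shtml

Or do the same with the full CoreNLP package (stanford-corenlp-full-....jar): http://nlp.stanford.edu/software/corenlp.shtml (you can then also remove all postagger dependencies, since they are included in CoreNLP).

If you only want to add the model files, look on Maven Central and download "stanford-corenlp-3.3.1-models.jar".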
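To confirm whether the model is actually visible to your program, you can ask the class loader for the exact resource path named in the "Unable to resolve" line of the stack trace. The following is a minimal sketch using only the JDK; the class name ModelCheck and the helper locate are hypothetical names chosen for illustration and are not part of CoreNLP.

```java
import java.net.URL;

public class ModelCheck {
    // Resource path copied from the "Unable to resolve" line of the stack trace.
    static final String TAGGER_MODEL =
        "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger";

    /**
     * Hypothetical helper: returns the URL of a classpath resource,
     * or null when no jar or directory on the classpath provides it.
     */
    static URL locate(String resourcePath) {
        return ModelCheck.class.getClassLoader().getResource(resourcePath);
    }

    public static void main(String[] args) {
        URL url = locate(TAGGER_MODEL);
        if (url == null) {
            System.out.println("Tagger model is NOT on the classpath; add a jar that contains the models.");
        } else {
            System.out.println("Tagger model found at " + url);
        }
    }
}
```

Run it with the same classpath as your application. If it reports that the model is missing, the pipeline constructor will likely fail in the same way as in the question.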

Answer 1: (score: 7)

A simpler way to add these model files is to put the following dependencies in your pom.xml and let Maven manage them for you:

<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>3.6.0</version>
</dependency>
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>3.6.0</version>
  <classifier>models</classifier> <!--  will get the dependent model jars -->
</dependency>

Answer 2: (score: 0)

If anyone is looking for the Gradle dependencies, add the following inside the dependencies block.

 compile group: 'edu.stanford.nlp', name: 'stanford-corenlp', version: '3.9.1'
 compile group: 'edu.stanford.nlp', name: 'stanford-corenlp', version: '3.9.1', classifier: 'models'