Question

我正在尝试开始使用Stanford CoreNLP，甚至无法从这里获得第一个简单示例。

https://stanfordnlp.github.io/CoreNLP/api.html

这是我的代码：

package stanford.corenlp;

import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import com.google.common.io.Files;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;
import java.util.logging.Level;
import java.util.logging.Logger;

    private void test2() {
        // creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // read some text in the text variable
        String text = "Now is the time for all good men to come to the aid of their country.";

        // create an empty Annotation just with the given text
        Annotation document = new Annotation(text);

        // run all Annotators on this text
        pipeline.annotate(document);

    }

  public static void main(String[] args) throws IOException {
      StanfordNLP nlp = new StanfordNLP();
      nlp.test2();
  }

}

这是stacktrace：

Adding annotator tokenize
No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:791)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:312)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:265)
    at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:85)
    at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:73)
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.posTagger(AnnotatorImplementations.java:55)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$42(StanfordCoreNLP.java:496)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getDefaultAnnotatorPool$65(StanfordCoreNLP.java:533)
    at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:118)
    at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:146)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:447)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:150)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:146)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:133)
    at stanford.corenlp.StanfordNLP.test2(StanfordNLP.java:93)
    at stanford.corenlp.StanfordNLP.main(StanfordNLP.java:108)
Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as class path, filename or URL
    at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:480)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:789)
    ... 16 more
C:\Users\Greg\AppData\Local\NetBeans\Cache\8.2\executor-snippets\run.xml:53: Java returned: 1
BUILD FAILED (total time: 0 seconds)

我错过了什么？

Answer 1

首先，您需要添加到类路径stanford-corenlp-3.8.0.jar。这使得NetBeans中的红色错误标记消失了。但是您还需要将stanford-corenlp-3.8.0-models.jar添加到类路径中以防止我记录的错误。将它所在的文件夹添加到类路径中不起作用。这样的细节永远不会遗漏在初学者的文档中！

现在，如果您继续使用该示例并添加新内容，则会发生更多错误。例如，代码将如下所示：

package stanford.corenlp;

    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import com.google.common.io.Files;

    import edu.stanford.nlp.dcoref.CorefChain;
    import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.semgraph.SemanticGraph;
    import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
    import edu.stanford.nlp.util.CoreMap;
    import edu.stanford.nlp.util.PropertiesUtils;
    import java.util.logging.Level;
    import java.util.logging.Logger;

        private void test2() {
            // creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(
            PropertiesUtils.asProperties(
                "annotators", "tokenize,ssplit,pos,lemma,parse,natlog",
                "ssplit.isOneSentence", "true",
                "parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz",
                "tokenize.language", "en"));

            // read some text in the text variable
            String text = "Now is the time for all good men to come to the aid of their country.";

            // create an empty Annotation just with the given text
            Annotation document = new Annotation(text);

            // run all Annotators on this text
            pipeline.annotate(document);

            // these are all the sentences in this document
            // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
            List<CoreMap> sentences = document.get(SentencesAnnotation.class);

            for (CoreMap sentence: sentences) {
                // traversing the words in the current sentence
                // a CoreLabel is a CoreMap with additional token-specific methods
                for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
                    // this is the text of the token
                    String word = token.get(TextAnnotation.class);
                    // this is the POS tag of the token
                    String pos = token.get(PartOfSpeechAnnotation.class);
                    // this is the NER label of the token
                    String ne = token.get(NamedEntityTagAnnotation.class);

                    System.out.println("word="+word +", pos="+pos +", ne="+ne);
                }

                // this is the parse tree of the current sentence
                Tree tree = sentence.get(TreeAnnotation.class);

                // this is the Stanford dependency graph of the current sentence
                SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
            }

            // This is the coreference link graph
            // Each chain stores a set of mentions that link to each other,
            // along with a method for getting the most representative mention
            // Both sentence and token offsets start at 1!
            Map<Integer, CorefChain> graph = 
                document.get(CorefChainAnnotation.class);
        }

      public static void main(String[] args) throws IOException {
          StanfordNLP nlp = new StanfordNLP();
          nlp.test2();
      }

    }

堆栈跟踪变为：

run:
Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.6 sec].
Adding annotator lemma
Adding annotator parse
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: java.io.IOException: Unable to open "edu/stanford/nlp/models/srparser/englishSR.ser.gz" as class path, filename or URL
    at edu.stanford.nlp.parser.common.ParserGrammar.loadModel(ParserGrammar.java:187)
    at edu.stanford.nlp.pipeline.ParserAnnotator.loadModel(ParserAnnotator.java:219)
    at edu.stanford.nlp.pipeline.ParserAnnotator.<init>(ParserAnnotator.java:121)
    at edu.stanford.nlp.pipeline.AnnotatorImplementations.parse(AnnotatorImplementations.java:115)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$50(StanfordCoreNLP.java:504)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getDefaultAnnotatorPool$65(StanfordCoreNLP.java:533)
    at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:118)
    at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:146)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:447)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:150)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:146)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:133)
    at stanford.corenlp.StanfordNLP.test2(StanfordNLP.java:95)
    at stanford.corenlp.StanfordNLP.main(StanfordNLP.java:145)
Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/srparser/englishSR.ser.gz" as class path, filename or URL
    at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:480)
    at edu.stanford.nlp.io.IOUtils.readObjectFromURLOrClasspathOrFileSystem(IOUtils.java:309)
    at edu.stanford.nlp.parser.common.ParserGrammar.loadModel(ParserGrammar.java:184)
    ... 14 more
C:\Users\Greg\AppData\Local\NetBeans\Cache\8.2\executor-snippets\run.xml:53: Java returned: 1
BUILD FAILED (total time: 1 second)

我终于通过下载并添加到类路径stanford-english-corenlp-2017-06-09-models.jar来完成这一点，你可以从这里获得“英文”下载链接：

https://stanfordnlp.github.io/CoreNLP/download.html

你需要这样做，尽管下载页面上的消息说英语所需的一切都已经在corenlp下载中提供了！

Answer 2

[2019-12-31]供后代/参考；注意：Linux终端。

下载CoreNLP 3.9.2 | stanford-corenlp-full-2018-10-05.zip，https://stanfordnlp.github.io/CoreNLP/download.html]并将其提取出来。

pwd; ls -l
  /mnt/Vancouver/apps/CoreNLP/src-local/zzz
  -rw-r--r-- 1 victoria victoria 393239982 Dec 31 14:13 stanford-corenlp-full-2018-10-05.zip

unzip stanford-corenlp-full-2018-10-05.zip
  # ...

ls -l
  drwxrwxr-x 5 victoria victoria      4096 Oct  8  2018 stanford-corenlp-full-2018-10-05
  -rw-r--r-- 1 victoria victoria 393239982 Dec 31 14:13 stanford-corenlp-full-2018-10-05.zip

保存“ BasicPipelineExample.java”代码

在名为BasicPipelineExample.java的文件中：

/mnt/Vancouver/apps/CoreNLP/src-local/zzz/BasicPipelineExample.java

编译

pwd     ## "sanity check"
  /mnt/Vancouver/apps/CoreNLP/src-local/zzz/

javac -cp stanford-corenlp-3.9.2.jar  BasicPipelineExample.java -Xdiags:verbose

提供Java类文件BasicPipelineExample.class，并从该目录运行它，

java -cp .:* BasicPipelineExample

附录

上面的代码描述了在Java环境中对CoreNLP的访问，如下所述：https://stanfordnlp.github.io/CoreNLP/api.html#quickstart-with-convenience-wrappers

对于那些更倾向于（包括我自己）的人，斯坦福大学在Python环境中提供了基本相同的功能，如此处所述：https://stanfordnlp.github.io/stanfordnlp/corenlp_client.html

例如，

import stanfordnlp
from stanfordnlp.server import CoreNLPClient
# JSON output [default]:
# client = CoreNLPClient(annotators=['tokenize','ssplit','pos','lemma','ner', 'parse', 'depparse','coref'], timeout=30000, memory='16G')
# Plain-text ourput (much more compact):
client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', output_format='text',  timeout=30000, memory='16G')
text = 'Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein.'
# This auto-starts the client() instance:
ann = client.annotate(text)
  # ....
sentence = ann.sentence[0]
print(sentence)
  # ... copious output ...
print(ann)
  # ... more succinct ...

注意：如果您使用output_format='text'参数，则不能再这样做：

sentence = ann.sentence[0]
  Traceback (most recent call last):
    File "<console>", line 1, in <module>
  AttributeError: 'str' object has no attribute 'sentence'

因此，您不能再做

dependency_parse = sentence.basicDependencies
print(dependency_parse)

使用stanfordnlp包，您还可以按照以下说明建立管道：https://stanfordnlp.github.io/stanfordnlp/

例如，

import stanfordnlp
stanfordnlp.download('en')
nlp = stanfordnlp.Pipeline())
text = 'Bananas are an excellent source of potassium.'
text_nlp = nlp(text)
text_nlp.sentences[0].print_dependencies()`

最后-尽管我发现功能有限（请参阅斯坦福大学编写的CoreNLP库），但可以通过访问spaCy中的CoreNLP获得类似的结果：https://github.com/explosion/spacy-stanfordnlp

import stanfordnlp
from spacy_stanfordnlp import StanfordNLPLanguage
snlp = stanfordnlp.Pipeline(lang="en")
nlp = StanfordNLPLanguage(snlp)
doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)
  # ...

斯坦福CoreNLP BasicPipelineExample不起作用

2 个答案: