我正在尝试开始使用Stanford CoreNLP,甚至无法从这里获得第一个简单示例。
这是我的代码:
package stanford.corenlp;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import com.google.common.io.Files;
import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;
import java.util.logging.Level;
import java.util.logging.Logger;
private void test2() {
// creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// read some text in the text variable
String text = "Now is the time for all good men to come to the aid of their country.";
// create an empty Annotation just with the given text
Annotation document = new Annotation(text);
// run all Annotators on this text
pipeline.annotate(document);
}
public static void main(String[] args) throws IOException {
StanfordNLP nlp = new StanfordNLP();
nlp.test2();
}
}
这是stacktrace:
Adding annotator tokenize
No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:791)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:312)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:265)
at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:85)
at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:73)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.posTagger(AnnotatorImplementations.java:55)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$42(StanfordCoreNLP.java:496)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getDefaultAnnotatorPool$65(StanfordCoreNLP.java:533)
at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:118)
at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:146)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:447)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:150)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:146)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:133)
at stanford.corenlp.StanfordNLP.test2(StanfordNLP.java:93)
at stanford.corenlp.StanfordNLP.main(StanfordNLP.java:108)
Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as class path, filename or URL
at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:480)
at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:789)
... 16 more
C:\Users\Greg\AppData\Local\NetBeans\Cache\8.2\executor-snippets\run.xml:53: Java returned: 1
BUILD FAILED (total time: 0 seconds)
我错过了什么?
答案 0 :(得分:3)
首先,您需要添加到类路径stanford-corenlp-3.8.0.jar。这使得NetBeans中的红色错误标记消失了。但是您还需要将stanford-corenlp-3.8.0-models.jar添加到类路径中以防止我记录的错误。将它所在的文件夹添加到类路径中不起作用。这样的细节永远不会遗漏在初学者的文档中!
现在,如果您继续使用该示例并添加新内容,则会发生更多错误。例如,代码将如下所示:
package stanford.corenlp;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import com.google.common.io.Files;
import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.PropertiesUtils;
import java.util.logging.Level;
import java.util.logging.Logger;
private void test2() {
// creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(
PropertiesUtils.asProperties(
"annotators", "tokenize,ssplit,pos,lemma,parse,natlog",
"ssplit.isOneSentence", "true",
"parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz",
"tokenize.language", "en"));
// read some text in the text variable
String text = "Now is the time for all good men to come to the aid of their country.";
// create an empty Annotation just with the given text
Annotation document = new Annotation(text);
// run all Annotators on this text
pipeline.annotate(document);
// these are all the sentences in this document
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence: sentences) {
// traversing the words in the current sentence
// a CoreLabel is a CoreMap with additional token-specific methods
for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
// this is the text of the token
String word = token.get(TextAnnotation.class);
// this is the POS tag of the token
String pos = token.get(PartOfSpeechAnnotation.class);
// this is the NER label of the token
String ne = token.get(NamedEntityTagAnnotation.class);
System.out.println("word="+word +", pos="+pos +", ne="+ne);
}
// this is the parse tree of the current sentence
Tree tree = sentence.get(TreeAnnotation.class);
// this is the Stanford dependency graph of the current sentence
SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
}
// This is the coreference link graph
// Each chain stores a set of mentions that link to each other,
// along with a method for getting the most representative mention
// Both sentence and token offsets start at 1!
Map<Integer, CorefChain> graph =
document.get(CorefChainAnnotation.class);
}
public static void main(String[] args) throws IOException {
StanfordNLP nlp = new StanfordNLP();
nlp.test2();
}
}
堆栈跟踪变为:
run:
Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.6 sec].
Adding annotator lemma
Adding annotator parse
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: java.io.IOException: Unable to open "edu/stanford/nlp/models/srparser/englishSR.ser.gz" as class path, filename or URL
at edu.stanford.nlp.parser.common.ParserGrammar.loadModel(ParserGrammar.java:187)
at edu.stanford.nlp.pipeline.ParserAnnotator.loadModel(ParserAnnotator.java:219)
at edu.stanford.nlp.pipeline.ParserAnnotator.<init>(ParserAnnotator.java:121)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.parse(AnnotatorImplementations.java:115)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$50(StanfordCoreNLP.java:504)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getDefaultAnnotatorPool$65(StanfordCoreNLP.java:533)
at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:118)
at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:146)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:447)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:150)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:146)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:133)
at stanford.corenlp.StanfordNLP.test2(StanfordNLP.java:95)
at stanford.corenlp.StanfordNLP.main(StanfordNLP.java:145)
Caused by: java.io.IOException: Unable to open "edu/stanford/nlp/models/srparser/englishSR.ser.gz" as class path, filename or URL
at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:480)
at edu.stanford.nlp.io.IOUtils.readObjectFromURLOrClasspathOrFileSystem(IOUtils.java:309)
at edu.stanford.nlp.parser.common.ParserGrammar.loadModel(ParserGrammar.java:184)
... 14 more
C:\Users\Greg\AppData\Local\NetBeans\Cache\8.2\executor-snippets\run.xml:53: Java returned: 1
BUILD FAILED (total time: 1 second)
我终于通过下载并添加到类路径stanford-english-corenlp-2017-06-09-models.jar来完成这一点,你可以从这里获得“英文”下载链接:
你需要这样做,尽管下载页面上的消息说英语所需的一切都已经在corenlp下载中提供了!
答案 1 :(得分:0)
[2019-12-31]供后代/参考;注意:Linux终端。
下载CoreNLP 3.9.2
| stanford-corenlp-full-2018-10-05.zip
,https://stanfordnlp.github.io/CoreNLP/download.html]并将其提取出来。
pwd; ls -l
/mnt/Vancouver/apps/CoreNLP/src-local/zzz
-rw-r--r-- 1 victoria victoria 393239982 Dec 31 14:13 stanford-corenlp-full-2018-10-05.zip
unzip stanford-corenlp-full-2018-10-05.zip
# ...
ls -l
drwxrwxr-x 5 victoria victoria 4096 Oct 8 2018 stanford-corenlp-full-2018-10-05
-rw-r--r-- 1 victoria victoria 393239982 Dec 31 14:13 stanford-corenlp-full-2018-10-05.zip
保存“ BasicPipelineExample.java”代码
在名为BasicPipelineExample.java
的文件中:
/mnt/Vancouver/apps/CoreNLP/src-local/zzz/BasicPipelineExample.java
编译
pwd ## "sanity check"
/mnt/Vancouver/apps/CoreNLP/src-local/zzz/
javac -cp stanford-corenlp-3.9.2.jar BasicPipelineExample.java -Xdiags:verbose
提供Java类文件BasicPipelineExample.class
,并从该目录运行它,
java -cp .:* BasicPipelineExample
附录
上面的代码描述了在Java环境中对CoreNLP的访问,如下所述:https://stanfordnlp.github.io/CoreNLP/api.html#quickstart-with-convenience-wrappers
对于那些更倾向于(包括我自己)的人,斯坦福大学在Python环境中提供了基本相同的功能,如此处所述:https://stanfordnlp.github.io/stanfordnlp/corenlp_client.html
例如,
import stanfordnlp
from stanfordnlp.server import CoreNLPClient
# JSON output [default]:
# client = CoreNLPClient(annotators=['tokenize','ssplit','pos','lemma','ner', 'parse', 'depparse','coref'], timeout=30000, memory='16G')
# Plain-text ourput (much more compact):
client = CoreNLPClient(annotators='tokenize, ssplit, pos, lemma, ner, parse, depparse, coref', output_format='text', timeout=30000, memory='16G')
text = 'Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein.'
# This auto-starts the client() instance:
ann = client.annotate(text)
# ....
sentence = ann.sentence[0]
print(sentence)
# ... copious output ...
print(ann)
# ... more succinct ...
注意:如果您使用output_format='text'
参数,则不能再这样做:
sentence = ann.sentence[0]
Traceback (most recent call last):
File "<console>", line 1, in <module>
AttributeError: 'str' object has no attribute 'sentence'
因此,您不能再做
dependency_parse = sentence.basicDependencies
print(dependency_parse)
使用stanfordnlp
包,您还可以按照以下说明建立管道:https://stanfordnlp.github.io/stanfordnlp/
例如,
import stanfordnlp
stanfordnlp.download('en')
nlp = stanfordnlp.Pipeline())
text = 'Bananas are an excellent source of potassium.'
text_nlp = nlp(text)
text_nlp.sentences[0].print_dependencies()`
最后-尽管我发现功能有限(请参阅斯坦福大学编写的CoreNLP库),但可以通过访问spaCy中的CoreNLP获得类似的结果:https://github.com/explosion/spacy-stanfordnlp
import stanfordnlp
from spacy_stanfordnlp import StanfordNLPLanguage
snlp = stanfordnlp.Pipeline(lang="en")
nlp = StanfordNLPLanguage(snlp)
doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
for token in doc:
print(token.text, token.lemma_, token.pos_, token.dep_)
# ...