I want to use Stanford NLP coreference on its own. That is, I have already done tokenization and sentence splitting, and everything else coref needs. I built the Document annotation and filled in all the annotations myself. But when I try to run coref, it throws an error, because I am not using the StanfordCoreNLP class. Here is my code:
edu.stanford.nlp.pipeline.Annotation document = new edu.stanford.nlp.pipeline.Annotation(doc.toString());
Properties props = new Properties();
ArrayList<edu.stanford.nlp.ling.CoreLabel> tokenAnnotate = new ArrayList<>();
ArrayList<edu.stanford.nlp.util.CoreMap> stanfordSentences = new ArrayList<>();
int countToken = 0;
int countSentence = 0;
for (CoreMap sentence : sentences) {
    ArrayList<edu.stanford.nlp.ling.CoreLabel> tokenAnnotateCoreMap = new ArrayList<>();
    // traversing the words in the current sentence;
    // a CoreLabel is a CoreMap with additional token-specific methods
    edu.stanford.nlp.util.CoreMap stanfordCoreMap = new edu.stanford.nlp.pipeline.Annotation(sentence.toString());
    int countFirstToken = countToken;
    for (CoreLabel token : sentence.get(com.mobin.tp.textAnnotator.common.dto.CoreAnnotations.TokensAnnotation.class)) {
        // convert each of my tokens into a Stanford CoreLabel
        countToken++;
        edu.stanford.nlp.ling.CoreLabel coreLabel = mobinStanfordConverter.mobinToStanfordCorelabelConvertor(token);
        tokenAnnotateCoreMap.add(coreLabel);
        tokenAnnotate.add(coreLabel);
    }
    stanfordCoreMap.set(edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation.class, tokenAnnotateCoreMap);
    stanfordCoreMap.set(edu.stanford.nlp.ling.CoreAnnotations.TokenBeginAnnotation.class, countFirstToken);
    stanfordCoreMap.set(edu.stanford.nlp.ling.CoreAnnotations.TokenEndAnnotation.class, countToken);
    stanfordCoreMap.set(CoreAnnotations.SentenceIndexAnnotation.class, countSentence);
    stanfordSentences.add(stanfordCoreMap);
    countSentence++;
    // this is the parse tree of the current sentence
    //Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
    // this is the Stanford dependency graph of the current sentence
    //SemanticGraph dependencies = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
}
document.set(edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation.class, tokenAnnotate);
document.set(edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation.class, stanfordSentences);
Annotator annotator = new ParserAnnotator(false, 0);
annotator.annotate(document);
annotator = new DeterministicCorefAnnotator(props);
annotator.annotate(document);
And this is the error I get:
attempted to fetch annotator "parse" before the annotator pool was created!
java.lang.AssertionError
at edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder.getParser(RuleBasedCorefMentionFinder.java:345)
at edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder.parse(RuleBasedCorefMentionFinder.java:338)
at edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder.findSyntacticHead(RuleBasedCorefMentionFinder.java:273)
at edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder.findHead(RuleBasedCorefMentionFinder.java:215)
at edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder.extractPredictedMentions(RuleBasedCorefMentionFinder.java:88)
at edu.stanford.nlp.pipeline.DeterministicCorefAnnotator.annotate(DeterministicCorefAnnotator.java:89)
Answer 0 (score: 0)
As far as I know, Stanford's NLP library uses a multi-pass sieve algorithm to resolve coreference. You can refer to this answer to see how to use the library, and to the javadoc for the full documentation.
Here is some code to test it:
import java.util.Map;
import java.util.Properties;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefChain.CorefMention;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class CoReferenceAnalyzer
{
    public static void main(String[] args)
    {
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        String text = "My horse, whom I call Steve, is my best friend. He comforts me when I ride him";
        Annotation document = new Annotation(text);
        pipeline.annotate(document);

        Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
        System.out.println("Graph: " + graph.toString());

        for (Map.Entry<Integer, CorefChain> entry : graph.entrySet())
        {
            CorefChain chain = entry.getValue();
            CorefMention repMention = chain.getRepresentativeMention();
            System.out.println("Chain: " + chain.toString());
            System.out.println("Rep: " + repMention.toString());
        }
    }
}
You will see output like this:
Graph: {1=CHAIN1-["Steve" in sentence 1, "He" in sentence 2, "him" in sentence 2], 2=CHAIN2-["My horse , whom I call Steve" in sentence 1], 3=CHAIN3-["My horse" in sentence 1], 4=CHAIN4-["My" in sentence 1, "I" in sentence 1, "my" in sentence 1, "me" in sentence 2, "I" in sentence 2], 6=CHAIN6-["my best friend" in sentence 1], 8=CHAIN8-["He comforts me when I ride him" in sentence 2]}
Chain: CHAIN1-["Steve" in sentence 1, "He" in sentence 2, "him" in sentence 2]
Rep: "Steve" in sentence 1
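If you want to inspect the chains programmatically rather than eyeballing the printed map, here is a small, library-free sketch that pulls the mention strings out of one printed chain line. It only assumes the `"mention" in sentence N` format shown in the sample output above; the class and method names are my own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts the quoted mention texts from a CorefChain.toString() line,
// e.g. CHAIN1-["Steve" in sentence 1, "He" in sentence 2, ...]
public class ChainOutputParser {
    static List<String> mentions(String chainLine) {
        List<String> out = new ArrayList<>();
        // Match each  "..." in sentence N  fragment and keep the quoted text
        Matcher m = Pattern.compile("\"([^\"]+)\" in sentence \\d+").matcher(chainLine);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String chain = "CHAIN1-[\"Steve\" in sentence 1, \"He\" in sentence 2, \"him\" in sentence 2]";
        System.out.println(mentions(chain)); // prints [Steve, He, him]
    }
}
```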
Answer 1 (score: 0)
I think I have run into the same problem as you. The cause may be that the jar published to Maven differs from the source code of edu.stanford.nlp.simple.Document.java at the same version, 3.6.0.
In the source code, Document's constructor looks like this:
public Document(String text) {
    StanfordCoreNLP.getDefaultAnnotatorPool(EMPTY_PROPS, new AnnotatorImplementations()); // cache the annotator pool
    this.impl = CoreNLPProtos.Document.newBuilder().setText(text);
}
But in the code in the Maven jar, it looks like this:
public Document(String text) {
    this.impl = CoreNLPProtos.Document.newBuilder().setText(text);
}
The difference is obvious: the Maven jar never creates the default annotator pool, which is exactly what the "before the annotator pool was created" assertion in your stack trace complains about.
So the fix for the problem above is to download the source code from https://github.com/stanfordnlp/CoreNLP, build a new jar named stanford-corenlp.jar with Ant, and then replace the old jar with the new one.
Hope this approach works for you.
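If rebuilding the jar is inconvenient, a possible code-level workaround (a sketch only, assuming CoreNLP 3.6.0; I have not verified it against that exact release) is to force the shared annotator pool into existence yourself before running coref, since RuleBasedCorefMentionFinder fetches its "parse" annotator from that static pool:

```java
// Hypothetical workaround, assuming CoreNLP 3.6.0: constructing a pipeline
// once registers its annotators in the static pool that
// RuleBasedCorefMentionFinder later queries for "parse".
Properties poolProps = new Properties();
poolProps.put("annotators", "tokenize, ssplit, parse");
StanfordCoreNLP unused = new StanfordCoreNLP(poolProps);
// ...then build your own Annotation and run ParserAnnotator /
// DeterministicCorefAnnotator manually, as in the question's code.
```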