Question

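If I create an AnnotationPipeline with a TokenizerAnnotator, WordsToSentencesAnnotator, POSTaggerAnnotator, and SUTime, I get TimexAnnotations attached to the resulting annotation.

But if I create a StanfordCoreNLP pipeline with the "annotators" property set to "tokenize,ssplit,pos,lemma,ner", I don't get TimexAnnotations, even though the relevant individual tokens are NER-tagged as DATE.

Why is there this difference?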

Answer 1

When I run this command:

java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -file data-example.txt -outputFormat text

I get a TIMEX annotation for the DATE. The ner annotator should apply SUTime by default.
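
To check the same thing from Java instead of the command line, a minimal sketch along these lines may help. It assumes the CoreNLP jar and the default English models are on the classpath. It also assumes, consistent with the per-token text output of the command above, that the ner annotator stores each Timex on the individual tokens under TimeAnnotations.TimexAnnotation rather than in a document-level list. The class name TokenTimexCheck and the helper tokenTimexes are made up for illustration.

```java
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.time.TimeAnnotations;
import edu.stanford.nlp.time.Timex;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper class, not part of CoreNLP.
public class TokenTimexCheck {

  // Runs the same annotators as the command above and collects
  // every token that carries a token-level Timex.
  static List<String> tokenTimexes(String text) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation document = new Annotation(text);
    pipeline.annotate(document);

    List<String> found = new ArrayList<>();
    for (CoreLabel token : document.get(CoreAnnotations.TokensAnnotation.class)) {
      // Assumption: ner (with SUTime enabled) sets this key on DATE/TIME tokens.
      Timex timex = token.get(TimeAnnotations.TimexAnnotation.class);
      if (timex != null) {
        found.add(token.word() + " -> " + timex);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    tokenTimexes("The date is 1 April 2017").forEach(System.out::println);
  }
}
```

If the list comes back non-empty, SUTime did run inside ner, and the remaining difference is only where the result is stored.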

Answer 2

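When we run the annotators, we extract all entity mentions from the document, and we consider a DATE to be an entity mention. Here is some sample code. I've added some commented-out options in case you only want to extract time expressions and want the TimexAnnotations.class field populated.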

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.util.*;
import edu.stanford.nlp.time.TimeAnnotations;

import edu.stanford.nlp.pipeline.*;

import java.util.*;

public class SUTimeExample {

  public static void main(String[] args) {
    Annotation document =
        new Annotation("The date is 1 April 2017");
    Properties props = new Properties();
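    // Alternative: uncomment the next two lines (and drop the "annotators" line after them)
    // to run SUTime on its own and populate TimeAnnotations.TimexAnnotations.
    //props.setProperty("customAnnotatorClass.time", "edu.stanford.nlp.time.TimeAnnotator");
    //props.setProperty("annotators", "tokenize,ssplit,pos,lemma,time");
    // Default: run NER plus entity mention detection; DATE expressions show up as mentions.
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.annotate(document);
    // Print every entity mention whose type is DATE.
    for (CoreMap entityMention : document.get(CoreAnnotations.MentionsAnnotation.class)) {
      // Constant-first equals avoids a NullPointerException if the type is missing.
      if ("DATE".equals(entityMention.get(CoreAnnotations.EntityTypeAnnotation.class))) {
        System.out.println(entityMention);
      }
    }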
  }
}
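
For the commented-out route, reading the document-level list could look roughly like the sketch below. It reuses the two property values from the commented lines above and follows the pattern of the SUTime demo that ships with CoreNLP. The class name TimexListExample and the helper timexes are hypothetical, and the CoreNLP jar plus models are assumed to be on the classpath.

```java
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.time.TimeAnnotations;
import edu.stanford.nlp.time.TimeExpression;
import edu.stanford.nlp.util.CoreMap;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper class, not part of CoreNLP.
public class TimexListExample {

  // Runs SUTime as a standalone "time" annotator and returns one line
  // per time expression found in the document-level TimexAnnotations list.
  static List<String> timexes(String text) {
    Properties props = new Properties();
    props.setProperty("customAnnotatorClass.time", "edu.stanford.nlp.time.TimeAnnotator");
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,time");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation document = new Annotation(text);
    pipeline.annotate(document);

    List<String> result = new ArrayList<>();
    List<CoreMap> timexAnns = document.get(TimeAnnotations.TimexAnnotations.class);
    if (timexAnns != null) {
      for (CoreMap cm : timexAnns) {
        // Each entry carries the matched text and its resolved temporal value.
        result.add(cm + " -> " + cm.get(TimeExpression.Annotation.class).getTemporal());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    timexes("The date is 1 April 2017").forEach(System.out::println);
  }
}
```

Note that this pipeline does not run ner or entitymentions, so MentionsAnnotation would not be populated; the null check on the TimexAnnotations list guards the opposite case.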

Dates when using the StanfordCoreNLP pipeline
