Using the SUTime package from Stanford NLP

Time: 2015-11-26 17:53:23

Tags: stanford-nlp

I am trying to use Stanford NLP's SUTime code through CoreNLP:

AnnotationPipeline pipeline = new AnnotationPipeline();
pipeline.addAnnotator(new TimeAnnotator("sutime", props));

Annotation annotation = new Annotation("The interesting date is 4 days from today and it is 20th july of this year, another date is 18th Feb 1997");
annotation.set(CoreAnnotations.DocDateAnnotation.class, "2013-07-14");
pipeline.annotate(annotation);
List<CoreMap> timexAnnsAll = annotation.get(TimeAnnotations.TimexAnnotations.class);

However, it throws this exception:

Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
Exception in thread "main" java.lang.NullPointerException
    at edu.stanford.nlp.ie.NumberNormalizer.findNumbers(NumberNormalizer.java:423)
    at edu.stanford.nlp.ie.NumberNormalizer.findAndMergeNumbers(NumberNormalizer.java:721)
    at edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressions(TimeExpressionExtractorImpl.java:184)
    at edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressions(TimeExpressionExtractorImpl.java:178)
    at edu.stanford.nlp.time.TimeExpressionExtractorImpl.extractTimeExpressionCoreMaps(TimeExpressionExtractorImpl.java:116)
    at edu.stanford.nlp.time.TimeAnnotator.annotateSingleSentence(TimeAnnotator.java:240)
    at edu.stanford.nlp.time.TimeAnnotator.annotate(TimeAnnotator.java:226)
    at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:68)
    at dd.stanford.main(stanford.java:74)

I am using version 3.5.2 from Maven. Does anyone know why this exception occurs? Thanks in advance.

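1 answer:

Answer 0: (score: 0)

It crashes because the text has not been tokenized: the pipeline runs TimeAnnotator without a tokenizer in front of it.

Change your code to: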

AnnotationPipeline pipeline = new AnnotationPipeline();
pipeline.addAnnotator(new TokenizerAnnotator(false));
pipeline.addAnnotator(new TimeAnnotator("sutime", props));
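
To illustrate why the order of annotators matters, here is a minimal toy sketch. It does not use CoreNLP at all: ToyAnnotator, ToyTokenizer, ToyNumberFinder and ToyPipelineDemo are hypothetical stand-ins invented for this example, and the shared map only loosely mimics an Annotation. The assumption, based on the stack trace above, is that the number-finding step reads a token list that a tokenizer would normally have stored, and finds null when no tokenizer has run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-ins, NOT CoreNLP classes: each annotator reads from and writes to a shared map.
interface ToyAnnotator {
    void annotate(Map<String, Object> doc);
}

// Splits the raw text on whitespace and stores the result under "tokens".
class ToyTokenizer implements ToyAnnotator {
    @Override
    public void annotate(Map<String, Object> doc) {
        String text = (String) doc.get("text");
        doc.put("tokens", Arrays.asList(text.trim().split("\\s+")));
    }
}

// Depends on "tokens" already being present, the way a later pipeline stage would.
class ToyNumberFinder implements ToyAnnotator {
    @Override
    @SuppressWarnings("unchecked")
    public void annotate(Map<String, Object> doc) {
        // This is null when no tokenizer ran before this annotator.
        List<String> tokens = (List<String>) doc.get("tokens");
        List<String> numbers = new ArrayList<>();
        for (String token : tokens) { // throws NullPointerException if tokens is null
            if (token.matches("\\d+")) {
                numbers.add(token);
            }
        }
        doc.put("numbers", numbers);
    }
}

class ToyPipelineDemo {
    // Runs the annotators in the given order over a fresh document map.
    static Map<String, Object> run(String text, ToyAnnotator... annotators) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("text", text);
        for (ToyAnnotator annotator : annotators) {
            annotator.annotate(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        String text = "another date is 18 Feb 1997";

        // Tokenizer first: the number finder has tokens to iterate over.
        System.out.println(run(text, new ToyTokenizer(), new ToyNumberFinder()).get("numbers"));

        // No tokenizer: the number finder fails on the missing token list.
        try {
            run(text, new ToyNumberFinder());
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: no tokens were produced");
        }
    }
}
```

In the real pipeline the fix is the one shown above: add TokenizerAnnotator before TimeAnnotator so that the tokens exist by the time SUTime looks for them.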