How do I use the Stanford NLP Chinese segmenter in Java?

Time: 2016-07-15 09:30:50

Tags: stanford-nlp, text-segmentation

I tried the following code, but it does not work and only prints null:

String text = "我爱北京天安门。";
StanfordCoreNLP pipeline = new StanfordCoreNLP();
Annotation annotation = pipeline.process(text);
String result = annotation.get(CoreAnnotations.ChineseSegAnnotation.class);
System.out.println(result);

Output:

...
done [0.6 sec].
Using mention detector type: rule
null

How do I use the Stanford NLP Chinese segmenter correctly?

1 Answer:

Answer 0 (score: 0)

Some sample code:

import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.util.StringUtils;

import java.util.*;

public class ChineseSegmenter {

    public static void main (String[] args) {
        // set the properties to the standard Chinese pipeline properties
        Properties props = StringUtils.argsToProperties("-props", "StanfordCoreNLP-chinese.properties");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "...";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        List<CoreLabel> tokens = annotation.get(CoreAnnotations.TokensAnnotation.class);
        for (CoreLabel token : tokens)
            System.out.println(token);
    }
}

Note: make sure the Chinese models jar is on the CLASSPATH. It can be downloaded from: http://stanfordnlp.github.io/CoreNLP/download.html
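
If you are unsure whether the models jar is actually visible, a quick sanity check is to try loading StanfordCoreNLP-chinese.properties as a classpath resource. This is a sketch based on the assumption that the Chinese models jar ships that properties file at its root; it is not an official API:

import java.io.InputStream;

public class CheckChineseModels {

    public static void main (String[] args) {
        // assumption: the Chinese models jar contains StanfordCoreNLP-chinese.properties
        // at its root, so finding it means the jar is on the classpath
        InputStream in = CheckChineseModels.class.getClassLoader()
                .getResourceAsStream("StanfordCoreNLP-chinese.properties");
        System.out.println(in != null
                ? "Chinese models jar found on the classpath"
                : "Chinese models jar NOT found on the classpath");
    }
}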

The ChineseSegmenter example above should print the tokens created once Chinese segmentation has run.
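
If what you want is the segmented text itself rather than the full token objects, a minimal sketch along the same lines (reusing the sentence from the question and assuming the same StanfordCoreNLP-chinese.properties setup) is to walk the sentences and join each sentence's token.word() values:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.StringUtils;

import java.util.*;

public class ChineseSegmenterWords {

    public static void main (String[] args) {
        // same Chinese pipeline setup as the answer above
        Properties props = StringUtils.argsToProperties("-props", "StanfordCoreNLP-chinese.properties");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // the sentence from the question
        Annotation annotation = new Annotation("我爱北京天安门。");
        pipeline.annotate(annotation);
        // print each sentence as space-separated segmented words
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            List<String> words = new ArrayList<>();
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                words.add(token.word());
            }
            System.out.println(String.join(" ", words));
        }
    }
}

The exact word boundaries depend on the segmenter model that ships with your CoreNLP version, so treat the printed output as illustrative.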