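Training an NER model in stanford-nlp

Time: 2016-06-24 12:08:02

Tags: java stanford-nlp tokenize named-entity-recognition

I have been trying to use Stanford CoreNLP, and I want to train my own NER model. The answers on SO and the official website describe doing this with a properties file. How can I do it through the API?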

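import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations.*;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentClass;
import edu.stanford.nlp.util.CoreMap;

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment, regexner");
props.setProperty("regexner.mapping", "resources/customRegexNER.txt");

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

String processedQuestion = "Who is the prime minister of Australia?";

//Annotation annotation = pipeline.process(processedQuestion);
Annotation document = new Annotation(processedQuestion);
pipeline.annotate(document);
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    // The sentiment annotator labels whole sentences, so read it here rather than per token
    String sentiment = sentence.get(SentimentClass.class);

    // To get the tokens for the parsed sentence
    for (CoreMap tokens : sentence.get(TokensAnnotation.class)) {
        String token = tokens.get(TextAnnotation.class);
        String POS = tokens.get(PartOfSpeechAnnotation.class);
        String NER = tokens.get(NamedEntityTagAnnotation.class);
        String lemma = tokens.get(LemmaAnnotation.class);
    }
}

  1. How and where do I add the properties file?
  2. N-gram tokenization (e.g. "prime minister" treated as a single token, with that one token then passed on to POS and NER, instead of two tokens, "prime" and "minister", being passed)?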
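
To make the second point concrete, the behaviour being asked for can be sketched in plain Java as below. This is only an illustration of the desired output, not a CoreNLP API: the class `PhraseMerger`, the method `mergeBigrams`, and the phrase list are hypothetical names made up for this example, and the sketch handles two-word phrases only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of CoreNLP: joins known two-word phrases into one token.
class PhraseMerger {

    /**
     * Walks the token list left to right and merges any adjacent pair whose
     * lowercased, space-joined form appears in `phrases`.
     */
    static List<String> mergeBigrams(List<String> tokens, Set<String> phrases) {
        List<String> merged = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (i + 1 < tokens.size()) {
                String pair = tokens.get(i) + " " + tokens.get(i + 1);
                if (phrases.contains(pair.toLowerCase())) {
                    merged.add(pair); // keep the pair as a single token
                    i += 2;
                    continue;
                }
            }
            merged.add(tokens.get(i));
            i++;
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "Who", "is", "the", "prime", "minister", "of", "Australia", "?");
        Set<String> phrases = new HashSet<>(Arrays.asList("prime minister"));
        System.out.println(mergeBigrams(tokens, phrases));
        // prints [Who, is, the, prime minister, of, Australia, ?]
    }
}
```

Whether the POS and NER annotators would behave sensibly on such merged tokens is a separate question, since their models are trained on single-word tokens.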

1 answer:

Answer 0: (score: 1)

I think it can work with this code:

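import java.util.Properties
import edu.stanford.nlp.pipeline.StanfordCoreNLP

val props = new Properties()
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner")
// Point the ner annotator at your own trained model instead of the default ones
props.setProperty("ner.model", "/your/path/ner-model.ser.gz")
val pipeline = new StanfordCoreNLP(props)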
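
The snippet above only loads an existing model; it does not train one. As a rough sketch of the training side, the properties that would normally live in a `.prop` file can also be built in code. The keys below follow the sample training configuration in the Stanford NER FAQ. The class name `NerTrainingProps` and the file names `train.tsv` and `ner-model.ser.gz` are placeholders invented for this example, and the feature flags are a starting point rather than tuned values.

```java
import java.io.IOException;
import java.util.Properties;

// Hypothetical helper: builds the training settings usually kept in a .prop file.
class NerTrainingProps {

    static Properties build(String trainFile, String modelOut) {
        Properties p = new Properties();
        p.setProperty("trainFile", trainFile);    // tab-separated file: token<TAB>label
        p.setProperty("serializeTo", modelOut);   // where the trained model is written
        p.setProperty("map", "word=0,answer=1");  // column 0 is the word, column 1 the label
        // Feature flags taken from the sample configuration in the Stanford NER FAQ
        p.setProperty("useClassFeature", "true");
        p.setProperty("useWord", "true");
        p.setProperty("useNGrams", "true");
        p.setProperty("noMidNGrams", "true");
        p.setProperty("maxNGramLeng", "6");
        p.setProperty("usePrev", "true");
        p.setProperty("useNext", "true");
        p.setProperty("useSequences", "true");
        p.setProperty("usePrevSequences", "true");
        p.setProperty("maxLeft", "1");
        p.setProperty("useTypeSeqs", "true");
        p.setProperty("useTypeSeqs2", "true");
        p.setProperty("useTypeySequences", "true");
        p.setProperty("wordShape", "chris2useLC");
        p.setProperty("useDisjunctive", "true");
        return p;
    }

    public static void main(String[] args) throws IOException {
        Properties p = build("train.tsv", "ner-model.ser.gz");
        // Dump in .prop format; redirect to a file to reuse it with the -prop flag
        p.store(System.out, "Stanford NER training properties");

        // With CoreNLP on the classpath, training via the API would look roughly like
        // this (left commented out so the sketch compiles without the library):
        //   SeqClassifierFlags flags = new SeqClassifierFlags(p);
        //   CRFClassifier<CoreLabel> crf = new CRFClassifier<>(flags);
        //   crf.train();
        //   crf.serializeClassifier("ner-model.ser.gz");
    }
}
```

The resulting `ner-model.ser.gz` is then the file that the `ner.model` property in the pipeline configuration should point to.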