Question

Following on from this question, I am trying to use Stanford CoreNLP for lemmatization. My environment is:

Java 1.7
Eclipse 3.4.0
Stanford CoreNLP version 3.4.1 (downloaded from here).

My code snippet is:

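//...........lemmatization starts........................

    // Needs these imports at the top of the file:
    //   java.util.List, java.util.Properties,
    //   edu.stanford.nlp.ling.CoreLabel, edu.stanford.nlp.ling.CoreAnnotations.*,
    //   edu.stanford.nlp.pipeline.Annotation, edu.stanford.nlp.pipeline.StanfordCoreNLP

    // Run the tokenizer, sentence splitter, POS tagger and lemmatizer.
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
    String text = "painting";
    Annotation document = pipeline.process(text);

    List<edu.stanford.nlp.util.CoreMap> sentences = document.get(SentencesAnnotation.class);

    for (edu.stanford.nlp.util.CoreMap sentence : sentences)
    {
        for (CoreLabel token : sentence.get(TokensAnnotation.class))
        {
            String word = token.get(TextAnnotation.class);   // surface form
            String lemma = token.get(LemmaAnnotation.class); // lemma chosen for the assigned POS tag
            System.out.println("lemmatized version :" + lemma);
        }
    }

    //...........lemmatization ends.........................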

The output I get is:

lemmatized version :painting

I was expecting

lemmatized version :paint

Please advise.

Answer 1

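The problem in this example is that the word painting can be either the present participle of to paint or a noun, and the output of the lemmatizer depends on the part-of-speech tag assigned to the original word.

If you run the tagger on the fragment painting alone, there is no context to help the tagger (or a human) decide how to tag the word. In this case it chose the tag NN, and the lemma of the noun painting is in fact painting.

If you run the same code on the sentence "I am painting a flower." the tagger should correctly tag painting as VBG, and the lemmatizer should return paint.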
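
To make the dependence on the part-of-speech tag concrete, here is a minimal, self-contained sketch. It is a toy illustration only and not how CoreNLP is implemented: the class name TagAwareLemmaSketch, the lemma helper and the two-entry lexicon are all hypothetical, made up for this example. It simply shows that the same surface form can map to different lemmas once a tag is attached.

```java
import java.util.HashMap;
import java.util.Map;

public class TagAwareLemmaSketch {
    // Hypothetical toy lexicon keyed by "word/TAG"; not CoreNLP's data or algorithm.
    private static final Map<String, String> LEXICON = new HashMap<String, String>();
    static {
        LEXICON.put("painting/NN", "painting"); // noun: the lemma is the word itself
        LEXICON.put("painting/VBG", "paint");   // present participle of "to paint"
    }

    // Returns the lemma for a (word, tag) pair; unknown pairs fall back to the word unchanged.
    public static String lemma(String word, String tag) {
        String found = LEXICON.get(word + "/" + tag);
        return found != null ? found : word;
    }

    public static void main(String[] args) {
        System.out.println("NN  -> " + lemma("painting", "NN"));  // prints "NN  -> painting"
        System.out.println("VBG -> " + lemma("painting", "VBG")); // prints "VBG -> paint"
    }
}
```

In the real pipeline, the tag comes from the pos annotator, so printing token.get(PartOfSpeechAnnotation.class) next to the lemma in the question's loop is a quick way to check which tag the lemmatizer was given.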
