How to get the score distribution of CoreNLP Sentiment?

Asked: 2016-08-10 19:06:16

Tags: stanford-nlp stanford-nlp-server

I have set up a CoreNLP server on my Ubuntu instance and it works fine. I am mostly interested in the Sentiment module. Currently, this is what I get:

{
sentimentValue: "2",
sentiment: "Neutral"
}

What I need is the score distribution, as shown on the demo page http://nlp.stanford.edu:8080/sentiment/rntnDemo.html:

 "scoreDistr": [0.1685, 0.7187, 0.0903, 0.0157, 0.0068]

What am I missing, and how can I get this kind of data?

Thanks

1 answer:

Answer 0 (score: 2)

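You need to get the tree object from the annotated sentence via SentimentCoreAnnotations.SentimentAnnotatedTree.class. You can then get the predictions through the RNNCoreAnnotations class. I wrote the following self-contained demo code, which shows how to get the score for each label of a CoreNLP sentiment prediction.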

import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import org.ejml.simple.SimpleMatrix;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

public class DemoSentiment {
    public static void main(String[] args) {
        final List<String> texts = Arrays.asList("I am happy.", "This is a neutral sentence.", "I am very angry.");
        final Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        final StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        for (String text : texts) {
            final Annotation doc = new Annotation(text);
            pipeline.annotate(doc);
            for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
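                // Binarized parse tree whose nodes carry the sentiment model's output
                final Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
                // Prediction vector at the root node: one score per sentiment class (5 x 1)
                final SimpleMatrix sm = RNNCoreAnnotations.getPredictions(tree);
                // The label of the highest-scoring class, e.g. "Positive"
                final String sentiment = sentence.get(SentimentCoreAnnotations.SentimentClass.class);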
                System.out.println("sentence:  "+sentence);
                System.out.println("sentiment: "+sentiment);
                System.out.println("matrix:    "+sm);
            }
        }
    }
}
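If you want the scores in the same shape as the demo page's "scoreDistr" from the question, you can copy the five values out of the matrix and format them yourself. The following is a minimal sketch, assuming you have already read the values from the SimpleMatrix above (for example with sm.get(i, 0) for each row i); the class name ScoreDistrFormat and the method toScoreDistr are hypothetical helpers, not part of the CoreNLP API.

```java
import java.util.Locale;
import java.util.StringJoiner;

// Hypothetical helper (not CoreNLP API): formats a sentiment prediction
// vector in the style of the demo page's "scoreDistr" field.
public class ScoreDistrFormat {

    // Formats each class score with four decimals. Locale.ROOT keeps
    // the decimal separator as '.' regardless of the system locale.
    static String toScoreDistr(double[] scores) {
        StringJoiner joiner = new StringJoiner(", ", "\"scoreDistr\": [", "]");
        for (double s : scores) {
            joiner.add(String.format(Locale.ROOT, "%.4f", s));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // In the pipeline above, these values would be copied out of the
        // SimpleMatrix with sm.get(i, 0) for i = 0 .. sm.numRows() - 1.
        double[] scores = {0.1685, 0.7187, 0.0903, 0.0157, 0.0068};
        System.out.println(toScoreDistr(scores));
    }
}
```

Running it prints the distribution from the question in demo-page form: "scoreDistr": [0.1685, 0.7187, 0.0903, 0.0157, 0.0068].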

The output will look similar to the following (floating-point rounding or a newer model may change the scores).

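For the first sentence, "I am happy.", you can see that the sentiment is Positive: reading the returned matrix as an ordered list, the highest value, 0.618, is in the fourth position.

The second sentence, "This is a neutral sentence.", has its highest score, 0.952, in the middle position, hence the Neutral sentiment.

The last sentence accordingly has a Negative sentiment, with its highest score, 0.652, in the second position.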
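The reading above amounts to taking the position of the largest score and mapping it to a class name. The following is a minimal sketch of that mapping, assuming the standard five-class ordering used by the CoreNLP sentiment model (0 = Very negative, 1 = Negative, 2 = Neutral, 3 = Positive, 4 = Very positive); the class SentimentArgmax and its methods are hypothetical helpers, not CoreNLP API. The scores are taken from the sample output below.

```java
// Hypothetical helper (not CoreNLP API): maps the position of the highest
// score in a five-class sentiment vector to a class name.
public class SentimentArgmax {

    // Assumed class order of the five-class CoreNLP sentiment model.
    static final String[] LABELS = {
        "Very negative", "Negative", "Neutral", "Positive", "Very positive"
    };

    // Returns the index of the largest score (first one on ties).
    static int argmax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    static String label(double[] scores) {
        return LABELS[argmax(scores)];
    }

    public static void main(String[] args) {
        double[][] examples = {
            {0.016, 0.037, 0.132, 0.618, 0.196}, // "I am happy."
            {0.001, 0.007, 0.952, 0.039, 0.001}, // "This is a neutral sentence."
            {0.166, 0.652, 0.142, 0.028, 0.012}, // "I am very angry."
        };
        for (double[] scores : examples) {
            int i = argmax(scores);
            System.out.println(i + " -> " + LABELS[i]);
        }
    }
}
```

This prints 3 -> Positive, 2 -> Neutral and 1 -> Negative. The zero-based index appears to be what the server reports as sentimentValue (for example "2" for Neutral in the question), so the full vector is the extra information the question is after.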

sentence:  I am happy.
sentiment: Positive
matrix:    Type = dense , numRows = 5 , numCols = 1
0.016  
0.037  
0.132  
0.618  
0.196  

sentence:  This is a neutral sentence.
sentiment: Neutral
matrix:    Type = dense , numRows = 5 , numCols = 1
0.001  
0.007  
0.952  
0.039  
0.001  

sentence:  I am very angry.
sentiment: Negative
matrix:    Type = dense , numRows = 5 , numCols = 1
0.166  
0.652  
0.142  
0.028  
0.012