Getting different results when evaluating Stanford NLP sentiment

Date: 2015-12-09 16:02:35

Tags: stanford-nlp sentiment-analysis

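I downloaded Stanford NLP 3.5.2 and ran the sentiment analysis with the default configuration (that is, I changed nothing: I just unzipped the archive and ran it).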

java -cp "*" edu.stanford.nlp.sentiment.Evaluate -model edu/stanford/nlp/models/sentiment/sentiment.ser.gz -treebank test.txt

EVALUATION SUMMARY
Tested 82600 labels
  66258 correct
  16342 incorrect
  0.802155 accuracy
Tested 2210 roots
  976 correct
  1234 incorrect
  0.441629 accuracy
Label confusion matrix
      Guess/Gold       0       1       2       3       4    Marg. (Guess)
               0     323     161      27       3       3     517
               1    1294    5498    2245     652     148    9837
               2     292    2993   51972    2868     282   58407
               3      99     602    2283    7247    2140   12371
               4       0       1      21     228    1218    1468
    Marg. (Gold)    2008    9255   56548   10998    3791

               0        prec=0.62476, recall=0.16086, spec=0.99759, f1=0.25584
               1        prec=0.55891, recall=0.59406, spec=0.94084, f1=0.57595
               2        prec=0.88982, recall=0.91908, spec=0.75299, f1=0.90421
               3        prec=0.58581, recall=0.65894, spec=0.92844, f1=0.62022
               4        prec=0.8297, recall=0.32129, spec=0.99683, f1=0.46321

Root label confusion matrix
      Guess/Gold       0       1       2       3       4    Marg. (Guess)
               0      44      39       9       0       0      92
               1     193     451     190     131      36    1001
               2      23      62      82      30       8     205
               3      19      81     101     299     255     755
               4       0       0       7      50     100     157
    Marg. (Gold)     279     633     389     510     399

               0        prec=0.47826, recall=0.15771, spec=0.97514, f1=0.2372
               1        prec=0.45055, recall=0.71248, spec=0.65124, f1=0.55202
               2        prec=0.4, recall=0.2108, spec=0.93245, f1=0.27609
               3        prec=0.39603, recall=0.58627, spec=0.73176, f1=0.47273
               4        prec=0.63694, recall=0.25063, spec=0.96853, f1=0.35971

Approximate Negative label accuracy: 0.646009
Approximate Positive label accuracy: 0.732504
Combined approximate label accuracy: 0.695110
Approximate Negative root label accuracy: 0.797149
Approximate Positive root label accuracy: 0.774477
Combined approximate root label accuracy: 0.785832

The test.txt file comes from http://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip (the archive contains train.txt, dev.txt and test.txt). The download link is listed at http://nlp.stanford.edu/sentiment/code.html

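However, in the paper "Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y. and Potts, C., 2013, October. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (Vol. 1631, p. 1642)", on which the sentiment analysis tool is based, the authors report an accuracy of 0.807 when classifying into 5 classes.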

Are my results normal?

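1 Answer:

Answer 0: (score: 0)

I get the same results when I run it out of the box. I wouldn't be surprised if the version of the system they built for Stanford CoreNLP is slightly different from the one in the paper.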
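
As a sanity check on the numbers in the question, the summary lines can be recomputed from the two confusion matrices in the Evaluate output. The sketch below is not part of Stanford NLP; the function names (accuracy, block_counts, approximate_accuracy) are made up for illustration. It assumes that rows are guessed classes and columns are gold classes, as the "Guess/Gold" header suggests. It also assumes that the "approximate" accuracies merge classes 0 and 1 into negative and classes 3 and 4 into positive while ignoring gold class 2; that reading is inferred from the printed numbers, not taken from the Evaluate source.

```python
# Confusion matrices copied from the Evaluate output in the question.
# Assumption: rows are guessed classes 0-4, columns are gold classes 0-4.
LABELS = [
    [323, 161, 27, 3, 3],
    [1294, 5498, 2245, 652, 148],
    [292, 2993, 51972, 2868, 282],
    [99, 602, 2283, 7247, 2140],
    [0, 1, 21, 228, 1218],
]
ROOTS = [
    [44, 39, 9, 0, 0],
    [193, 451, 190, 131, 36],
    [23, 62, 82, 30, 8],
    [19, 81, 101, 299, 255],
    [0, 0, 7, 50, 100],
]


def accuracy(matrix):
    """Five-class accuracy: diagonal entries divided by all entries."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def block_counts(matrix, classes):
    """Return (hits, gold) for a group of classes treated as one label.

    hits: gold and guess both fall inside the group.
    gold: every item whose gold class falls inside the group.
    """
    hits = sum(matrix[guess][gold] for guess in classes for gold in classes)
    gold_total = sum(
        matrix[guess][gold] for guess in range(len(matrix)) for gold in classes
    )
    return hits, gold_total


def approximate_accuracy(matrix):
    """Negative, positive and combined accuracy with gold class 2 ignored."""
    neg_hits, neg_gold = block_counts(matrix, (0, 1))
    pos_hits, pos_gold = block_counts(matrix, (3, 4))
    combined = (neg_hits + pos_hits) / (neg_gold + pos_gold)
    return neg_hits / neg_gold, pos_hits / pos_gold, combined


if __name__ == "__main__":
    for name, matrix in (("labels", LABELS), ("roots", ROOTS)):
        neg, pos, both = approximate_accuracy(matrix)
        print(f"{name}: accuracy={accuracy(matrix):.6f}")
        print(f"{name}: approx neg={neg:.6f} pos={pos:.6f} combined={both:.6f}")
```

Under those assumptions the script gives back the accuracies printed in the question, so the two headline figures measure different things: 0.802155 is computed over all 82600 labeled nodes, while 0.441629 is computed over the 2210 sentence roots only. If the paper's 0.807 refers to all labeled phrases rather than to whole sentences (worth checking against its results table), then the 0.802155 line is the one to compare it with.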