Dependency parsing with sentiment-ranked nodes using Stanford CoreNLP?

Date: 2014-05-19 04:41:38

标签: nlp text-parsing stanford-nlp sentiment-analysis

I want to run a dependency parse over a set of sentences and look at the sentiment ratings of the individual nodes, as in the Stanford Sentiment Treebank (http://nlp.stanford.edu/sentiment/treebank.html).

I am new to the CoreNLP API, and after tinkering with it I still have no idea how to produce a dependency parse with sentiment-ranked nodes. Is this possible with CoreNLP at all, and if so, does anyone have experience doing it?

1 Answer:

Answer 0 (score: 6)

I modified the code of the included StanfordCoreNLPDemo.java file to fit our sentiment needs:

Imports:

import java.io.*;
import java.util.*;

import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations.PredictedClass;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

Initializing the pipeline. The properties include lemma and sentiment:

public class StanfordCoreNlpDemo {

  public static void main(String[] args) throws IOException {
    PrintWriter out;
    if (args.length > 1) {
      out = new PrintWriter(args[1]);
    } else {
      out = new PrintWriter(System.out);
    }
    PrintWriter xmlOut = null;
    if (args.length > 2) {
      xmlOut = new PrintWriter(args[2]);
    }
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, parse, sentiment");
    props.setProperty("tokenize.options","normalizeCurrency=false");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

Adding the text. These 3 sentences are taken from the live demo of the site you linked. I also print the keys of the top-level annotation, to see what you can access from it:

    Annotation annotation;
    if (args.length > 0) {
      annotation = new Annotation(IOUtils.slurpFileNoExceptions(args[0]));
    } else {
      annotation = new Annotation("This movie doesn't care about cleverness, wit or any other kind of intelligent humor.Those who find ugly meanings in beautiful things are corrupt without being charming.There are slow and repetitive parts, but it has just enough spice to keep it interesting.");
    }

    pipeline.annotate(annotation);
    pipeline.prettyPrint(annotation, out);
    if (xmlOut != null) {
      pipeline.xmlPrint(annotation, xmlOut);
    }

    // An Annotation is a Map and you can get and use the various analyses individually.
    // For instance, this gets the parse tree of the first sentence in the text.
    out.println();
    // The toString() method on an Annotation just prints the text of the Annotation
    // But you can see what is in it with other methods like toShorterString()
    out.println("The top level annotation's keys: ");
    out.println(annotation.keySet());

For the first sentence, I print its keys and its sentiment. Then I iterate over all of its nodes. For each one, I print the leaves of that subtree (which is the part of the sentence that node covers), the name of the node, its sentiment, its node vector (I don't know what that is) and its predictions.

The sentiment is an integer ranging from 0 to 4: 0 is very negative, 1 negative, 2 neutral, 3 positive and 4 very positive. The predictions are a vector of 5 values, one per class, each giving the probability that the node belongs to that class; the first value is for "very negative", and so on. The class with the highest probability is the sentiment of the node. (A small sketch of a helper that turns this into a readable label follows after the loop below.)

Not all nodes of the annotated tree have a sentiment. It seems that every word of the sentence appears as two nodes in the tree: you would expect the words to be the leaves, but each has a single child, a label whose keys are missing the prediction annotation, and the name of that node is the same word. That is why I check for the prediction annotation before calling the function. The cleaner way would be to catch and ignore the resulting NullPointerException, but I chose to be explicit, so that readers of this answer understand that no sentiment information is being lost.

    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    if (sentences != null && sentences.size() > 0) {
      ArrayCoreMap sentence = (ArrayCoreMap) sentences.get(0);

      out.println("Sentence's keys: ");
      out.println(sentence.keySet());

      Tree tree2 = sentence.get(SentimentCoreAnnotations.AnnotatedTree.class);
      out.println("Sentiment class name:");
      out.println(sentence.get(SentimentCoreAnnotations.ClassName.class));

      Iterator<Tree> it = tree2.iterator();
      while (it.hasNext()) {
        Tree t = it.next();
        out.println(t.yield());
        out.println("nodestring:");
        out.println(t.nodeString());
        if (((CoreLabel) t.label()).containsKey(PredictedClass.class)) {
          out.println("Predicted Class: " + RNNCoreAnnotations.getPredictedClass(t));
        }
        out.println(RNNCoreAnnotations.getNodeVector(t));
        out.println(RNNCoreAnnotations.getPredictions(t));
      }
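As a small extra (not part of the original demo), here is a sketch of a helper method you could add to the class and call from inside the loop above, within the same containsKey check, e.g. out.println(describeSentiment(t)). The name describeSentiment is made up for illustration, and it assumes the standard 5-class sentiment model, so the vector returned by RNNCoreAnnotations.getPredictions has 5 entries:

  // Sketch of a helper (my addition, hypothetical name): turns a node's sentiment
  // into a readable label and recomputes the predicted class as the argmax of
  // the prediction vector. Assumes the standard 5-class sentiment model.
  private static String describeSentiment(Tree t) {
    String[] labels = {"Very negative", "Negative", "Neutral", "Positive", "Very positive"};
    org.ejml.simple.SimpleMatrix predictions = RNNCoreAnnotations.getPredictions(t);
    int best = 0;
    for (int i = 1; i < predictions.getNumElements(); i++) {
      if (predictions.get(i) > predictions.get(best)) {
        best = i;
      }
    }
    // best should match RNNCoreAnnotations.getPredictedClass(t)
    return labels[best] + " (p=" + predictions.get(best) + ")";
  }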

Finally, some more output. The dependencies are printed. The dependencies here could also be reached through visitors of the parse trees (tree or tree2). A small extra sketch for walking the dependency graph edge by edge follows after the end of the class.

      out.println("The first sentence is:");
      Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      out.println();
      out.println("The first sentence tokens are:");
      for (CoreMap token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        ArrayCoreMap aToken = (ArrayCoreMap) token;
        out.println(aToken.keySet());
        out.println(token.get(CoreAnnotations.LemmaAnnotation.class));
      }
      out.println("The first sentence parse tree is:");
      tree.pennPrint(out);
      tree2.pennPrint(out);
      out.println("The first sentence basic dependencies are:"); 
      out.println(sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class).toString(SemanticGraph.OutputFormat.LIST));
      out.println("The first sentence collapsed, CC-processed dependencies are:");
      SemanticGraph graph = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
      out.println(graph.toString(SemanticGraph.OutputFormat.LIST));
    }
  }

}
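If you also want to go through the dependencies node by node rather than just printing the whole graph, here is a rough sketch (again my addition, not part of the original answer). It reuses the sentence variable from inside the if block above and needs one extra import, edu.stanford.nlp.semgraph.SemanticGraphEdge:

      // Sketch (my addition): walk the basic dependencies edge by edge.
      // Each edge links two IndexedWords, which carry the token text and its
      // 1-based position in the sentence.
      SemanticGraph basicDeps =
          sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
      for (SemanticGraphEdge edge : basicDeps.edgeIterable()) {
        out.println(edge.getRelation() + "("
            + edge.getGovernor().word() + "-" + edge.getGovernor().index() + ", "
            + edge.getDependent().word() + "-" + edge.getDependent().index() + ")");
      }

From the IndexedWords of each edge you can then look up the corresponding sentiment-annotated nodes in the tree, which is what the question was after.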