Stanford NLP OpenIE未能识别某些句子的三元组

时间:2016-07-02 03:56:45

标签: java nlp stanford-nlp

我尝试过使用核心库和简单的包装器,并且都找不到相同琐碎句子的三元组。

简单的包装代码:

    for (final Quadruple<String, String, String, Double> tripple : sentence.openie()) {
        System.out.println(tripple);
    }

核心库的代码是their own example usage

package edu.stanford.nlp.naturalli;

import edu.stanford.nlp.ie.util.RelationTriple;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.PropertiesUtils;

import java.util.Collection;
import java.util.List;
import java.util.Properties;

/**
 * A demo illustrating how to call the OpenIE system programmatically.
 */
public class OpenIEDemo {

  private OpenIEDemo() {} // static main

  public static void main(String[] args) throws Exception {
    // Create the Stanford CoreNLP pipeline
    Properties props = PropertiesUtils.asProperties(
            "annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie"
            // , "depparse.model", "edu/stanford/nlp/models/parser/nndep/english_SD.gz"
            // "annotators", "tokenize,ssplit,pos,lemma,parse,natlog,openie"
            // , "parse.originalDependencies", "true"
    );
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // Annotate an example document.
    String text;
    if (args.length > 0) {
      text = IOUtils.slurpFile(args[0]);
    } else {
      text = "Obama was born in Hawaii. He is our president.";
    }
    Annotation doc = new Annotation(text);
    pipeline.annotate(doc);

    // Loop over sentences in the document
    int sentNo = 0;
    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      System.out.println("Sentence #" + ++sentNo + ": " + sentence.get(CoreAnnotations.TextAnnotation.class));

      // Print SemanticGraph
      System.out.println(sentence.get(SemanticGraphCoreAnnotations.EnhancedDependenciesAnnotation.class).toString(SemanticGraph.OutputFormat.LIST));

      // Get the OpenIE triples for the sentence
      Collection<RelationTriple> triples = sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);

      // Print the triples
      for (RelationTriple triple : triples) {
        System.out.println(triple.confidence + "\t" +
                triple.subjectLemmaGloss() + "\t" +
                triple.relationLemmaGloss() + "\t" +
                triple.objectLemmaGloss());
      }

      // Alternately, to only run e.g., the clause splitter:
      List<SentenceFragment> clauses = new OpenIE(props).clausesInSentence(sentence);
      for (SentenceFragment clause : clauses) {
        System.out.println(clause.parseTree.toString(SemanticGraph.OutputFormat.LIST));
      }
      System.out.println();
    }
  }

}

两种方法都发现三元组通过并且未通过相同的测试:

The cat jumped over the fence.: (cat,jumped over,fence,1.0)
The brown dog barked.: FAIL
The apple was eaten by John.: (apple,was eaten by,John,1.0)
Joe ate the ripe apple.: (Joe,ate,ripe apple,1.0)
They named their daughter Natasha.: (They,named,their daughter Natasha,1.0)
Bob sold me her boat.: FAIL
Grandfather left Rosalita and Raoul all his money.: FAIL
Who killed the cat?: FAIL
How many astronauts have walked on the moon?: (astronauts,have walked on,moon,1.0)

当它失败时,它返回的集合是空的。

是否有人遇到类似的问题和解决方法或任何其他解决方案?

1 个答案:

答案 0 :(得分:1)

OpenIE只返回三元组,因此如果您还想提取没有对象或其他语言补语的句子,则必须添加自己的规则。问题也是如此; OpenIE旨在用于从维基百科等文本中提取大规模关系,在这种情况下,考虑问题是没有意义的。

关于 Bob 示例,似乎这只适用于GitHub上的版本,所以你要么必须克隆它并自己编译或等待直到下一个版本才能使这句话发挥作用。