Question

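I am trying to extract the phrases that carry particular dependency relations. For example, I want the noun phrase that contains the subject, the noun phrase that acts as an appositive, and so on. For example: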

               Sentence: John Smith and Robert Alan Jones ate the warm pizza and cold salad by the car for an hour.
         Phrasal Subject: John Smith and Robert Alan Jones
                Negation: 
                   Verbs: ate
   Phrasal Direct Object: the warm pizza and cold salad
 Phrasal Indirect Object: 
               Root Noun: 
       Phrasal Root Noun: 
      Phrasal Appositive: 
Phrasal Subject Complement: 
Phrasal Object Complement: 
Phrasal Clausal Complement: 
        Adjective Phrase: warm
        Adverbial Phrase: 
   Prepositional Phrases: [by the car, for an hour]

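Again, I am using the dependency parser. I have written some code that recursively navigates the TypedDependency collection, but it feels hacky. Is there a built-in method I should be using that returns phrases and word combinations (MWE, POSS, etc.) from the dependencies? Jeff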

Answer 1

I think an OpenIE system is well suited to extracting triples like these.

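Here is a basic example I wrote; there may be a better way. The containingNounPhrase method could be a useful addition to Tree. I may also add something along these lines in Stanford CoreNLP version 3.8.0.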

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

import java.util.*;

public class PhraseDependencyExample {

  public static Tree containingNounPhrase(Tree tree, Tree leaf) {
    Tree currTree = leaf;
    Tree largestNPTree = null;
    while (currTree != null) {
      if (currTree.label().value().equals("NP"))
        largestNPTree = currTree;
      currTree = currTree.parent(tree);
    }
    return largestNPTree;
  }

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
    // use faster shift reduce parser
    props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
    props.setProperty("parse.maxlen", "100");
    // set up Stanford CoreNLP pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // build an annotation for the example sentence
    Annotation annotation = new Annotation("John Smith and Robert Alan Jones ate the warm pizza and cold salad.");
    // run the pipeline on the example sentence
    pipeline.annotate(annotation);
    for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
      System.err.println("---");
      Tree sentenceConstituencyParse = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      System.err.println(sentenceConstituencyParse);
      SemanticGraph sg = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
      for (IndexedWord iw : sg.vertexListSorted()) {
        if (iw.tag().equals("VBD")) {
          System.err.println("---");
          System.err.println("verb: "+iw.word());
          for (SemanticGraphEdge sge : sg.outgoingEdgeList(iw)) {
            if (sge.getRelation().getShortName().equals("dobj") || sge.getRelation().getShortName().equals("nsubj")) {
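              // token indices are 1-based, leaves are 0-based
              int tokenIndex = sge.getDependent().backingLabel().index()-1;
              Tree np = containingNounPhrase(sentenceConstituencyParse,
                  sentenceConstituencyParse.getLeaves().get(tokenIndex));
              // the dependent may not sit inside any NP, so fall back to the bare word
              String fullPhrase = (np != null) ? np.yieldWords().toString() : sge.getDependent().word();
              System.err.println("\t"+sge.getRelation() + " --> "+fullPhrase);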
            }
          }
        }
      }
    }
  }
}

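This code shows how to line up the leaves of the constituency tree with the vertices of the dependency parse graph.
It is set up to get the largest noun phrase containing the word, but you could alter it to get the smallest one, etc.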
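
To make the largest-versus-smallest distinction concrete, here is a minimal sketch that runs without CoreNLP on the classpath. The Node class is a hypothetical stand-in for edu.stanford.nlp.trees.Tree, and the bracketing in exampleTree is hand-built for illustration, not actual parser output. The only difference between the two lookups is whether the upward walk stops at the first NP or keeps going to the last one.

```java
import java.util.ArrayList;
import java.util.List;

public class SmallestNpSketch {

  // Hypothetical minimal tree node; stands in for CoreNLP's Tree (assumption).
  public static final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();
    Node parent;

    Node(String label, Node... kids) {
      this.label = label;
      for (Node kid : kids) {
        kid.parent = this;
        children.add(kid);
      }
    }

    // Words under this node, left to right.
    public List<String> words() {
      List<String> out = new ArrayList<>();
      collect(this, out);
      return out;
    }

    private static void collect(Node n, List<String> out) {
      if (n.children.isEmpty()) {
        out.add(n.label);
        return;
      }
      for (Node c : n.children) collect(c, out);
    }
  }

  // Smallest enclosing NP: stop at the first NP found while walking up.
  public static Node smallestNounPhrase(Node leaf) {
    for (Node cur = leaf; cur != null; cur = cur.parent) {
      if (cur.label.equals("NP")) return cur;
    }
    return null; // the word is not inside any NP
  }

  // Largest enclosing NP: keep walking and remember the last NP seen.
  public static Node largestNounPhrase(Node leaf) {
    Node largest = null;
    for (Node cur = leaf; cur != null; cur = cur.parent) {
      if (cur.label.equals("NP")) largest = cur;
    }
    return largest;
  }

  // Depth-first search for the first leaf carrying the given word.
  public static Node findLeaf(Node root, String word) {
    if (root.children.isEmpty()) return root.label.equals(word) ? root : null;
    for (Node c : root.children) {
      Node hit = findLeaf(c, word);
      if (hit != null) return hit;
    }
    return null;
  }

  private static Node pre(String tag, String word) {
    return new Node(tag, new Node(word));
  }

  // Hand-built bracketing (assumption):
  // (NP (NP (DT the) (JJ warm) (NN pizza)) (CC and) (NP (JJ cold) (NN salad)))
  public static Node exampleTree() {
    return new Node("NP",
        new Node("NP", pre("DT", "the"), pre("JJ", "warm"), pre("NN", "pizza")),
        pre("CC", "and"),
        new Node("NP", pre("JJ", "cold"), pre("NN", "salad")));
  }

  public static void main(String[] args) {
    Node pizza = findLeaf(exampleTree(), "pizza");
    System.out.println("smallest: " + smallestNounPhrase(pizza).words());
    System.out.println("largest:  " + largestNounPhrase(pizza).words());
  }
}
```

With this bracketing, the smallest NP around "pizza" is "the warm pizza", while the largest is the whole coordination. In the CoreNLP example above, the equivalent change would be to return from containingNounPhrase as soon as the first NP label is seen.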

From dependencies to phrases
