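I'm trying to get phrases that involve specific dependency relations. For example, I want the noun phrase that contains the subject, the noun phrase that acts as an appositive, and so on. For example: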
Sentence: John Smith and Robert Alan Jones ate the warm pizza and cold salad by the car for an hour.
Phrasal Subject: John Smith and Robert Alan Jones
Negation:
Verbs: ate
Phrasal Direct Object: the warm pizza and cold salad
Phrasal Indirect Object:
Root Noun:
Phrasal Root Noun:
Phrasal Appositive:
Phrasal Subject Complement:
Phrasal Object Complement:
Phrasal Clausal Complement:
Adjective Phrase: warm
Adverbial Phrase:
Prepositional Phrases: [by the car, for an hour]
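Again: I'm using the Dependency Parser, and I've written some code that recursively navigates the TypedDependency collection... but it feels hacky. Should I be using a built-in method to return phrases and multi-word combinations (MWE, POSS, etc.) from the dependencies?
Jeff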
Answer 0 (score: 0)
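I think the OpenIE system is a good fit for getting triples like this.
Here is a basic example I wrote up; there may be better ways to do it. The containingNounPhrase method might be a useful addition to Tree. I may also add something along these lines in the Stanford CoreNLP 3.8.0 release.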
package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

import java.util.*;

public class PhraseDependencyExample {

  // Walk from the leaf up to the root and remember the last (outermost) NP seen.
  // Returns null if the leaf is not inside any NP.
  public static Tree containingNounPhrase(Tree tree, Tree leaf) {
    Tree currTree = leaf;
    Tree largestNPTree = null;
    while (currTree != null) {
      if (currTree.label().value().equals("NP")) {
        largestNPTree = currTree;
      }
      currTree = currTree.parent(tree);
    }
    return largestNPTree;
  }

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
    // use the faster shift-reduce parser (its model file must be on the classpath)
    props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
    props.setProperty("parse.maxlen", "100");
    // set up Stanford CoreNLP pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // build an annotation for the example sentence
    Annotation annotation =
        new Annotation("John Smith and Robert Alan Jones ate the warm pizza and cold salad.");
    // run the annotators
    pipeline.annotate(annotation);
    for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
      System.err.println("---");
      Tree sentenceConstituencyParse = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      System.err.println(sentenceConstituencyParse);
      SemanticGraph sg =
          sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
      List<Tree> leaves = sentenceConstituencyParse.getLeaves();
      for (IndexedWord iw : sg.vertexListSorted()) {
        // only past-tense verbs; widen this check to cover other verb tags
        if ("VBD".equals(iw.tag())) {
          System.err.println("---");
          System.err.println("verb: " + iw.word());
          for (SemanticGraphEdge sge : sg.outgoingEdgeList(iw)) {
            String rel = sge.getRelation().getShortName();
            // "dobj" is the direct-object label in this release; "obj" is accepted too,
            // on the assumption that releases emitting UD v2 labels use that name instead
            if (rel.equals("dobj") || rel.equals("obj") || rel.equals("nsubj")) {
              // dependency token indices are 1-based, the leaf list is 0-based
              int tokenIndex = sge.getDependent().index() - 1;
              Tree np = containingNounPhrase(sentenceConstituencyParse, leaves.get(tokenIndex));
              if (np != null) {
                String fullPhrase = np.yieldWords().toString();
                System.err.println("\t" + sge.getRelation() + " --> " + fullPhrase);
              }
            }
          }
        }
      }
    }
  }
}
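This code shows how to access both the leaves of the constituency tree and the vertices of the dependency parse graph.
The containingNounPhrase method is set up to return the largest noun phrase containing the word, but you could change it to return the smallest one, and so on...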
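To make the largest-versus-smallest choice concrete, here is a rough, self-contained sketch that does not use CoreNLP at all. The Node class, the helper names (smallestEnclosing, largestEnclosing, text, pizzaLeaf) and the hand-built tree are all made up for illustration; with CoreNLP you would walk Tree.parent(root) as in the example above instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

final class EnclosingPhraseSketch {

    /** Hypothetical minimal tree node with a parent pointer; not a CoreNLP class. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(String label, Node... kids) {
            this.label = label;
            for (Node kid : kids) {
                kid.parent = this;
                children.add(kid);
            }
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /** Innermost ancestor with the given label: stop at the first match on the way up. */
    static Node smallestEnclosing(Node leaf, String label) {
        for (Node n = leaf.parent; n != null; n = n.parent) {
            if (n.label.equals(label)) {
                return n;
            }
        }
        return null;
    }

    /** Outermost ancestor with the given label: keep climbing and remember the last match. */
    static Node largestEnclosing(Node leaf, String label) {
        Node found = null;
        for (Node n = leaf.parent; n != null; n = n.parent) {
            if (n.label.equals(label)) {
                found = n;
            }
        }
        return found;
    }

    /** Joins the words under a node, left to right, with single spaces. */
    static String text(Node node) {
        StringJoiner words = new StringJoiner(" ");
        collectLeaves(node, words);
        return words.toString();
    }

    private static void collectLeaves(Node node, StringJoiner words) {
        if (node.isLeaf()) {
            words.add(node.label);
            return;
        }
        for (Node child : node.children) {
            collectLeaves(child, words);
        }
    }

    /** Builds (NP (NP the warm pizza) (CC and) (NP cold salad)) and returns the "pizza" leaf. */
    static Node pizzaLeaf() {
        Node pizza = new Node("pizza");
        Node left = new Node("NP",
                new Node("DT", new Node("the")),
                new Node("JJ", new Node("warm")),
                new Node("NN", pizza));
        Node right = new Node("NP",
                new Node("JJ", new Node("cold")),
                new Node("NN", new Node("salad")));
        // Constructing the coordinated NP wires up the parent pointers of both halves.
        Node whole = new Node("NP", left, new Node("CC", new Node("and")), right);
        return pizza;
    }

    public static void main(String[] args) {
        Node leaf = pizzaLeaf();
        System.out.println(text(smallestEnclosing(leaf, "NP"))); // the warm pizza
        System.out.println(text(largestEnclosing(leaf, "NP")));  // the warm pizza and cold salad
    }
}
```

For a coordinated object like this one, the smallest NP gives just the conjunct that holds the head word, while the largest NP gives the whole coordination, which is closer to the "Phrasal Direct Object" in the question. Both helpers return null when the word is not inside any phrase with that label, so callers should check for that.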