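I am using Stanford CoreNLP to parse some text, which gives me multiple sentences. In those sentences I managed to extract noun phrases using TregexPattern, so for each noun phrase I get a child Tree. I also managed to find the head of the noun phrase.
How can I get the position of that head in the sentence, or even its token/CoreLabel?
Better still, how can I find the dependency relations between the head and the rest of the sentence?
Here is an example: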
    public void doSomeTextKarate(String text){
        Properties props = new Properties();
        props.put("annotators","tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        this.pipeline = pipeline;

        // create an empty Annotation just with the given text
        Annotation document = new Annotation(text);
        // run all Annotators on this text
        pipeline.annotate(document);

        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            SemanticGraph basicDeps = sentence.get(BasicDependenciesAnnotation.class);
            Collection<TypedDependency> typedDeps = basicDeps.typedDependencies();
            System.out.println("typedDeps ==> "+typedDeps);

            SemanticGraph collDeps = sentence.get(CollapsedDependenciesAnnotation.class);
            SemanticGraph collCCDeps = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);

            List<CoreMap> numerizedTokens = sentence.get(NumerizedTokensAnnotation.class);
            List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);

            Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
            sentenceTree.percolateHeads(headFinder);
            Set<Dependency<Label, Label, Object>> sentenceDeps = sentenceTree.dependencies();
            for (Dependency<Label, Label, Object> dependency : sentenceDeps) {
                System.out.println("sentence dep = " + dependency);
                System.out.println(dependency.getClass() +" ( " + dependency.governor() + ", " + dependency.dependent() +") " );
            }

            // find noun phrases in the sentence
            TregexPattern pat = TregexPattern.compile("@NP");
            TregexMatcher matcher = pat.matcher(sentenceTree);
            while (matcher.find()) {
                Tree nounPhraseTree = matcher.getMatch();
                System.out.println("Found noun phrase " + nounPhraseTree);

                nounPhraseTree.percolateHeads(headFinder);
                Set<Dependency<Label, Label, Object>> npDeps = nounPhraseTree.dependencies();
                for (Dependency<Label, Label, Object> dependency : npDeps ) {
                    System.out.println("nounPhraseTree dep = " + dependency);
                }

                Tree head = nounPhraseTree.headTerminal(headFinder);
                System.out.println("head " + head);

                Set<Dependency<Label, Label, Object>> headDeps = head.dependencies();
                for (Dependency<Label, Label, Object> dependency : headDeps) {
                    System.out.println("head dep " + dependency);
                }

                // QUESTION:
                // How do I get the position of "head" in tokens or numerizedTokens?
                // How do I get the dependencies where "head" is involved in typedDeps?
            }
        }
    }
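In other words, I want to query all the dependencies in the whole sentence that involve the "head" word/token/label. I think I need to find the position of that token in the sentence so I can correlate it with the typed dependencies, but maybe there is a simpler way?
Thanks in advance.
[EDIT]
So I may have found the answer, or the beginning of one.
If I call .label() on the head, I get a CoreLabel, which is pretty much what I needed to find. I can now iterate over the typed dependencies and look for those whose governor label or dependent label has the same index as my headLabel.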
    Tree nounPhraseTree = matcher.getMatch();
    System.out.println("Found noun phrase " + nounPhraseTree);
    nounPhraseTree.percolateHeads(headFinder);
    Tree head = nounPhraseTree.headTerminal(headFinder);
    CoreLabel headLabel = (CoreLabel) head.label();
    System.out.println("tokens.contains(headLabel)" + tokens.contains(headLabel));
    System.out.println("");
    System.out.println("Iterating over typed deps");
    for (TypedDependency typedDependency : typedDeps) {
        System.out.println(typedDependency.gov().backingLabel());
        System.out.println("gov pos "+ typedDependency.gov() + " - " + typedDependency.gov().index());
        System.out.println("dep pos "+ typedDependency.dep() + " - " + typedDependency.dep().index());
        if(typedDependency.gov().index() == headLabel.index() ){
            System.out.println("dep or gov backing label equals headlabel :"
                + (typedDependency.gov().backingLabel().equals(headLabel)
                || typedDependency.dep().backingLabel().equals(headLabel))); //why does this return false all the time ?
            System.out.println(" !!!!!!!!!!!!!!!!!!!!! HIT ON " + headLabel + " == " + typedDependency.gov());
        }
    }
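So it seems I can only match my head's label against the labels in typedDeps by using the index. I wonder whether this is the proper way to do it. As you can see in my code, I also tried using TypedDependency's backingLabel() to test for equality with headLabel, for both the governor and the dependent, but it systematically returns false. I wonder why!?
Any feedback is appreciated.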
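Answer 0 (score: 2)
You can use the CoreAnnotations.IndexAnnotation annotation to get the position of a CoreLabel within its containing sentence.
Your approach to finding all the dependencies for a given word seems correct, and it is probably the simplest way to do it.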
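As a rough illustration of the index-matching approach discussed above, here is a library-free sketch. The `Dep` class and the `involving` helper are hypothetical stand-ins, not CoreNLP types: in real code the two indices would come from `typedDependency.gov().index()` and `typedDependency.dep().index()`, and the head index from `headLabel.index()`. Note that the sketch checks both the governor and the dependent side, whereas the `if` in the question's edit only tests the governor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Library-free sketch: Dep is a hypothetical stand-in for CoreNLP's TypedDependency.
class HeadDependencyLookup {

    /** Hypothetical stand-in: a relation name plus the governor and dependent token indices. */
    static final class Dep {
        final String relation;
        final int govIndex;
        final int depIndex;

        Dep(String relation, int govIndex, int depIndex) {
            this.relation = relation;
            this.govIndex = govIndex;
            this.depIndex = depIndex;
        }

        @Override
        public String toString() {
            return relation + "(" + govIndex + ", " + depIndex + ")";
        }
    }

    /** Returns every dependency in which the token at headIndex is governor or dependent. */
    static List<Dep> involving(List<Dep> deps, int headIndex) {
        List<Dep> hits = new ArrayList<>();
        for (Dep d : deps) {
            if (d.govIndex == headIndex || d.depIndex == headIndex) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up example for "The quick fox jumps": The=1, quick=2, fox=3, jumps=4.
        // Index 0 plays the role of the artificial ROOT node.
        List<Dep> deps = Arrays.asList(
                new Dep("root", 0, 4),
                new Dep("nsubj", 4, 3),
                new Dep("det", 3, 1),
                new Dep("amod", 3, 2));

        // Head of the noun phrase "The quick fox" is "fox" at index 3.
        System.out.println(involving(deps, 3));
    }
}
```

Matching on the sentence-level index avoids relying on object equality between the tree's label and the dependency graph's label, which is the comparison that kept returning false in the question.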