Question

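Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas

From the sentence above, I would like to obtain the following typed dependencies: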

nsubjpass(submitted, Bills)
auxpass(submitted, were)
agent(submitted, Brownback)
nn(Brownback, Senator)
appos(Brownback, Republican)
prep_of(Republican, Kansas)
prep_on(Bills, ports)
conj_and(ports, immigration)
prep_on(Bills, immigration)

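According to Table 1 and Figure 1 in the documentation of {{3}}, this should be possible.

With the code below, I can only get the following dependencies (code output):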

root(ROOT-0, submitted-7)
nmod:on(Bills-1, ports-3)
nmod:on(Bills-1, immigration-5)
case(ports-3, on-2)
cc(ports-3, and-4)
conj:and(ports-3, immigration-5)
nsubjpass(submitted-7, Bills-1)
auxpass(submitted-7, were-6)
nmod:agent(submitted-7, Brownback-10)
case(Brownback-10, by-8)
compound(Brownback-10, Senator-9)
punct(Brownback-10, ,-11)
appos(Brownback-10, Republican-12)
nmod:of(Republican-12, Kansas-14)
case(Kansas-14, of-13)

Question: how can I get the desired output shown above?

Code

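// Imports needed by this snippet
import java.util.Collection;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.AnnotationPipeline;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.EnhancedPlusPlusDependenciesAnnotation;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.PropertiesUtils;

public void processTestCoreNLP() {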
    String text = "Bills on ports and immigration were submitted " +
            "by Senator Brownback, Republican of Kansas";

    Annotation annotation = new Annotation(text);
    Properties properties = PropertiesUtils.asProperties(
            "annotators", "tokenize,ssplit,pos,lemma,depparse"
    );

    AnnotationPipeline pipeline = new StanfordCoreNLP(properties);

    pipeline.annotate(annotation);

    for (CoreMap sentence : annotation.get(SentencesAnnotation.class)) {
        SemanticGraph sg = sentence.get(EnhancedPlusPlusDependenciesAnnotation.class);
        Collection<TypedDependency> dependencies = sg.typedDependencies();
        for (TypedDependency td : dependencies) {
            System.out.println(td);
        }
    }
}

Answer 1

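CoreNLP recently switched from the older Stanford Dependencies format (the format in the first listing of the question) to Universal Dependencies. My first suggestion is to use the new format if at all possible. Continued development of the parser will use Universal Dependencies, and the new format is in many ways similar to the old one, modulo cosmetic changes (e.g., prep -> nmod).

However, if you want the old dependency format, you can get it with the CollapsedCCProcessedDependenciesAnnotation annotation.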
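To make the "cosmetic changes" concrete, here is a minimal sketch that renames the relation strings from the question's actual output into the old-style names from its desired output. It is plain string manipulation and does not call CoreNLP; the class name UdToSdNames and the mapping rules are assumptions made for illustration, derived only from the two listings in the question, and they are not how CoreNLP converts between the formats. For real Stanford Dependencies output, use the annotation mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only (hypothetical helper): renames UD-style relation strings to old SD-style names. */
class UdToSdNames {

    // Matches printed dependencies such as "nmod:on(Bills-1, ports-3)".
    private static final Pattern DEP =
            Pattern.compile("^([^(]+)\\((.+)-\\d+, (.+)-\\d+\\)$");

    /** Returns the SD-style string, or null for relations absent from the question's desired listing. */
    static String toSdStyle(String udLine) {
        Matcher m = DEP.matcher(udLine.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized dependency: " + udLine);
        }
        String rel = m.group(1);
        String gov = m.group(2);
        String dep = m.group(3);

        // These relations do not appear in the desired (collapsed) listing, so skip them.
        if (rel.equals("root") || rel.equals("case") || rel.equals("cc") || rel.equals("punct")) {
            return null;
        }

        String sdRel;
        if (rel.equals("nmod:agent")) {
            sdRel = "agent";                      // nmod:agent -> agent
        } else if (rel.startsWith("nmod:")) {
            sdRel = "prep_" + rel.substring(5);   // nmod:on -> prep_on
        } else if (rel.startsWith("conj:")) {
            sdRel = "conj_" + rel.substring(5);   // conj:and -> conj_and
        } else if (rel.equals("compound")) {
            sdRel = "nn";                         // compound -> nn
        } else {
            sdRel = rel;                          // nsubjpass, auxpass, appos keep their names
        }
        return sdRel + "(" + gov + ", " + dep + ")";
    }

    /** Converts a whole listing, dropping the skipped relations. */
    static List<String> convertAll(List<String> udLines) {
        List<String> result = new ArrayList<>();
        for (String line : udLines) {
            String converted = toSdStyle(line);
            if (converted != null) {
                result.add(converted);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A few lines copied from the question's code output.
        List<String> ud = Arrays.asList(
                "root(ROOT-0, submitted-7)",
                "nmod:on(Bills-1, ports-3)",
                "case(ports-3, on-2)",
                "conj:and(ports-3, immigration-5)",
                "nmod:agent(submitted-7, Brownback-10)",
                "compound(Brownback-10, Senator-9)");
        for (String line : convertAll(ud)) {
            System.out.println(line);
        }
    }
}
```

Applied to the full output from the question, this renaming yields the same nine relations as the desired listing, which suggests that for this sentence the two formats differ mostly in labels. That will not hold in general, so treat the mapping as a reading aid rather than a converter.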

Answer 2

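If you want to get the CCprocessed and collapsed Stanford Dependencies (SD) for a sentence through the NN dependency parser, you have to set a property to work around a small bug in CoreNLP.

However, please note that we are no longer maintaining the Stanford Dependencies code. Unless you have really good reasons to use SD, we recommend that you use Universal Dependencies for any new project. For more information on the UD representation, see the Universal Dependencies (UD) documentation and Schuster and Manning (2016).

To get the CCprocessed and collapsed SD representation, set the depparse.language property as follows: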

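// Imports needed by this snippet
import java.util.Collection;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.AnnotationPipeline;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.PropertiesUtils;

public void processTestCoreNLP() {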
  String text = "Bills on ports and immigration were submitted " +
        "by Senator Brownback, Republican of Kansas";

  Annotation annotation = new Annotation(text);
  Properties properties = PropertiesUtils.asProperties(
        "annotators", "tokenize,ssplit,pos,lemma,depparse");

  // Work around the CoreNLP bug mentioned above by setting the language explicitly
  properties.setProperty("depparse.language", "English");

  AnnotationPipeline pipeline = new StanfordCoreNLP(properties);

  pipeline.annotate(annotation);

  for (CoreMap sentence : annotation.get(SentencesAnnotation.class)) {
    SemanticGraph sg = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
    Collection<TypedDependency> dependencies = sg.typedDependencies();
    for (TypedDependency td : dependencies) {
      System.out.println(td);
    }
  }
}

CoreNLP Stanford dependency format