Question

I want to use the Stanford parser to find multiple noun phrases in a given sentence. I am using Java.

Example sentence:

The picture quality is very good.

Now I need to extract "picture quality".

Is there any way to traverse the dependency tree to get the desired result?
Also, can the Stanford parser output tagged sentences in XML format?

Answer 1

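If you want to find all the noun phrases, this is probably most easily done with the phrase structure parse tree rather than the dependency representation. You can either walk the nodes of the Tree object manually and check whether label().value() is "NP", or you can use the TregexPattern "@NP" and then iterate over the NPs with a TregexMatcher.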
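A minimal sketch of the manual traversal, assuming the Stanford CoreNLP jar is on the classpath. The class NpTraversal and its nounPhrases helper are hypothetical names, and the bracketed tree is written by hand for illustration (it is not guaranteed to be the parser's exact output); in real code the Tree would come from the parser. Because it compares label().value() with "NP" exactly, it skips labels such as NP-TMP, which "@NP" would match.

```java
import edu.stanford.nlp.trees.Tree;
import java.util.ArrayList;
import java.util.List;

public class NpTraversal {
    // Hypothetical helper: walks every subtree in preorder (Tree is Iterable<Tree>)
    // and collects the words of each node labeled exactly "NP".
    static List<String> nounPhrases(Tree parse) {
        List<String> result = new ArrayList<>();
        for (Tree subtree : parse) {
            if (subtree.label() != null && "NP".equals(subtree.label().value())) {
                List<Tree> leaves = subtree.getLeaves();
                List<String> words = new ArrayList<>();
                for (Tree leaf : leaves) {
                    words.add(leaf.label().value());
                }
                result.add(String.join(" ", words));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written Penn-style tree for the example sentence (illustrative only).
        Tree parse = Tree.valueOf(
            "(ROOT (S (NP (DT The) (NN picture) (NN quality)) (VP (VBZ is) (ADJP (RB very) (JJ good))) (. .)))");
        System.out.println(nounPhrases(parse)); // [The picture quality]
    }
}
```

The determiner "The" is included because it is a leaf of the NP; a phrase with nested NPs yields one entry per NP node.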

You can get XML output from the parser with the command-line flag

-outputFormatOptions xml

or, in code, by constructing a TreePrint object with the options string "xml".
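A minimal sketch of the in-code route, assuming the Stanford CoreNLP jar is on the classpath. It uses the TreePrint(String formats, String options, TreebankLanguagePack tlp) constructor with the "penn" format and the "xml" option; the class XmlTreeOutput, its toXml helper, and the hand-written tree are hypothetical and for illustration only.

```java
import edu.stanford.nlp.trees.PennTreebankLanguagePack;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreePrint;
import java.io.PrintWriter;
import java.io.StringWriter;

public class XmlTreeOutput {
    // Hypothetical helper: prints the tree in "penn" format with the "xml" option
    // into a string buffer instead of System.out.
    static String toXml(Tree parse) {
        TreePrint printer = new TreePrint("penn", "xml", new PennTreebankLanguagePack());
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        printer.printTree(parse, out);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Hand-written tree for the example sentence (illustrative only).
        Tree parse = Tree.valueOf(
            "(ROOT (S (NP (DT The) (NN picture) (NN quality)) (VP (VBZ is) (ADJP (RB very) (JJ good))) (. .)))");
        System.out.print(toXml(parse));
    }
}
```

Other format names (for example "typedDependencies") can be combined in the comma-separated formats string; check the TreePrint documentation for the ones your version supports.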

Answer 2

To expand on @christopher-manning's answer, here is some code you can use:

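import com.google.common.base.Functions;
import com.google.common.base.Joiner;
import com.google.common.collect.Lists;
import edu.stanford.nlp.ling.LabeledWord;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;
import java.util.ArrayList;
import java.util.List;

// Returns the text of every NP (including nested ones) in a phrase-structure parse.
private List<String> getNounPhrases(Tree parse) {
    List<String> result = new ArrayList<>();
    // "@NP" matches any node whose basic category is NP (e.g. NP, NP-TMP).
    TregexPattern pattern = TregexPattern.compile("@NP");
    TregexMatcher matcher = pattern.matcher(parse);
    while (matcher.find()) {
        Tree match = matcher.getMatch();
        List<Tree> leaves = match.getLeaves();
        System.out.println(leaves);
        // Join the leaf words with spaces (Guava).
        String nounPhrase = Joiner.on(' ').join(Lists.transform(leaves, Functions.toStringFunction()));
        result.add(nounPhrase);
        // The labeled yield pairs each word with its POS tag (printed for debugging only).
        List<LabeledWord> labeledYield = match.labeledYield();
        System.out.println("labeledYield: " + labeledYield);
    }
    return result;
}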
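The question asks for "picture quality" without the determiner. One way to get that, sketched under the assumption that keeping only the NN-tagged words (NN, NNS, NNP, NNPS) of each NP is what you want, is to read POS tags from Tree.taggedYield() instead of joining all leaves. NounCompounds and nounWordsPerNp are hypothetical names, and the tree is written by hand for illustration.

```java
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;
import java.util.ArrayList;
import java.util.List;

public class NounCompounds {
    // Hypothetical helper: for each NP, keep only words whose POS tag starts with "NN",
    // dropping determiners, adjectives, and so on.
    static List<String> nounWordsPerNp(Tree parse) {
        List<String> result = new ArrayList<>();
        TregexMatcher matcher = TregexPattern.compile("@NP").matcher(parse);
        while (matcher.find()) {
            List<String> nouns = new ArrayList<>();
            for (TaggedWord tw : matcher.getMatch().taggedYield()) {
                if (tw.tag().startsWith("NN")) {
                    nouns.add(tw.word());
                }
            }
            if (!nouns.isEmpty()) {
                result.add(String.join(" ", nouns));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written tree for the example sentence (illustrative only).
        Tree parse = Tree.valueOf(
            "(ROOT (S (NP (DT The) (NN picture) (NN quality)) (VP (VBZ is) (ADJP (RB very) (JJ good))) (. .)))");
        System.out.println(nounWordsPerNp(parse)); // [picture quality]
    }
}
```

Because "@NP" also matches nested NPs, a phrase like "the quality of the picture" produces entries for both the outer NP and the inner ones; filter further if you only want maximal or minimal NPs.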

Finding noun phrases with the Stanford parser
