Question

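Using the latest CoreNLP 3.9.2 Java API, I would like to extract the new Universal Dependencies features that are available in the StanfordNLP Python library and defined at universaldependencies.org/guidelines.html. Specifically: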

Multi-word tokens
POS tags in Universal Dependencies format (UPOS)
Syntactic dependencies in UD format (using UPOS tags)

CoreNLP currently produces Penn Treebank POS tags and dependencies, as described here and here, respectively.

Pipeline configuration:

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref,kbp,quote");
    props.setProperty("coref.algorithm", "neural");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    CoreDocument document = new CoreDocument(text);
    pipeline.annotate(document);

    CoreSentence sentence = document.sentences().get(0);
    List<String> posTags = sentence.posTags(); // Penn Treebank POS tags, one per token
    SemanticGraph dependencies = sentence.dependencyParse(); // dependency graph

Any help or clarification of my misunderstanding would be appreciated.

Answer 1

The French, German, and Spanish code and models on GitHub were trained on the CoNLL 2018 UD data and support multi-word tokens.
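To make "multi-word token" concrete, here is a minimal sketch of how UD's CoNLL-U format writes one: the surface token gets an ID range (such as 3-4) and the syntactic words it expands to carry the UPOS tags, heads, and relations. The `ConlluSketch` class and its methods are hypothetical helpers, not part of CoreNLP, and the French annotation is hand-written for illustration rather than parser output.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: this class is hypothetical and not part of CoreNLP.
class ConlluSketch {

    /** Formats one syntactic word as a 10-column CoNLL-U line; unused columns are "_". */
    static String wordLine(int id, String form, String lemma, String upos, int head, String deprel) {
        // Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
        return String.join("\t",
                Integer.toString(id), form, lemma, upos, "_", "_",
                Integer.toString(head), deprel, "_", "_");
    }

    /** Formats the surface token that spans words first..last; it carries no tag or head. */
    static String multiWordTokenLine(int first, int last, String surfaceForm) {
        return String.join("\t",
                first + "-" + last, surfaceForm, "_", "_", "_", "_", "_", "_", "_", "_");
    }

    /** Hand-annotated example: "Il parle du projet", where "du" expands to "de" + "le". */
    static List<String> example() {
        List<String> lines = new ArrayList<>();
        lines.add(wordLine(1, "Il", "il", "PRON", 2, "nsubj"));
        lines.add(wordLine(2, "parle", "parler", "VERB", 0, "root"));
        lines.add(multiWordTokenLine(3, 4, "du"));
        lines.add(wordLine(3, "de", "de", "ADP", 5, "case"));
        lines.add(wordLine(4, "le", "le", "DET", 5, "det"));
        lines.add(wordLine(5, "projet", "projet", "NOUN", 2, "obl"));
        return lines;
    }

    public static void main(String[] args) {
        for (String line : example()) {
            System.out.println(line);
        }
    }
}
```

The range line is the reason a plain whitespace tokenizer is not enough for these languages: the dependency graph is built over the words "de" and "le", not over the surface form "du".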

We may or may not train an English UD part-of-speech model.

I believe the constituency parser data uses English-specific part-of-speech tags.
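Because the English models emit these English-specific (Penn Treebank) tags, one possible workaround, which is not something this answer proposes, is to post-process the tags with a lookup table. The sketch below assumes a hypothetical `PtbToUpos` helper that is not part of CoreNLP. The table is deliberately incomplete and lossy: a tag-only lookup cannot tell an auxiliary "be" from a main verb, or a preposition from a subordinating conjunction, so it should be treated as an approximation rather than a true UD tagger.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of CoreNLP: a coarse Penn Treebank -> UPOS lookup.
class PtbToUpos {

    private static final Map<String, String> TABLE = new HashMap<>();

    // Registers every Penn Treebank tag in ptbTags under the given UPOS tag.
    private static void put(String upos, String... ptbTags) {
        for (String tag : ptbTags) {
            TABLE.put(tag, upos);
        }
    }

    static {
        put("NOUN", "NN", "NNS");
        put("PROPN", "NNP", "NNPS");
        // Approximation: UD tags auxiliary uses of be/have/do as AUX, which needs context.
        put("VERB", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ");
        put("AUX", "MD");
        put("ADJ", "JJ", "JJR", "JJS");
        put("ADV", "RB", "RBR", "RBS");
        put("PRON", "PRP", "PRP$");
        put("DET", "DT");
        // Approximation: IN is SCONJ in UD when it introduces a clause.
        put("ADP", "IN");
        put("CCONJ", "CC");
        put("NUM", "CD");
        put("PART", "TO", "POS");
        put("INTJ", "UH");
        put("PUNCT", ".", ",", ":", "``", "''", "-LRB-", "-RRB-");
    }

    /** Returns the UPOS tag for a Penn Treebank tag, or "X" if the tag is not in the table. */
    static String toUpos(String ptbTag) {
        return TABLE.getOrDefault(ptbTag, "X");
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"NNS", "NNP", "MD", "IN", "WDT"}) {
            System.out.println(tag + " -> " + toUpos(tag));
        }
    }
}
```

In a pipeline like the one in the question, each string returned by `sentence.posTags()` could be passed through `PtbToUpos.toUpos`. Tags the table does not cover fall back to "X", which makes gaps easy to spot.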

These changes will go into version 4.0.0, which will hopefully be finished by the end of this year.

StanfordNLP Universal Dependencies features missing in Java CoreNLP
