Question

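The target language is Spanish.

The English pipeline supports typed dependencies, whereas, as far as I know, the Spanish pipeline does not.

The goal is to produce a dependency tree from the TreeAnnotation, where the final result is a list of directed edges. Is this possible using CoreNLP 3.4.1 with the Spanish models, and if so, how?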

Background

I am using Stanford CoreNLP 3.4.1 plus the 3.5.0 Spanish models for POS tagging (Java 8 is not in use yet for compatibility reasons), with the following configuration:

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, ner, parse");
props.setProperty("tokenize.options", "invertible=true,ptb3Escaping=true");
props.setProperty("tokenize.language", "es");

props.setProperty("pos.model", "edu/stanford/nlp/models/pos-tagger/spanish/spanish-distsim.tagger");
props.setProperty("ner.model", "edu/stanford/nlp/models/ner/spanish.ancora.distsim.s512.crf.ser.gz");

props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/spanishSR.ser.gz"); //Stanford Parser 3.4.1 shift-reduce models for Spanish. 

props.setProperty("ner.applyNumericClassifiers", "false");
props.setProperty("ner.useSUTime", "false");

These properties are then used to create the pipeline and annotate a document.

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);

List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

for(CoreMap sentence: sentences) {

    // ... extract start, end position of sentence ...

    for (CoreLabel token: sentence.get(CoreAnnotations.TokensAnnotation.class)) {

        // ... extract POS tags, NER annotations, id ...
    }

    //This works, and I have a tree that is not empty.
    Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
}

Using a debugger, I inspected the sentences and tokens and found that they carry the following annotations:

Sentence (keys)

From edu.stanford.nlp.ling.CoreAnnotations:

TextAnnotation
CharacterOffsetBeginAnnotation
CharacterOffsetEndAnnotation
TokensAnnotation
TokenBeginAnnotation
TokenEndAnnotation
SentenceIndexAnnotation

From edu.stanford.nlp.trees.TreeCoreAnnotations:

TreeAnnotation

Token (keys)

From edu.stanford.nlp.ling.CoreAnnotations:

TextAnnotation
OriginalTextAnnotation
CharacterOffsetBeginAnnotation
CharacterOffsetEndAnnotation
BeforeAnnotation
AfterAnnotation
IndexAnnotation
SentenceIndexAnnotation
PartOfSpeechAnnotation
NamedEntityTagAnnotation

From edu.stanford.nlp.trees.TreeCoreAnnotations:

HeadWordAnnotation - in my experiments, this always points to itself, i.e. the token from which the annotation was retrieved.
HeadTagAnnotation

Thanks in advance!

Answer 1

There is no support for Spanish dependency parsing in CoreNLP at the moment. This includes typed dependency conversion from constituency parses.

There is a head finder implemented (but not fully tested). You could hack an untyped dependency converter using this head finder, but we have no guarantees that this will yield a sensible parse.
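
To illustrate what such a hacked untyped converter might look like, here is a minimal, self-contained Java sketch of head percolation: for each phrase, one child is chosen as the head, and the head word of every other child becomes a dependent of the phrase's head word. It deliberately does not use CoreNLP classes. Node, HeadRule, Edge, and the toy rule in demoEdges() are hypothetical stand-ins; in a real setup the rule would be replaced by the Spanish head finder mentioned above, and the same caveat applies that the output may not be a sensible parse. The code avoids Java 8 features.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UntypedDependencySketch {

    /** Minimal constituency node; a stand-in for a parse tree node, not a CoreNLP class. */
    static final class Node {
        final String label;        // phrase label, or the word itself for a leaf
        final int tokenIndex;      // 1-based token position for leaves, 0 for phrases
        final List<Node> children;

        Node(String label, int tokenIndex, Node... children) {
            this.label = label;
            this.tokenIndex = tokenIndex;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() { return children.isEmpty(); }
    }

    /** Hypothetical head rule: picks which child of a phrase carries its head. */
    interface HeadRule {
        Node headChild(Node phrase);
    }

    /** A directed, unlabeled edge from a governor token to a dependent token. */
    static final class Edge {
        final int governor;
        final int dependent;

        Edge(int governor, int dependent) {
            this.governor = governor;
            this.dependent = dependent;
        }

        @Override public String toString() { return governor + "->" + dependent; }
    }

    /** Returns the head leaf of {@code node} and appends every edge found beneath it. */
    static Node collect(Node node, HeadRule rule, List<Edge> edges) {
        if (node.isLeaf()) {
            return node;
        }
        Node headChild = rule.headChild(node);
        Node head = null;
        List<Node> dependents = new ArrayList<Node>();
        for (Node child : node.children) {
            Node childHead = collect(child, rule, edges);
            if (child == headChild) {
                head = childHead;
            } else {
                dependents.add(childHead);
            }
        }
        if (head == null) {
            throw new IllegalArgumentException("Head rule did not return a child of " + node.label);
        }
        // Each non-head child's head word depends on the phrase's head word.
        for (Node dependent : dependents) {
            edges.add(new Edge(head.tokenIndex, dependent.tokenIndex));
        }
        return head;
    }

    /** Toy example: (S (NP El perro) (VP ladra)), with tokens numbered 1..3. */
    static List<Edge> demoEdges() {
        Node tree = new Node("S", 0,
                new Node("NP", 0, new Node("El", 1), new Node("perro", 2)),
                new Node("VP", 0, new Node("ladra", 3)));

        // Toy rule (an assumption, not real Spanish head rules):
        // S is headed by its VP child; every other phrase by its rightmost child.
        HeadRule toyRule = new HeadRule() {
            @Override public Node headChild(Node phrase) {
                if (phrase.label.equals("S")) {
                    for (Node child : phrase.children) {
                        if (child.label.equals("VP")) {
                            return child;
                        }
                    }
                }
                return phrase.children.get(phrase.children.size() - 1);
            }
        };

        List<Edge> edges = new ArrayList<Edge>();
        collect(tree, toyRule, edges);
        return edges;
    }

    public static void main(String[] args) {
        System.out.println(demoEdges()); // prints [2->1, 3->2]
    }
}
```

The result is the flat list of directed edges asked for in the question, expressed as governor and dependent token indices. The head word of the root (here token 3) has no governor and would be the root of the dependency tree.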
