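The target language is Spanish.
The English pipeline supports typed dependencies, whereas, as far as I know, the Spanish pipeline does not.
The goal is to produce a dependency tree from the TreeAnnotation, where the final result is a list of directed edges. Is this possible with CoreNLP 3.4.1 and the Spanish models, and if so, how?
I am using Stanford CoreNLP 3.4.1 plus the 3.5.0 Spanish models for POS tagging (Java 8 is not used yet, for compatibility reasons), with the following configuration: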
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, ner, parse");
props.setProperty("tokenize.options", "invertible=true,ptb3Escaping=true");
props.setProperty("tokenize.language", "es");
props.setProperty("pos.model", "edu/stanford/nlp/models/pos-tagger/spanish/spanish-distsim.tagger");
props.setProperty("ner.model", "edu/stanford/nlp/models/ner/spanish.ancora.distsim.s512.crf.ser.gz");
props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/spanishSR.ser.gz"); //Stanford Parser 3.4.1 shift-reduce models for Spanish.
props.setProperty("ner.applyNumericClassifiers", "false");
props.setProperty("ner.useSUTime", "false");
These properties are then used to create the pipeline and annotate the document.
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    // ... extract start, end position of sentence ...
    for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        // ... extract POS tags, NER annotations, id ...
    }
    // This works, and I have a tree that is not empty.
    Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
}
Using a debugger, I inspected the sentences and tokens and found that they carry the following annotations:
From edu.stanford.nlp.ling.CoreAnnotations:
From edu.stanford.nlp.trees.TreeCoreAnnotations:
From edu.stanford.nlp.ling.CoreAnnotations:
From edu.stanford.nlp.trees.TreeCoreAnnotations:
Thanks in advance!
Answer 0 (score: 1)
There is no support for Spanish dependency parsing in CoreNLP at the moment. This includes typed dependency conversion from constituency parses.
There is a head finder implemented (but not fully tested). You could hack an untyped dependency converter using this head finder, but we have no guarantees that this will yield a sensible parse.
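To make the "untyped dependency converter" idea concrete, here is a minimal, self-contained sketch of head-driven conversion from a constituency tree to a list of directed edges. It does not use CoreNLP: the Node and Edge classes, the HEAD_PREFIXES rule, and the simplified labels are all hypothetical stand-ins for the library's Tree and head finder, and the toy head rule is an assumption, not the actual Spanish head rules. It avoids Java 8 features, in line with the constraint in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy stand-ins for CoreNLP's Tree and head finder; all names here are hypothetical.
class UntypedDependencySketch {

    /** Minimal constituency node: leaves carry a word and a 1-based token index. */
    static final class Node {
        final String label;
        final int index; // token index for leaves, 0 for internal nodes
        final List<Node> children;

        Node(String label, int index, Node... children) {
            this.label = label;
            this.index = index;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /** Unlabeled directed edge from a head token to one of its dependents. */
    static final class Edge {
        final int head;
        final int dependent;

        Edge(int head, int dependent) {
            this.head = head;
            this.dependent = dependent;
        }

        @Override
        public String toString() {
            return head + "->" + dependent;
        }
    }

    // Assumed toy head rule: the first non-leaf child whose label starts with one of
    // these prefixes (tried in order) is the head; otherwise the leftmost child is.
    static final String[] HEAD_PREFIXES = {"grup.verb", "v", "n"};

    static Node headChild(Node parent) {
        for (String prefix : HEAD_PREFIXES) {
            for (Node child : parent.children) {
                if (!child.isLeaf() && child.label.startsWith(prefix)) {
                    return child;
                }
            }
        }
        return parent.children.get(0);
    }

    /** Follows head children down to the token that heads the whole subtree. */
    static Node headLeaf(Node node) {
        while (!node.isLeaf()) {
            node = headChild(node);
        }
        return node;
    }

    /** Adds one edge per non-head child, then recurses into every child. */
    static void collectEdges(Node node, List<Edge> edges) {
        if (node.isLeaf()) {
            return;
        }
        Node head = headChild(node);
        int headIndex = headLeaf(head).index;
        for (Node child : node.children) {
            if (child != head) {
                edges.add(new Edge(headIndex, headLeaf(child).index));
            }
        }
        for (Node child : node.children) {
            collectEdges(child, edges);
        }
    }

    static List<Edge> toEdges(Node root) {
        List<Edge> edges = new ArrayList<>();
        collectEdges(root, edges);
        return edges;
    }

    static Node leaf(String word, int index) {
        return new Node(word, index);
    }

    /** Hand-built parse of "El perro come carne" with simplified labels. */
    static Node exampleTree() {
        return new Node("S", 0,
            new Node("sn", 0,
                new Node("da", 0, leaf("El", 1)),
                new Node("nc", 0, leaf("perro", 2))),
            new Node("grup.verb", 0,
                new Node("vm", 0, leaf("come", 3))),
            new Node("sn", 0,
                new Node("nc", 0, leaf("carne", 4))));
    }

    public static void main(String[] args) {
        System.out.println(toEdges(exampleTree())); // prints [3->2, 3->4, 2->1]
    }
}
```

In a real pipeline, headChild would presumably be replaced by a call into the head finder mentioned above, and the tree would come from TreeAnnotation. As the answer warns, the quality of the resulting edges depends entirely on the head rules, so the output should be checked by hand on a few sentences.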