Question

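I want to parse some German text with Stanford CoreNLP and get CoNLL output, so that I can pass it to CorZu for coreference resolution.

How can I do this programmatically?

Here is my code so far (it only prints the parse trees):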

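import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.StringUtils;

// Build the German pipeline from the bundled properties file and annotate one sentence.
Annotation germanAnnotation = new Annotation("Gestern habe ich eine blonde Frau getroffen");
Properties germanProperties = StringUtils.argsToProperties("-props", "StanfordCoreNLP-german.properties");
StanfordCoreNLP pipeline = new StanfordCoreNLP(germanProperties);
pipeline.annotate(germanAnnotation);

// Collect one parse tree per sentence, one tree per line.
StringBuilder trees = new StringBuilder();
for (CoreMap sentence : germanAnnotation.get(CoreAnnotations.SentencesAnnotation.class)) {
    Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
    trees.append(sentenceTree).append("\n");
}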

Answer 1

With the following code, I managed to save the parse output in CoNLL format.

OutputStream outputStream = new FileOutputStream(new File("./target/", OUTPUT_FILE_NAME));
CoNLLOutputter.conllPrint(germanAnnotation, outputStream, pipeline);

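However, the HEAD field was 0 for every word. I am not sure whether the problem lies in the parsing or only in CoNLLOutputter. Honestly, I was too annoyed with CoreNLP to investigate further.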
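For anyone who wants to confirm the same symptom on their own output file, here is a minimal sketch of a check. It is not part of CoreNLP: the class `ConllHeadCheck` and the method `allHeadsZero` are hypothetical names, the HEAD column index is passed in because I am assuming (not asserting) which column holds it in your output, and the sample rows in `main` are made up for illustration rather than real CoreNLP output.

```java
public class ConllHeadCheck {

    /**
     * Returns true if the text contains at least one token row and every
     * token row has "0" in the given tab-separated column (0-based index).
     * Blank lines (sentence boundaries) and rows that are too short are skipped.
     */
    static boolean allHeadsZero(String conll, int headColumn) {
        boolean sawToken = false;
        for (String line : conll.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // sentence boundary
            }
            String[] fields = line.split("\t");
            if (fields.length <= headColumn) {
                continue; // row has no such column, ignore it
            }
            sawToken = true;
            if (!fields[headColumn].trim().equals("0")) {
                return false; // found a token attached to a real head
            }
        }
        return sawToken;
    }

    public static void main(String[] args) {
        // Hypothetical rows: index, word, lemma, POS, NER, HEAD, DEPREL.
        String sample =
              "1\tGestern\t_\tADV\t_\t0\t_\n"
            + "2\thabe\t_\tVAFIN\t_\t0\t_\n";
        // Column 5 is assumed to be HEAD for this made-up layout.
        System.out.println(allHeadsZero(sample, 5));
    }
}
```

If the check reports that every HEAD is 0, the file carries no usable dependency structure, whichever component is at fault.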

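I decided to use ParZu instead, and I suggest you do the same. ParZu and CorZu are made to work together seamlessly, and they do. In my case, I had text that was already tokenized and POS-tagged. This makes things easier, because you do not need: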

a POS tagger that uses the STTS tagset
a morphological analysis tool

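Once ParZu and CorZu are installed, you only need to run corzu.sh (included in the CorZu download folder). If your text is already tokenized and POS-tagged, edit the script accordingly: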

parzu_cmd="/YourPath/ParZu/parzu -i tagged"

A final note: make sure to convert your tagged text to the following format, with an empty line marking each sentence boundary: word [tab] tag [newline]
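As a rough illustration of that format, here is a small Java sketch that writes one word [tab] tag pair per line and an empty line after each sentence. The class `TaggedTextFormatter` and the method `format` are hypothetical names, the STTS tags in `main` are filled in by hand for the example sentence, and emitting an empty line after the last sentence as well is my assumption, not something I verified against ParZu.

```java
import java.util.List;

public class TaggedTextFormatter {

    /**
     * Renders each token as "word<TAB>tag" on its own line and appends an
     * empty line after every sentence. Each token is a {word, tag} pair.
     */
    static String format(List<List<String[]>> sentences) {
        StringBuilder out = new StringBuilder();
        for (List<String[]> sentence : sentences) {
            for (String[] token : sentence) {
                out.append(token[0]).append('\t').append(token[1]).append('\n');
            }
            out.append('\n'); // empty line = sentence boundary
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hand-tagged example sentence (STTS tags chosen for illustration).
        List<List<String[]>> sentences = List.of(
            List.of(
                new String[] {"Gestern", "ADV"},
                new String[] {"habe", "VAFIN"},
                new String[] {"ich", "PPER"},
                new String[] {"eine", "ART"},
                new String[] {"blonde", "ADJA"},
                new String[] {"Frau", "NN"},
                new String[] {"getroffen", "VVPP"}));
        System.out.print(format(sentences));
    }
}
```

The resulting text can be saved to a file and used as input for ParZu with the tagged input option shown above.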

Stanford CoreNLP: CoNLL format output from Java