Question

I am following the instructions on this page to reproduce the Chinese coreference results on CoNLL-2012: https://stanfordnlp.github.io/CoreNLP/coref.html#running-on-conll-2012

This is my code:

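import java.util.Properties;

import edu.stanford.nlp.coref.CorefSystem;
import edu.stanford.nlp.util.StringUtils;

public class TestCoref {

    public static void main(String[] args) throws Exception {
        // Start from whatever was passed on the command line.
        Properties props = StringUtils.argsToProperties(args);

        // Point at the neural Chinese CoNLL configuration, the data, the output directory and the scorer.
        props.setProperty("props", "edu/stanford/nlp/coref/properties/neural-chinese-conll.properties");
        props.setProperty("coref.data", "path-to/data/conll-2012");
        props.setProperty("coref.conllOutputPath", "path-to-output/conll-results");
        props.setProperty("coref.scorer", "path-to/reference-coreference-scorers/v8.01/scorer.pl");

        CorefSystem coref = new CorefSystem(props);
        coref.runOnConll(props);
    }
}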

As output, I get three files like these:

date-time.coref.predicted.txt

date-time.coref.gold.txt

date-time.predicted.txt

But all of them are empty!

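My "conll-2012" data was prepared as follows:

First, I downloaded the train / dev / test-key data from http://conll.cemantix.org/2012/data.html and ontonotes-release-5.0 from the LDC. Then I ran the script skeleton2conll.sh, which comes with the official CoNLL-2012 data, to generate the _conll files.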
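Before suspecting the coref code itself, it may be worth confirming that skeleton2conll.sh really produced _conll files under the directory passed as coref.data. The following is only a minimal sketch using the Java standard library; the class name ConllDataCheck and the helper countConllFiles are hypothetical names made up for illustration, and the check assumes nothing about CoreNLP beyond the "_conll" file suffix mentioned above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper (not part of CoreNLP): counts generated *_conll files.
class ConllDataCheck {

    // Walks the tree below root and counts regular files whose name ends with "_conll".
    static long countConllFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith("_conll"))
                .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // The default is the placeholder path from the question; pass the real directory as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "path-to/data/conll-2012");
        if (!Files.isDirectory(root)) {
            System.out.println("Not a directory: " + root);
            return;
        }
        System.out.println(countConllFiles(root) + " *_conll files found under " + root);
    }
}
```

If the count is zero, the generation step is the first thing to revisit. A non-zero count does not prove the files are in the layout the coref system expects, only that the conversion produced something.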

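The models I use were downloaded from http://nlp.stanford.edu/software/stanford-chinese-corenlp-models-current.jar

While trying to track down the problem, I noticed that the CorefSystem class has a function "annotate" that seems to do the real work, but it is not used at all. https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/coref/CorefSystem.java

Is there a bug in the runOnConll function that keeps it from reading any annotations? If not, how can I reproduce the coreference results?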

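PS:

I am especially interested in getting results on the conversational data in CoNLL-2012, such as "tc" and "bc". With the coreference API, I found that I can only parse plain text. Apart from running on CoNLL-2012, is there any other way to use the neural coref system on conversational data (where the different speakers should be indicated)?

Thanks in advance for your help!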

Answer 1

First of all, why not run this command from the command line:

java -Xmx10g -cp stanford-corenlp-3.9.1.jar:stanford-chinese-corenlp-models-3.9.1.jar:* edu.stanford.nlp.coref.CorefSystem -props edu/stanford/nlp/coref/properties/neural-chinese-conll.properties -coref.data <path-to-conll-data> -coref.conllOutputPath <where-to-save-system-output> -coref.scorer <path-to-scoring-script>
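If the run has to be started from Java rather than from a shell, one option is to launch the same command as a child process. This is only a sketch under assumptions: the class CorefCommand and its build method are hypothetical names, the jar names are taken from the command above and are assumed to sit in the working directory, and the three paths are supplied by the caller. It does not call any CoreNLP API directly.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher (not part of CoreNLP): rebuilds the command above as an argument list.
class CorefCommand {

    // Returns the argument list for the coref run; dataDir, outputDir and scorer are caller-supplied paths.
    static List<String> build(String dataDir, String outputDir, String scorer) {
        // Jar names come from the command above; "*" is passed literally and expanded by the JVM.
        String classpath = String.join(File.pathSeparator,
            "stanford-corenlp-3.9.1.jar",
            "stanford-chinese-corenlp-models-3.9.1.jar",
            "*");
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx10g");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("edu.stanford.nlp.coref.CorefSystem");
        cmd.add("-props");
        cmd.add("edu/stanford/nlp/coref/properties/neural-chinese-conll.properties");
        cmd.add("-coref.data");
        cmd.add(dataDir);
        cmd.add("-coref.conllOutputPath");
        cmd.add(outputDir);
        cmd.add("-coref.scorer");
        cmd.add(scorer);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: CorefCommand <conll-data-dir> <output-dir> <scorer.pl>");
            return;
        }
        List<String> cmd = build(args[0], args[1], args[2]);
        System.out.println(String.join(" ", cmd));
        // inheritIO forwards the child's stdout/stderr so the coref log stays visible.
        int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        System.out.println("exit code: " + exit);
    }
}
```

Passing the options as separate list elements avoids any shell quoting issues with paths that contain spaces.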
