Question

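I am using the Stanford Neural Network Dependency Parser. I have trained several models on the French treebanks (GSD, ParTUT, Sequoia, Spoken), and I am now trying to generate each model's output on the test portion of its treebank. Everything works for ParTUT, Sequoia and Spoken, but GSD is giving me trouble. The command I run is: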

java -Xmx1g -cp "*" edu.stanford.nlp.parser.nndep.DependencyParser \
    -model Stf_ud_gsd_2200.model.txt.gz.gz -testFile fr_gsd_ud_test_new.conllu -outFile FR/Stf_gsd_ud.conllu

I get the following error:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.NumberFormatException: For input string: "358,6"
        at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source)
        at sun.misc.FloatingDecimal.parseDouble(Unknown Source)
        at java.lang.Double.parseDouble(Unknown Source)
        at edu.stanford.nlp.parser.nndep.DependencyParser.loadModelFile(DependencyParser.java:570)
        at edu.stanford.nlp.parser.nndep.DependencyParser.loadModelFile(DependencyParser.java:508)
        at edu.stanford.nlp.parser.nndep.DependencyParser.main(DependencyParser.java:1284)

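If I understand this correctly, the problem is not in the test treebank but in the model itself, in the way some number was saved there.

Does anyone have a hint on how to get past this? I would really appreciate your help!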

Answer 1

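The problem is that the French GSD training data contains a token with a space in it: 3 358,6. When the model-loading code reads the embedding for that token, it splits the line on spaces and takes 358,6 to be the first value of the embedding.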
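The failure can be reproduced in isolation. The sketch below is not the parser's actual loading code; `EmbeddingLineDemo` and `parseEmbedding` are hypothetical names, and the line layout (a word followed by space-separated numbers) is an assumption based on the description above.

```java
class EmbeddingLineDemo {
    // Hypothetical helper: mimics a loader that splits a model line on spaces,
    // treats field 0 as the word and every remaining field as an embedding value.
    static double[] parseEmbedding(String line) {
        String[] fields = line.split(" ");
        double[] values = new double[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            // Double.parseDouble rejects anything that is not a valid number.
            values[i - 1] = Double.parseDouble(fields[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        // A well-formed line: one word followed by two numeric values.
        System.out.println(parseEmbedding("chat 0.12 -0.5").length);
        try {
            // The token "3 358,6" contains a space, so "358,6" lands in the
            // position where the first embedding value is expected.
            parseEmbedding("3 358,6 0.12 -0.5");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```

The second call fails in `Double.parseDouble`, the same frame that appears in the stack trace in the question.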

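If you add a comma to that token (by editing that line of the model file) so that it no longer contains a space, the model will load. I think you could also just delete that line and decrease the dictionary size specified at the top of the file by 1, to reflect that the vocabulary has one word fewer.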

This really ought to be fixed in the training code, though; tokens generally do not contain spaces.
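Until that happens, one possible workaround is to clean the training data before training, so that no token with a space reaches the model file. The following is a minimal sketch, not part of the Stanford tools: `ConlluSpaceFixer` and `fixLine` are hypothetical names, and replacing the space with an underscore is an arbitrary choice. It relies only on the CoNLL-U layout (tab-separated columns, FORM in column 2, LEMMA in column 3, comment lines starting with `#`).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class ConlluSpaceFixer {
    // Replaces spaces inside the FORM and LEMMA columns of one CoNLL-U line.
    // Comment lines (starting with '#') and blank lines are returned unchanged.
    static String fixLine(String line) {
        if (line.isEmpty() || line.startsWith("#")) {
            return line;
        }
        String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
        if (cols.length < 3) {
            return line; // not a token line; leave it alone
        }
        cols[1] = cols[1].replace(' ', '_'); // FORM
        cols[2] = cols[2].replace(' ', '_'); // LEMMA
        return String.join("\t", cols);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ConlluSpaceFixer <input.conllu> <output.conllu>");
            return;
        }
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(fixLine(line));
                out.newLine();
            }
        }
    }
}
```

If the training file is rewritten this way, the test file should be given the same treatment so that both use the same token forms. The model then has to be retrained on the cleaned data.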
