
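I am trying to build a language model with SRILM using Kneser-Ney discounting. There seem to be two ways to use the -kndiscount option:

1) Specify the exact n-gram orders to which the discount should be applied:

    ngram-count -tolower -kndiscount1 -kndiscount2 -kndiscount3 -debug 1 -order 4 -text test.txt -lm test.lm -vocab test_voc.txt

2) Use -kndiscount on its own, which tells ngram-count to apply it to all orders:

    ngram-count -tolower -kndiscount -debug 1 -order 4 -text test.txt -lm test.lm -vocab test_voc.txt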

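I built one model with each setting, and the two models differ. The probabilities and backoff weights of the 1-grams and 2-grams are similar, but those of the 3-grams are different.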

Why is that? I expected the results to be similar.
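To pin down exactly which orders differ, and by how much, the two ARPA files can be compared entry by entry. The following is a minimal Python sketch, assuming each model was written to its own file in the standard ARPA text format (both commands above write to test.lm, so one output would need a different name). The function names parse_arpa and compare and the file names test_exact.lm and test_all.lm are hypothetical, chosen for illustration; they are not part of SRILM.

```python
import sys
from collections import defaultdict


def parse_arpa(lines):
    """Return {order: {ngram_tuple: (logprob, backoff_or_None)}} from ARPA lines."""
    model = defaultdict(dict)
    order = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # Section headers look like "\3-grams:"; "\data\" and "\end\" reset the order.
            if line.endswith("-grams:"):
                order = int(line[1:line.index("-")])
            else:
                order = None
            continue
        if order is None:
            continue  # skip the "ngram N=count" header lines
        fields = line.split()
        logprob = float(fields[0])
        words = tuple(fields[1:1 + order])
        # The backoff weight is optional and follows the n-gram words.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
        model[order][words] = (logprob, backoff)
    return model


def compare(model_a, model_b, tol=1e-4):
    """Per order, count shared n-grams whose log-prob or backoff differs by more than tol."""
    report = {}
    for order in sorted(set(model_a) | set(model_b)):
        a = model_a.get(order, {})
        b = model_b.get(order, {})
        shared = a.keys() & b.keys()
        differing = 0
        for ngram in shared:
            (prob_a, bow_a), (prob_b, bow_b) = a[ngram], b[ngram]
            # A missing backoff weight is treated as 0.0 (log10 of 1).
            if abs(prob_a - prob_b) > tol or abs((bow_a or 0.0) - (bow_b or 0.0)) > tol:
                differing += 1
        report[order] = {
            "shared": len(shared),
            "differing": differing,
            "only_a": len(a) - len(shared),
            "only_b": len(b) - len(shared),
        }
    return report


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_arpa.py test_exact.lm test_all.lm
    with open(sys.argv[1], encoding="utf-8") as fa, open(sys.argv[2], encoding="utf-8") as fb:
        for order, stats in compare(parse_arpa(fa), parse_arpa(fb)).items():
            print(f"{order}-grams: {stats}")
```

The per-order counts show whether the difference is confined to the 3-gram section or also reaches the 4-grams, and whether the two models contain the same set of n-grams at all. The tolerance tol is an arbitrary choice and may need adjusting.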

Different results using -kndiscount with explicit orders versus without them

0 answers: