Sphinx 4 reports a corrupt ARPA LM?

Time: 2011-02-28 14:03:24

Tags: speech-recognition speech-to-text n-gram sphinx4 language-model

I have an ARPA LM generated by kylm, and when I run Sphinx I get this exception stack trace:

Exception in thread "main" java.lang.RuntimeException: Allocation of search manager resources failed
        at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:242)
        at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:87)
        at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:168)
        at transcribing.Main.main(Main.java:78)
Caused by: java.io.IOException: Corrupt Language Model file:./corpus.arpa at line 2420:Premature EOF
        at edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel.corrupt(SimpleNGramModel.java:458)
        at edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel.readLine(SimpleNGramModel.java:404)
        at edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel.load(SimpleNGramModel.java:307)
        at edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel.allocate(SimpleNGramModel.java:110)
        at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:342)
        at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:238)
        ... 3 more
Java Result: 1

Here is an excerpt of the ARPA LM:

[n]
3

[smoother]
kylm.model.ngram.smoother.KNSmoother

[closed]
true

[max_length]
1091

[vocab_cutoff]
0

[start_symbol]
<s>

[terminal_symbol]
</s>

[unknown_symbol]
<unk>

\data\
ngram 1=406
ngram 2=768
ngram 3=937
\1-grams: 
-99.0000    <s> -0.3630
...
...

\end\

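PS: there is a newline after \end\

The exception says Sphinx hit an unexpected EOF on the last line (isn't it supposed to reach EOF there?)

Any help would be appreciated!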

1 answer:

Answer 0: (score: 1)

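It turns out this is a Sphinx 4 bug.

If the \1-grams: directive (or any other directive, for that matter) contains trailing whitespace, SimpleNGramModel fails to parse it! I have just submitted a patch, but you can find it here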
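
Until a patched Sphinx 4 build is available, one possible workaround is to strip trailing whitespace from every line of the ARPA file before loading it. The following is only a minimal sketch, not the submitted patch: the class name ArpaTrimmer, its method names, and the UTF-8 encoding are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Sphinx 4 or kylm): copies an ARPA file
// while removing trailing whitespace from every line.
public class ArpaTrimmer {

    // Returns the line without trailing spaces or tabs; everything else is untouched.
    static String stripTrailing(String line) {
        int end = line.length();
        while (end > 0 && Character.isWhitespace(line.charAt(end - 1))) {
            end--;
        }
        return line.substring(0, end);
    }

    // Reads the input file line by line and writes the trimmed lines to the output file.
    // Assumption: the file is UTF-8 encoded; change the charset if yours is not.
    static void clean(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(stripTrailing(line));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ArpaTrimmer <input.arpa> <output.arpa>");
            return;
        }
        clean(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

For example, running java ArpaTrimmer corpus.arpa corpus.clean.arpa and pointing the language model location in the Sphinx configuration at the cleaned file should avoid directive lines such as "\1-grams: " that end in a space.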