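I am preparing a CMU Sphinx pronunciation dictionary for a new language. I wrote a few hundred transliterations in ur.txt and trained on them with g2p-seq2seq as described in the documentation, but the result was Accuracy: 0 and WER: 1.
The word list file is UTF-8 with Urdu characters: http://pastebin.com/2rRXay9J This is only my first test of it. Can anyone spot a problem in the file, or confirm that it is correct?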
# g2p-seq2seq --train ur.txt --model ur-model3 --size 512 --max_steps 50 &
Preparing G2P data
Creating vocabularies in ur-model3
Creating vocabulary ur-model3/vocab.phoneme
Creating vocabulary ur-model3/vocab.grapheme
Reading development and training data.
Creating 2 layers of 512 units.
Created model with fresh parameters.
Training done.
Creating 2 layers of 512 units.
Reading model parameters from ur-model3
Beginning calculation word error rate (WER) on test sample.
Words: 14
Errors: 14
WER: 1.000
Accuracy: 0.000
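
In case it helps anyone reproduce this, here is a rough sanity check that could be run on ur.txt before training. It is only a sketch: it assumes each line holds a word followed by whitespace-separated phonemes (the CMUdict-style layout), and the names check_dictionary and check_file are made up for illustration; they are not part of g2p-seq2seq.

```python
from collections import Counter


def check_dictionary(lines):
    """Scan dictionary lines and return (problems, graphemes, phonemes).

    Assumed layout (not confirmed by the post): WORD PH1 PH2 ...
    """
    problems = []          # list of (line_number, description)
    graphemes = Counter()  # how often each character appears in words
    phonemes = Counter()   # how often each phoneme symbol appears
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if "\ufeff" in line:
            # A byte order mark would otherwise be counted as a grapheme.
            problems.append((number, "contains a byte order mark (BOM)"))
            line = line.replace("\ufeff", "")
        fields = line.split()
        if len(fields) < 2:
            problems.append((number, "no phonemes after the word"))
            continue
        word, phones = fields[0], fields[1:]
        graphemes.update(word)
        phonemes.update(phones)
    return problems, graphemes, phonemes


def check_file(path):
    """Open a dictionary as UTF-8 and check it; bad bytes raise UnicodeDecodeError."""
    with open(path, encoding="utf-8") as handle:
        return check_dictionary(handle)
```

Calling `problems, graphemes, phonemes = check_file("ur.txt")` and printing the three values shows malformed lines and the grapheme and phoneme inventories, which can then be compared by eye with the generated vocab.grapheme and vocab.phoneme files.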