CMU Sphinx accuracy is very low when transcribing audio

Time: 2015-09-25 07:54:43

Tags: java audio cmusphinx

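I am using the code below to get word-level results from an audio file with Sphinx, but the accuracy is very low. How can I improve it? Only the first three words are recognized correctly; the remaining three words are not detected correctly.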

I tested it with this recording: https://www.dropbox.com/s/33fzzf5s59wbi7e/test2.wav?dl=0
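import java.io.FileInputStream;
import java.io.IOException;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
import edu.cmu.sphinx.result.WordResult;

Configuration configuration = new Configuration();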

// Set path to acoustic model.
configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
// Set path to dictionary.
configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
// Set language model.
configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

StreamSpeechRecognizer recognizer;
try {
    recognizer = new StreamSpeechRecognizer(configuration);

    recognizer.startRecognition(new FileInputStream("1.wav"));
    SpeechResult result = recognizer.getResult();
    recognizer.stopRecognition();


    // Print utterance string without filler words.
    System.out.println(result.getHypothesis());

    System.out.println("================word result=============="+result.getWords().size());
    // Get individual words and their times.
    for (WordResult r : result.getWords()) {
        System.out.println(r);
    }
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Here is the output:

15:50:34.677 INFO speedTracker         # ----------------------------- Timers----------------------------------------
15:50:34.677 INFO speedTracker         # Name               Count   CurTime   MinTime   MaxTime   AvgTime   TotTime   
15:50:34.678 INFO speedTracker         Compile              1       0.9860s   0.9860s   0.9860s   0.9860s   0.9860s   
15:50:34.678 INFO speedTracker         Load LM              1       0.8220s   0.8220s   0.8220s   0.8220s   0.8220s   
15:50:34.678 INFO speedTracker         Load Dictionary      1       0.1020s   0.1020s   0.1020s   0.1020s   0.1020s   
15:50:34.678 INFO speedTracker         Load AM              1       1.5760s   1.5760s   1.5760s   1.5760s   1.5760s   
15:50:41.150 INFO speedTracker            This  Time Audio: 1.41s  Proc: 6.40s  Speed: 4.54 X real time
15:50:41.152 INFO speedTracker            Total Time Audio: 1.41s  Proc: 6.40s 4.54 X real time
15:50:41.152 INFO memoryTracker           Mem  Total: 1181.50 Mb  Free: 726.20 Mb
15:50:41.152 INFO memoryTracker           Used: This: 455.30 Mb  Avg: 455.30 Mb  Max: 455.30 Mb
15:50:41.152 INFO trieNgramModel       LM Cache Size: 5646 Hits: 1215433 Misses: 5646
15:50:41.234 INFO speedTracker         # ----------------------------- Timers----------------------------------------
15:50:41.234 INFO speedTracker         # Name               Count   CurTime   MinTime   MaxTime   AvgTime   TotTime   
15:50:41.234 INFO speedTracker         Compile              1       0.9860s   0.9860s   0.9860s   0.9860s   0.9860s   
15:50:41.234 INFO speedTracker         Load LM              1       0.8220s   0.8220s   0.8220s   0.8220s   0.8220s   
15:50:41.234 INFO speedTracker         Load Dictionary      1       0.1020s   0.1020s   0.1020s   0.1020s   0.1020s   
15:50:41.234 INFO speedTracker         Score                318     0.0000s   0.0000s   0.0630s   0.0026s   0.8240s   
15:50:41.235 INFO speedTracker         Prune                1111    0.0000s   0.0000s   0.0070s   0.0001s   0.1440s   
15:50:41.235 INFO speedTracker         Grow                 1113    0.0000s   0.0000s   1.0780s   0.0049s   5.4790s   
15:50:41.235 INFO speedTracker         Frontend             161     0.0000s   0.0000s   0.0600s   0.0004s   0.0610s   
15:50:41.235 INFO speedTracker         Load AM              1       1.5760s   1.5760s   1.5760s   1.5760s   1.5760s   
15:50:41.235 INFO speedTracker            Total Time Audio: 1.41s  Proc: 6.40s 4.54 X real time
15:50:41.235 INFO memoryTracker           Mem  Total: 1181.50 Mb  Free: 726.20 Mb
15:50:41.235 INFO memoryTracker           Used: This: 455.30 Mb  Avg: 455.30 Mb  Max: 455.30 Mb
and you want slurring
================word result==============5
{and, 0.999, [140:670]}
{<sil>, 0.999, [680:780]}
{you, 1.000, [790:890]}
{want, 1.000, [900:1080]}
{slurring, 1.000, [1090:1590]}

1 answer:

Answer 0 (score: 0)

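The default CMUSphinx models are trained on US English data, and your pronunciation has a strong accent. If you want better recognition of your accent, you can adapt the acoustic model as described in this tutorial:

http://cmusphinx.sourceforge.net/wiki/tutorialadapt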

The more data you use for adaptation, the higher the accuracy will be.
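
To check whether adaptation actually helps, it is useful to measure accuracy the same way before and after. Below is a minimal sketch in plain Java (no Sphinx dependency) that computes word error rate (WER) as the word-level edit distance divided by the number of reference words. The class name `WordErrorRate` and the reference sentence are made up for illustration, since the real transcript of the recording is not given in the question; only the hypothesis string is taken from the output above.

```java
import java.util.Locale;

/** Word error rate (WER) between a reference transcript and a recognizer hypothesis. */
final class WordErrorRate {

    /** Lower-cases and splits on whitespace; returns an empty array for blank input. */
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Word-level Levenshtein distance: substitutions + deletions + insertions. */
    static int editDistance(String[] ref, String[] hyp) {
        int[] prev = new int[hyp.length + 1];
        int[] curr = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) {
            prev[j] = j; // turning an empty reference into j words costs j insertions
        }
        for (int i = 1; i <= ref.length; i++) {
            curr[0] = i; // turning i reference words into nothing costs i deletions
            for (int j = 1; j <= hyp.length; j++) {
                int cost = ref[i - 1].equals(hyp[j - 1]) ? 0 : 1;
                curr[j] = Math.min(prev[j - 1] + cost,
                        Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[hyp.length];
    }

    /** WER = edit distance / number of reference words. */
    static double wer(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference transcript is empty");
        }
        return (double) editDistance(ref, hyp) / ref.length;
    }

    public static void main(String[] args) {
        // Hypothetical reference: replace with what was actually said in the recording.
        String reference = "and you want placeholder reference words";
        // Hypothesis as printed by result.getHypothesis() in the question.
        String hypothesis = "and you want slurring";
        System.out.printf(Locale.ROOT, "WER = %.2f%n", wer(reference, hypothesis));
    }
}
```

In practice you would run the recognizer over a set of held-out recordings with known transcripts, compute the WER with the default model and again with the adapted model, and compare the two numbers. A single short utterance like the one in the question is too little data to judge by.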