I am trying to get word-level results from audio with Sphinx using the code below, but the accuracy is very low. How can I improve it? Only the first three words are recognized correctly; the remaining three are not detected correctly.
I tested it with this recording: https://www.dropbox.com/s/33fzzf5s59wbi7e/test2.wav?dl=0
// Imports needed by the snippet below.
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
import edu.cmu.sphinx.result.WordResult;
import java.io.FileInputStream;
import java.io.IOException;

Configuration configuration = new Configuration();
// Set path to acoustic model.
configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
// Set path to dictionary.
configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
// Set language model.
configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");
StreamSpeechRecognizer recognizer;
try {
    recognizer = new StreamSpeechRecognizer(configuration);
    recognizer.startRecognition(new FileInputStream("1.wav"));
    // getResult() returns null when no utterance is left in the stream.
    SpeechResult result = recognizer.getResult();
    recognizer.stopRecognition();
    if (result != null) {
        // Print utterance string without filler words.
        System.out.println(result.getHypothesis());
        System.out.println("================word result==============" + result.getWords().size());
        // Get individual words and their times.
        for (WordResult r : result.getWords()) {
            System.out.println(r);
        }
    }
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Here is the output:
15:50:34.677 INFO speedTracker # ----------------------------- Timers----------------------------------------
15:50:34.677 INFO speedTracker # Name Count CurTime MinTime MaxTime AvgTime TotTime
15:50:34.678 INFO speedTracker Compile 1 0.9860s 0.9860s 0.9860s 0.9860s 0.9860s
15:50:34.678 INFO speedTracker Load LM 1 0.8220s 0.8220s 0.8220s 0.8220s 0.8220s
15:50:34.678 INFO speedTracker Load Dictionary 1 0.1020s 0.1020s 0.1020s 0.1020s 0.1020s
15:50:34.678 INFO speedTracker Load AM 1 1.5760s 1.5760s 1.5760s 1.5760s 1.5760s
15:50:41.150 INFO speedTracker This Time Audio: 1.41s Proc: 6.40s Speed: 4.54 X real time
15:50:41.152 INFO speedTracker Total Time Audio: 1.41s Proc: 6.40s 4.54 X real time
15:50:41.152 INFO memoryTracker Mem Total: 1181.50 Mb Free: 726.20 Mb
15:50:41.152 INFO memoryTracker Used: This: 455.30 Mb Avg: 455.30 Mb Max: 455.30 Mb
15:50:41.152 INFO trieNgramModel LM Cache Size: 5646 Hits: 1215433 Misses: 5646
15:50:41.234 INFO speedTracker # ----------------------------- Timers----------------------------------------
15:50:41.234 INFO speedTracker # Name Count CurTime MinTime MaxTime AvgTime TotTime
15:50:41.234 INFO speedTracker Compile 1 0.9860s 0.9860s 0.9860s 0.9860s 0.9860s
15:50:41.234 INFO speedTracker Load LM 1 0.8220s 0.8220s 0.8220s 0.8220s 0.8220s
15:50:41.234 INFO speedTracker Load Dictionary 1 0.1020s 0.1020s 0.1020s 0.1020s 0.1020s
15:50:41.234 INFO speedTracker Score 318 0.0000s 0.0000s 0.0630s 0.0026s 0.8240s
15:50:41.235 INFO speedTracker Prune 1111 0.0000s 0.0000s 0.0070s 0.0001s 0.1440s
15:50:41.235 INFO speedTracker Grow 1113 0.0000s 0.0000s 1.0780s 0.0049s 5.4790s
15:50:41.235 INFO speedTracker Frontend 161 0.0000s 0.0000s 0.0600s 0.0004s 0.0610s
15:50:41.235 INFO speedTracker Load AM 1 1.5760s 1.5760s 1.5760s 1.5760s 1.5760s
15:50:41.235 INFO speedTracker Total Time Audio: 1.41s Proc: 6.40s 4.54 X real time
15:50:41.235 INFO memoryTracker Mem Total: 1181.50 Mb Free: 726.20 Mb
15:50:41.235 INFO memoryTracker Used: This: 455.30 Mb Avg: 455.30 Mb Max: 455.30 Mb
and you want slurring
================word result==============5
{and, 0.999, [140:670]}
{<sil>, 0.999, [680:780]}
{you, 1.000, [790:890]}
{want, 1.000, [900:1080]}
{slurring, 1.000, [1090:1590]}
Answer 0 (score: 0)
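The default CMUSphinx models are trained on US English data, and your pronunciation has a strong accent. If you want your accent to be recognized better, you can adapt the acoustic model as described in the tutorial: http://cmusphinx.sourceforge.net/wiki/tutorialadapt
The more data you use for adaptation, the better the accuracy will be.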
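Before recording adaptation data, it may also be worth confirming that the audio is in the format the model expects. The sketch below assumes the stock en-us model wants 16 kHz, 16-bit, mono, little-endian signed PCM, which is what the CMUSphinx documentation describes for this model. `WavFormatCheck` and `matchesModelFormat` are hypothetical names used for illustration and are not part of Sphinx4; only the standard `javax.sound.sampled` API is used.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper, not part of Sphinx4: reports whether a WAV file
// is 16 kHz, 16-bit, mono, little-endian signed PCM (assumed model format).
class WavFormatCheck {

    // Returns true when the format matches the assumed model input format.
    static boolean matchesModelFormat(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && !f.isBigEndian();
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java WavFormatCheck <file.wav>");
            return;
        }
        File wav = new File(args[0]);
        try {
            // Read only the header; the audio data itself is not decoded.
            AudioFormat format = AudioSystem.getAudioFileFormat(wav).getFormat();
            System.out.println("Format: " + format);
            System.out.println(matchesModelFormat(format)
                    ? "Matches the assumed 16 kHz / 16-bit / mono format."
                    : "Does not match; consider resampling before recognition or adaptation.");
        } catch (UnsupportedAudioFileException | IOException e) {
            System.out.println("Could not read " + wav + ": " + e.getMessage());
        }
    }
}
```

Running it as `java WavFormatCheck 1.wav` prints the detected format. A mismatch would not rule out the accent explanation above, but it is a cheap thing to exclude first.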