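I plan to use sphinx4 to convert speech to text. I have been reading tutorials and comments on how to improve accuracy, and I am currently using the following setup:
A generic acoustic model and a generic language model, because I do not know in advance which words will be spoken.
This is the code I am using: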
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;

public class Example {

    private static final String ACOUSTIC_MODEL =
        "file:/Users/Jimo/Testing/models/acoustic/acoustic_model_us";
    private static final String DICTIONARY_PATH =
        "file:/Users/Jimo/Testing/models/acoustic/acoustic_model_us/dict/cmudict.0.6d";
    private static final String LANGUAGE_MODEL =
        "file:/Users/Jimo/Testing/models/language/en-us.lm.dmp";

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath(ACOUSTIC_MODEL);
        configuration.setDictionaryPath(DICTIONARY_PATH);
        configuration.setLanguageModelPath(LANGUAGE_MODEL);

        LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);

        // Start recognition process pruning previously cached data.
        recognizer.startRecognition(true);
        SpeechResult result = recognizer.getResult();
        System.out.println(result.getHypothesis());

        // Pause recognition process. It can be resumed then with startRecognition(false).
        recognizer.stopRecognition();
    }
}
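The response is slow, possibly because of the size of the language model, and I almost never get the expected result. For example, if I say "hello", the output is "oh".
What am I doing wrong? I know that to improve accuracy I would have to use a task-specific language model, but that is not practical for this application.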
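
I have not yet measured where the time actually goes, so blaming the language model for the slow response is only a guess. A small timing helper along these lines could separate model loading from decoding. This is a minimal sketch using only the standard library; the class `StepTimer`, its nested `Timed` type, and the `time` method are hypothetical names of my own, not part of sphinx4, and the workload in `main` is a stand-in.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of sphinx4) for timing individual steps. */
public class StepTimer {

    /** Holds the value a step produced and how long the step took. */
    public static final class Timed<T> {
        public final T value;
        public final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the step once, prints its wall-clock duration, and returns both. */
    public static <T> Timed<T> time(String label, Callable<T> step) throws Exception {
        long start = System.nanoTime();
        T value = step.call();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + elapsedMillis + " ms");
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload; in Example.main the timed steps would be the
        // LiveSpeechRecognizer constructor and recognizer.getResult().
        Timed<Integer> t = time("stand-in step", () -> {
            Thread.sleep(50);
            return 42;
        });
        System.out.println("value = " + t.value);
    }
}
```

In my program this would mean wrapping the two steps separately, for example `StepTimer.time("load models", () -> new LiveSpeechRecognizer(configuration))` and `StepTimer.time("decode", () -> recognizer.getResult())`. Since `getResult()` waits for the utterance to end, the second figure would include the time I spend speaking, so it is only a rough indicator.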