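I'm trying to use the CMU Sphinx speech recognizer to recognize some speech files that I recorded from a WPF application.
Here is the sample code I compiled: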
package com.example;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class TranscriberDemo {

    public static void main(String[] args) throws Exception {
        // Use the acoustic model, dictionary and language model bundled in sphinx4-data
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);

        // try-with-resources closes the file once recognition is finished
        try (InputStream stream = new FileInputStream(new File("test.wav"))) {
            recognizer.startRecognition(stream);
            SpeechResult result;
            // getResult() returns null once the whole stream has been decoded
            while ((result = recognizer.getResult()) != null) {
                System.out.format("Hypothesis: %s%n", result.getHypothesis());
            }
            recognizer.stopRecognition();
        }
    }
}
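Since StreamSpeechRecognizer is simply handed the raw file stream, a cheap first check is whether test.wav is in the format the bundled en-us acoustic model was trained on, which is 16 kHz, 16-bit, mono, signed little-endian PCM. The sketch below (the class name WavFormatCheck is hypothetical, not part of Sphinx) uses only javax.sound.sampled to read the WAV header and list every mismatch against that assumed target format.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class WavFormatCheck {

    // Assumption: the bundled en-us model expects 16 kHz, 16-bit, mono, signed little-endian PCM
    static final float EXPECTED_SAMPLE_RATE = 16000f;
    static final int EXPECTED_BITS = 16;
    static final int EXPECTED_CHANNELS = 1;

    /** Reads only the WAV header and returns every mismatch found (empty list = format looks right). */
    static List<String> problems(File wav) throws IOException, UnsupportedAudioFileException {
        AudioFormat f = AudioSystem.getAudioFileFormat(wav).getFormat();
        List<String> out = new ArrayList<>();
        if (f.getSampleRate() != EXPECTED_SAMPLE_RATE) {
            out.add("sample rate is " + f.getSampleRate() + " Hz, expected " + EXPECTED_SAMPLE_RATE);
        }
        if (f.getSampleSizeInBits() != EXPECTED_BITS) {
            out.add("sample size is " + f.getSampleSizeInBits() + " bits, expected " + EXPECTED_BITS);
        }
        if (f.getChannels() != EXPECTED_CHANNELS) {
            out.add(f.getChannels() + " channels, expected " + EXPECTED_CHANNELS);
        }
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())) {
            out.add("encoding is " + f.getEncoding() + ", expected " + AudioFormat.Encoding.PCM_SIGNED);
        }
        if (f.isBigEndian()) {
            out.add("samples are big-endian, expected little-endian");
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        File wav = new File(args.length > 0 ? args[0] : "test.wav");
        System.out.println("Header format: " + AudioSystem.getAudioFileFormat(wav).getFormat());
        List<String> problems = problems(wav);
        if (problems.isEmpty()) {
            System.out.println("OK: 16 kHz, 16-bit, mono, signed PCM");
        } else {
            problems.forEach(p -> System.out.println("Mismatch: " + p));
        }
    }
}
```

Run against a file recorded with the MCI settings below, this should report a sample-rate mismatch, since those commands request 11025 samples per second.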
Here is the code that produces the wave file:
... in a WPF Window code behind ...
[DllImport("winmm.dll", EntryPoint = "mciSendStringA", CharSet = CharSet.Ansi, SetLastError = true, ExactSpelling = true)]
private static extern int mciSendString(string lpstrCommand, string lpstrReturnString, int uReturnLength, int hwndCallback);

private void OnRecButtonClicked(object sender, RoutedEventArgs e)
{
    mciSendString("close MediaFile ", "", 0, 0);
    mciSendString("open new Type waveaudio Alias recsound", "", 0, 0);
    mciSendString("set recsound channels 1", "", 0, 0);
    mciSendString("set recsound samplespersec 11025", "", 0, 0);
    mciSendString("set recsound alignment 4", "", 0, 0);
    mciSendString("set recsound bitspersample 16", "", 0, 0);
    mciSendString("record recsound", "", 0, 0);
    txtStatus.Text = "Recording...";
}

private void OnStopButtonClicked(object sender, RoutedEventArgs e)
{
    mciSendString("save recsound test.wav", "", 0, 0);
    mciSendString("close recsound ", "", 0, 0);
    txtStatus.Text = "Stopped...";
}
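The MCI set commands above end up in the fmt chunk of the WAV header. For PCM audio the block alignment is channels × bitsPerSample / 8, and the average byte rate is samplesPerSec × blockAlign, so the settings can be sanity-checked by hand. The small sketch below (class name WaveHeaderMath is hypothetical) does that arithmetic: one channel at 16 bits gives an alignment of 2, not the 4 passed to "set recsound alignment", and 11025 samples/s gives 22050 bytes/s, compared with 32000 bytes/s for the 16 kHz mono audio the en-us model is assumed to expect.

```java
public class WaveHeaderMath {

    /** PCM block alignment (nBlockAlign): bytes per sample frame across all channels. */
    static int blockAlign(int channels, int bitsPerSample) {
        return channels * ((bitsPerSample + 7) / 8);
    }

    /** PCM average byte rate (nAvgBytesPerSec): samples per second times bytes per frame. */
    static int bytesPerSecond(int samplesPerSec, int channels, int bitsPerSample) {
        return samplesPerSec * blockAlign(channels, bitsPerSample);
    }

    public static void main(String[] args) {
        // Values requested by the MCI commands in OnRecButtonClicked
        System.out.println("as recorded (11025 Hz): blockAlign=" + blockAlign(1, 16)
                + ", bytesPerSec=" + bytesPerSecond(11025, 1, 16));
        // prints "as recorded (11025 Hz): blockAlign=2, bytesPerSec=22050"

        // Assumed target format for the bundled en-us model (16 kHz, 16-bit, mono)
        System.out.println("expected (16000 Hz):    blockAlign=" + blockAlign(1, 16)
                + ", bytesPerSec=" + bytesPerSecond(16000, 1, 16));
        // prints "expected (16000 Hz):    blockAlign=2, bytesPerSec=32000"
    }
}
```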
... more WPF ...
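No matter what I say, result.getHypothesis() always seems to return an empty string. How do I start debugging what is wrong with my setup? Is it the encoding? The speech quality? Or insufficient training? (I'm using the models that came with the download.) I'm not a native English speaker, so my pronunciation isn't standard, but I would still expect the recognizer to produce some output.
Here is the console output: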
C:\Users\mike\Downloads\Sample>java -cp .;sphinx4-core-5prealpha-SNAPSHOT.jar;sphinx4-data-5prealpha-SNAPSHOT.jar com.example.TranscriberDemo
00:21:36.660 INFO unitManager CI Unit: *+NSN+
00:21:36.660 INFO unitManager CI Unit: *+SPN+
00:21:36.676 INFO unitManager CI Unit: AA
00:21:36.676 INFO unitManager CI Unit: AE
00:21:36.676 INFO unitManager CI Unit: AH
00:21:36.676 INFO unitManager CI Unit: AO
00:21:36.676 INFO unitManager CI Unit: AW
00:21:36.691 INFO unitManager CI Unit: AY
00:21:36.691 INFO unitManager CI Unit: B
00:21:36.691 INFO unitManager CI Unit: CH
00:21:36.707 INFO unitManager CI Unit: D
00:21:36.707 INFO unitManager CI Unit: DH
00:21:36.707 INFO unitManager CI Unit: EH
00:21:36.707 INFO unitManager CI Unit: ER
00:21:36.707 INFO unitManager CI Unit: EY
00:21:36.723 INFO unitManager CI Unit: F
00:21:36.723 INFO unitManager CI Unit: G
00:21:36.723 INFO unitManager CI Unit: HH
00:21:36.723 INFO unitManager CI Unit: IH
00:21:36.723 INFO unitManager CI Unit: IY
00:21:36.738 INFO unitManager CI Unit: JH
00:21:36.738 INFO unitManager CI Unit: K
00:21:36.738 INFO unitManager CI Unit: L
00:21:36.738 INFO unitManager CI Unit: M
00:21:36.738 INFO unitManager CI Unit: N
00:21:36.754 INFO unitManager CI Unit: NG
00:21:36.754 INFO unitManager CI Unit: OW
00:21:36.754 INFO unitManager CI Unit: OY
00:21:36.754 INFO unitManager CI Unit: P
00:21:36.754 INFO unitManager CI Unit: R
00:21:36.754 INFO unitManager CI Unit: S
00:21:36.769 INFO unitManager CI Unit: SH
00:21:36.769 INFO unitManager CI Unit: T
00:21:36.769 INFO unitManager CI Unit: TH
00:21:36.769 INFO unitManager CI Unit: UH
00:21:36.769 INFO unitManager CI Unit: UW
00:21:36.785 INFO unitManager CI Unit: V
00:21:36.785 INFO unitManager CI Unit: W
00:21:36.785 INFO unitManager CI Unit: Y
00:21:36.785 INFO unitManager CI Unit: Z
00:21:36.785 INFO unitManager CI Unit: ZH
00:21:37.568 INFO autoCepstrum Cepstrum component auto-configured as follows: autoCepstrum {MelFrequencyFilterBank, Denoise, DiscreteCosineTransform2, Lifter}
00:21:37.584 INFO dictionary Loading dictionary from: jar:file:/C:/Users/mike/Downloads/Sample/sphinx4-data-5prealpha-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict
00:21:37.756 INFO dictionary Loading filler dictionary from: jar:file:/C:/Users/mike/Downloads/Sample/sphinx4-data-5prealpha-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/en-us/noisedict
00:21:37.756 INFO acousticModelLoader Loading tied-state acoustic model from: jar:file:/C:/Users/mike/Downloads/Sample/sphinx4-data-5prealpha-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/en-us
00:21:37.756 INFO acousticModelLoader Pool means Entries: 16128
00:21:37.756 INFO acousticModelLoader Pool variances Entries: 16128
00:21:37.756 INFO acousticModelLoader Pool transition_matrices Entries: 42
00:21:37.756 INFO acousticModelLoader Pool senones Entries: 5126
00:21:37.771 INFO acousticModelLoader Gaussian weights: mixture_weights. Entries: 15378
00:21:37.771 INFO acousticModelLoader Pool senones Entries: 5126
00:21:37.771 INFO acousticModelLoader Context Independent Unit Entries: 42
00:21:37.771 INFO acousticModelLoader HMM Manager: 137095 hmms
00:21:37.787 INFO acousticModel CompositeSenoneSequences: 0
00:21:37.787 INFO trieNgramModel Loading n-gram language model from: jar:file:/C:/Users/mike/Downloads/Sample/sphinx4-data-5prealpha-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/en-us.lm.bin
00:21:41.227 INFO lexTreeLinguist Max CI Units 43
00:21:41.227 INFO lexTreeLinguist Unit table size 79507
00:21:41.227 INFO speedTracker # ----------------------------- Timers----------------------------------------
00:21:41.227 INFO speedTracker # Name Count CurTime MinTime MaxTime AvgTime TotTime
00:21:41.242 INFO speedTracker Load AM 1 3.2610s 3.2610s 3.2610s 3.2610s 3.2610s
00:21:41.242 INFO speedTracker Load LM 1 1.6110s 1.6110s 1.6110s 1.6110s 1.6110s
00:21:41.242 INFO speedTracker Compile 1 1.8290s 1.8290s 1.8290s 1.8290s 1.8290s
00:21:41.242 INFO speedTracker Load Dictionary 1 0.1720s 0.1720s 0.1720s 0.1720s 0.1720s
00:21:41.289 INFO speedTracker This Time Audio: 1.03s Proc: 0.01s Speed: 0.01 X real time
00:21:41.289 INFO speedTracker Total Time Audio: 1.03s Proc: 0.01s 0.01 X real time
00:21:41.289 INFO memoryTracker Mem Total: 619.00 Mb Free: 362.60 Mb
00:21:41.289 INFO memoryTracker Used: This: 256.40 Mb Avg: 256.40 Mb Max: 256.40 Mb
00:21:41.289 INFO trieNgramModel LM Cache Size: 0 Hits: 0 Misses: 0
Hypothesis:
00:21:41.321 INFO trieNgramModel LM Cache Size: 0 Hits: 0 Misses: 0
00:21:41.321 INFO speedTracker # ----------------------------- Timers----------------------------------------
00:21:41.321 INFO speedTracker # Name Count CurTime MinTime MaxTime AvgTime TotTime
00:21:41.321 INFO speedTracker Load AM 1 3.2610s 3.2610s 3.2610s 3.2610s 3.2610s
00:21:41.321 INFO speedTracker Score 4 0.0160s 0.0000s 0.0160s 0.0080s 0.0320s
00:21:41.321 INFO speedTracker Prune 10 0.0000s 0.0000s 0.0000s 0.0000s 0.0000s
00:21:41.336 INFO speedTracker Grow 14 0.0000s 0.0000s 0.0150s 0.0011s 0.0150s
00:21:41.336 INFO speedTracker Load LM 1 1.6110s 1.6110s 1.6110s 1.6110s 1.6110s
00:21:41.336 INFO speedTracker Compile 1 1.8290s 1.8290s 1.8290s 1.8290s 1.8290s
00:21:41.336 INFO speedTracker Frontend 4 0.0160s 0.0000s 0.0160s 0.0080s 0.0320s
00:21:41.352 INFO speedTracker Load Dictionary 1 0.1720s 0.1720s 0.1720s 0.1720s 0.1720s
00:21:41.352 INFO speedTracker Total Time Audio: 1.03s Proc: 0.01s 0.01 X real time
00:21:41.352 INFO memoryTracker Mem Total: 619.00 Mb Free: 362.60 Mb
00:21:41.352 INFO memoryTracker Used: This: 256.40 Mb Avg: 256.40 Mb Max: 256.40 Mb
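To rule the "speech quality" explanation in or out, another cheap check is whether test.wav contains audible speech at all rather than near-silence (for example, if MCI recorded from an unexpected input device). The sketch below (class name WavLevelCheck is hypothetical) reads the samples with javax.sound.sampled and prints the duration plus the peak and RMS level in dBFS. It assumes 16-bit signed little-endian PCM, which is what the MCI settings request; what counts as "too quiet" is a judgment call, not a documented Sphinx threshold.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class WavLevelCheck {

    /** Decodes one 16-bit signed little-endian sample starting at byte offset i. */
    static int sampleAt(byte[] pcm, int i) {
        return (short) ((pcm[i] & 0xFF) | (pcm[i + 1] << 8));
    }

    /** Largest absolute sample value: 0 for digital silence, up to 32768 at full scale. */
    static int peak(byte[] pcm) {
        int peak = 0;
        for (int i = 0; i + 1 < pcm.length; i += 2) {
            peak = Math.max(peak, Math.abs(sampleAt(pcm, i)));
        }
        return peak;
    }

    /** Root-mean-square sample value, a rough measure of overall loudness. */
    static double rms(byte[] pcm) {
        int n = pcm.length / 2;
        if (n == 0) return 0.0;
        double sumSquares = 0.0;
        for (int i = 0; i + 1 < pcm.length; i += 2) {
            double s = sampleAt(pcm, i);
            sumSquares += s * s;
        }
        return Math.sqrt(sumSquares / n);
    }

    /** Level relative to 16-bit full scale; 0 dBFS is the loudest possible, silence is -Infinity. */
    static double toDbfs(double level) {
        return 20 * Math.log10(level / 32768.0);
    }

    public static void main(String[] args) throws IOException, UnsupportedAudioFileException {
        File wav = new File(args.length > 0 ? args[0] : "test.wav");
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat f = in.getFormat();
            if (f.getSampleSizeInBits() != 16 || f.isBigEndian()
                    || !AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())) {
                throw new IllegalArgumentException("expected 16-bit signed little-endian PCM, got " + f);
            }
            // Read all sample bytes (a loop rather than readAllBytes() so it also runs on Java 8)
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            byte[] pcm = buf.toByteArray();
            System.out.printf("Duration: %.2f s%n", pcm.length / (double) f.getFrameSize() / f.getFrameRate());
            System.out.printf("Peak: %d (%.1f dBFS)%n", peak(pcm), toDbfs(peak(pcm)));
            System.out.printf("RMS:  %.1f (%.1f dBFS)%n", rms(pcm), toDbfs(rms(pcm)));
        }
    }
}
```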
Thanks a lot for your help!