Question

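Currently, the following command starts pocketsphinx and waits for the microphone volume to reach a certain threshold. It then starts recording, and once the volume drops back below the threshold it processes the recorded audio and outputs hello if the word was detected.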

pocketsphinx_continuous -inmic yes -keyphrase "hello" -kws_threshold 1e-30

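Since the environment may be somewhat noisy, waiting for the volume to drop below that threshold can take longer than expected. Is there a way to make Pocketsphinx output the recognized word without having to wait for silence?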

Answer 1

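In general, if you have significant noise, it is better to cancel it in hardware. A microphone array with source separation, directional beamforming microphones and the like should help you reduce the noise considerably. Relying on pocketsphinx to deal with the noise is not a good idea; it was not designed for that.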

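If you want to react to a detection immediately, you are better off using pocketsphinx through its API rather than through pocketsphinx_continuous. This simple Python example does what you want: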

import os
import pyaudio
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

modeldir = "../../../model"

# Create a decoder with certain model
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'en-us/en-us'))
config.set_string('-dict', os.path.join(modeldir, 'en-us/cmudict-en-us.dict'))
config.set_string('-keyphrase', 'forward')
config.set_float('-kws_threshold', 1e+20)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
stream.start_stream()

# Process audio chunk by chunk. On keyphrase detected perform action and restart search
decoder = Decoder(config)
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
    if decoder.hyp() is not None:
        # Report each matched segment as soon as the keyphrase is spotted
        print([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
        print("Detected keyphrase, restarting search")
        decoder.end_utt()
        decoder.start_utt()
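
If you would rather run your own action than print, the same loop can be wrapped in a small helper. This is only a sketch: watch_keyphrase, read_chunk and on_detect are names made up for illustration, and the helper assumes nothing about the decoder beyond the start_utt, process_raw, hyp and end_utt calls used above, plus the hypstr attribute of the returned hypothesis.

```python
def watch_keyphrase(decoder, read_chunk, on_detect):
    """Feed audio chunks to a keyword-spotting decoder and call on_detect
    as soon as a hypothesis appears, without waiting for silence.

    decoder    -- object exposing start_utt/process_raw/hyp/end_utt,
                  such as a pocketsphinx Decoder (duck-typed here)
    read_chunk -- hypothetical callable returning raw 16-bit PCM bytes,
                  or b"" once the audio source is exhausted
    on_detect  -- hypothetical callback that receives the detected phrase
    """
    decoder.start_utt()
    while True:
        buf = read_chunk()
        if not buf:
            break
        # no_search=False, full_utt=False: search incrementally per chunk
        decoder.process_raw(buf, False, False)
        hyp = decoder.hyp()
        if hyp is not None:
            on_detect(hyp.hypstr)
            # Restart the utterance so the same phrase can fire again
            decoder.end_utt()
            decoder.start_utt()
    decoder.end_utt()
```

With the decoder and stream from the example above, it could be called as watch_keyphrase(decoder, lambda: stream.read(1024), print). Passing the audio source in as a callable also lets you try the loop against a prerecorded file or a stub decoder before involving the microphone.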
