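I have been experimenting with the Python speech recognition library https://pypi.python.org/pypi/SpeechRecognition/
to read downloaded versions of the BBC Shipping Forecast. The clipping of these files from live radio to iPlayer is evidently automated and not very accurate, so there is usually some audio before the forecast itself starts: a trailer, or the end of the news. I don't need great accuracy, but I would like speech recognition to recognise the phrase "and now the shipping forecast" (or just "shipping" would actually do) and cut the file from there.
My code so far (taken from an example) transcribes an audio file of the forecast and uses a formula (based on 200 words per minute) to predict where the word "shipping" occurs, but this is not proving very accurate.
Is there a way to get the actual "frame" or onset time in seconds that pocketsphinx itself detected for that word? I can't find anything in the documentation. Any ideas?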
from os import path

import speech_recognition as sr

AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "test_short2.wav")

# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file

# recognize speech using Sphinx
try:
    print("Sphinx thinks you said ")
    returnedSpeech = str(r.recognize_sphinx(audio))
    wordsList = returnedSpeech.split()
    print(returnedSpeech)
    # 200 words per minute works out to 0.3 seconds per word
    print("predicted location of start ", float(wordsList.index("shipping")) * 0.3)
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))
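For reference, the 0.3 factor in the code above is just 60 seconds divided by 200 words. A minimal sketch of that estimate follows; the helper name `estimate_word_onset` is hypothetical and not part of any library. It assumes a constant speaking rate from the very first sample, which is likely why it drifts whenever the clip opens with silence, music, or misrecognised words.

```python
WORDS_PER_MINUTE = 200  # speaking rate assumed in the question


def estimate_word_onset(words, target, words_per_minute=WORDS_PER_MINUTE):
    """Estimate when `target` is spoken, assuming a constant speaking rate.

    Hypothetical helper: it only counts the words before `target` and
    multiplies by the average duration of one word.
    """
    seconds_per_word = 60.0 / words_per_minute
    # list.index raises ValueError if the target word was not transcribed
    return words.index(target) * seconds_per_word


if __name__ == "__main__":
    transcript = "and now the shipping forecast".split()
    print(estimate_word_onset(transcript, "shipping"))
```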
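Answer 0 (score: 1)
You need to use the pocketsphinx API directly for this kind of thing. It is also highly recommended to read the pocketsphinx documentation on keyword spotting.
You can spot a keyphrase as demonstrated in the example: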
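import os

from pocketsphinx import Decoder

# Placeholders: point these at your own model directory and audio directory.
modeldir = "path/to/model"
datadir = "path/to/data"

# Create a decoder configured for keyphrase spotting
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'en-us/en-us'))
config.set_string('-dict', os.path.join(modeldir, 'en-us/cmudict-en-us.dict'))
config.set_string('-keyphrase', 'shipping forecast')
config.set_float('-kws_threshold', 1e-30)

# Open the audio file; its bytes are fed to the decoder as raw audio
stream = open(os.path.join(datadir, "test_short2.wav"), "rb")

# Process the audio chunk by chunk; on detection, report it and restart the search
decoder = Decoder(config)
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
    if decoder.hyp() is not None:
        # start_frame and end_frame locate the keyphrase within the audio
        print([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
        print("Detected keyphrase, restarting search")
        decoder.end_utt()
        decoder.start_utt()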
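To get from the reported frames to the cut the question asks for, the frame index has to be turned into a time and the file trimmed at that point. The sketch below is one way to do it using only the standard `wave` module. It assumes the decoder runs at its default rate of 100 frames per second (the `-frate` setting), and the helper names `frame_to_seconds` and `cut_wav_from` are hypothetical. Also note that frame numbers are, as far as I know, counted from the most recent `start_utt()`, so the first detection is the one whose `start_frame` is relative to the start of the file.

```python
import wave

FRAMES_PER_SECOND = 100  # assumption: pocketsphinx default frame rate (-frate)


def frame_to_seconds(frame, frate=FRAMES_PER_SECOND):
    """Convert a decoder frame index into seconds from the utterance start."""
    return frame / float(frate)


def cut_wav_from(src_path, dst_path, start_seconds):
    """Copy src_path to dst_path, dropping everything before start_seconds."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Convert the time offset into a sample position, clamped to the file
        start_sample = int(start_seconds * src.getframerate())
        start_sample = max(0, min(start_sample, src.getnframes()))
        src.setpos(start_sample)
        remaining = src.readframes(src.getnframes() - start_sample)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # The frame count in the header is corrected when the file is closed
        dst.writeframes(remaining)
```

With a detection whose `seg.start_frame` is, say, 150, calling `cut_wav_from("test_short2.wav", "forecast_only.wav", frame_to_seconds(150))` would keep everything from 1.5 seconds onward; the output file name here is only an example.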