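I have a Python script that uses the speech_recognition package to recognize speech and return the text of what was said. However, the transcription lags by a few seconds. Is there another way to write this script so that it returns each word as it is spoken? I have a second script that does this with the pocketsphinx package, but its results are extremely inaccurate.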
Install the dependencies:
pip install SpeechRecognition
pip install pocketsphinx
Script 1 - delayed speech-to-text:
import speech_recognition as sr
# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Please wait. Calibrating microphone...")
    # listen for 5 seconds and create the ambient noise energy level
    r.adjust_for_ambient_noise(source, duration=5)
    print("Say something!")
    audio = r.listen(source)
# recognize speech using Sphinx
try:
    print("Sphinx thinks you said '" + r.recognize_sphinx(audio) + "'")
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))
Script 2 - immediate but inaccurate speech-to-text:
import os
from pocketsphinx import LiveSpeech, get_model_path
model_path = get_model_path()
speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(model_path, 'en-us'),
    lm=os.path.join(model_path, 'en-us.lm.bin'),
    dic=os.path.join(model_path, 'cmudict-en-us.dict')
)
for phrase in speech:
    print(phrase)
Answer 0 (score: 1)
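If you happen to have a CUDA-enabled GPU, you can try Mozilla's DeepSpeech GPU library. There is also a CPU version in case you don't have a CUDA-enabled GPU. On CPU, DeepSpeech transcribes an audio file in about 1.3x the file's duration, whereas on GPU it takes about 0.3x, i.e. it transcribes 1 second of audio in roughly 0.33 seconds. Quick start: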
# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
source $HOME/tmp/deepspeech-gpu-venv/bin/activate
# Install DeepSpeech CUDA enabled package
pip3 install deepspeech-gpu
# Transcribe an audio file.
deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio audio/2830-3980-0043.wav
A few important notes: deepspeech-gpu has some dependencies, such as TensorFlow, CUDA, and cuDNN, so check its GitHub repository for more details: https://github.com/mozilla/DeepSpeech
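
To see what the 1.3x and 0.3x figures above mean for live transcription, here is a rough back-of-the-envelope sketch in Python. It does not call DeepSpeech. The function names are made up for illustration, and it assumes the engine processes audio sequentially at a constant speed, which real engines only approximate. The idea is that a factor below 1.0 lets the recognizer keep pace with the speaker, while a factor above 1.0 makes the transcript fall further behind the longer you talk.

```python
# Illustrative arithmetic only: the helper names below are hypothetical and
# are not part of DeepSpeech, speech_recognition, or pocketsphinx.

# Factors quoted in the answer (transcription time / audio duration).
CPU_FACTOR = 1.3
GPU_FACTOR = 0.3


def processing_seconds(audio_seconds: float, factor: float) -> float:
    """Time needed to transcribe `audio_seconds` of audio at a constant factor."""
    return audio_seconds * factor


def keeps_up_with_live_audio(factor: float) -> bool:
    """True if the engine transcribes faster than the audio arrives."""
    return factor < 1.0


def backlog_seconds(audio_seconds: float, factor: float) -> float:
    """How far the transcript lags once `audio_seconds` of speech has ended.

    Assumes sequential processing at a constant factor (a simplification).
    """
    return max(0.0, processing_seconds(audio_seconds, factor) - audio_seconds)


if __name__ == "__main__":
    for name, factor in (("CPU", CPU_FACTOR), ("GPU", GPU_FACTOR)):
        print(
            name,
            "keeps up:", keeps_up_with_live_audio(factor),
            "| backlog after 10 s of speech:",
            round(backlog_seconds(10.0, factor), 2), "s",
        )
```

Under these assumptions, the CPU figure suggests the transcript would trail live speech by a growing margin, while the GPU figure leaves headroom for near-real-time output. Actual latency also depends on how the audio is chunked and fed to the engine.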