Making Python speech recognition faster

Time: 2018-05-27 00:31:03

Tags: python speech-recognition google-speech-api dictation

I have been using Google's speech recognition. Here is my code:

import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
   print("Say something!")
   audio = r.listen(source)
   print(r.recognize_google(audio))

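Although the recognition is very accurate, it takes about 4-5 seconds before it outputs the recognized text. Since I am building a voice assistant, I would like to modify the code above to make speech recognition faster.

Is there any way to bring this down to around 1-2 seconds? If possible, I would like recognition to be as fast as services like Siri and Ok Google.

I am new to Python, so I apologize if my question has a simple answer.

2 answers:

Answer 0 (score: 2)

You could use a different speech recognition service. For example, you can set up an account with IBM to use its Watson Speech to Text. If possible, try its WebSocket interface, because that way it transcribes what you say while you are still speaking.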

An example (not using WebSockets) would be:

import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Adjusting for background noise. One second")
    r.adjust_for_ambient_noise(source)
    print("Say something!")
    audio = r.listen(source)

IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE"  # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE"  # IBM Speech to Text passwords are mixed-case alphanumeric strings
try:
    print("IBM Speech to Text thinks you said " + r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD))
except sr.UnknownValueError:
    print("IBM Speech to Text could not understand audio")
except sr.RequestError as e:
    print("Could not request results from IBM Speech to Text service; {0}".format(e))

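You could also try Pocketsphinx, but personally I have not had a particularly good experience with it. It works offline (a plus), but it was not especially accurate for me. You would probably need to tweak some detection settings and filter out some background noise. I believe there is also a way to train it to adapt to your voice, but it does not look straightforward.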
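
Before switching engines, it can help to measure where the delay actually comes from: recording the phrase or recognizing it. The following is a minimal sketch, not part of the original answer. `timed` and `compare_engines` are hypothetical helper names introduced here; `recognize_sphinx` is the SpeechRecognition method that wraps Pocketsphinx, and the sketch assumes the `pocketsphinx` package, PyAudio, and a working microphone are available.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, seconds_elapsed)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_engines():
    """Record one phrase, then time Google and Pocketsphinx on the same audio.

    Assumption: SpeechRecognition, PyAudio and pocketsphinx are installed.
    """
    import speech_recognition as sr  # third-party; imported lazily on purpose

    r = sr.Recognizer()
    with sr.Microphone() as source:
        r.adjust_for_ambient_noise(source)
        print("Say something!")
        # Time the recording stage separately from recognition
        audio, listen_seconds = timed(r.listen, source)
    print("Listening took {:.2f}s".format(listen_seconds))

    engines = (("Google", r.recognize_google), ("Sphinx", r.recognize_sphinx))
    for name, recognize in engines:
        try:
            text, seconds = timed(recognize, audio)
            print("{}: {!r} in {:.2f}s".format(name, text, seconds))
        except (sr.UnknownValueError, sr.RequestError) as e:
            print("{} failed: {}".format(name, e))
```

Calling `compare_engines()` prints one timing line for the recording step and one per engine, which shows whether the offline engine is worth its lower accuracy on your machine.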

Some useful links:

Speech recognition

Microphone recognition example

IBM Watson Speech to Text

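Good luck. Once you get speech recognition working, it is very useful and rewarding!

Answer 1 (score: -1)

Use the correct input channel and tune the settings for the best results: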

import speech_recognition as sr


def speech_to_text():
    # Prefer the PulseAudio input if present; None falls back to the default microphone
    required = None
    for index, name in enumerate(sr.Microphone.list_microphone_names()):
        if "pulse" in name:
            required = index
    r = sr.Recognizer()
    with sr.Microphone(device_index=required) as source:
        r.adjust_for_ambient_noise(source)
        print("Say something!")
        # Stop recording after at most 4 seconds of speech
        audio = r.listen(source, phrase_time_limit=4)
    try:
        text = r.recognize_google(audio)  # named "text" to avoid shadowing the built-in input()
        print("You said: " + text)
        return text
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
    return None
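
Part of the delay can come from the recording step rather than from Google: by default `Recognizer.pause_threshold` is 0.8 seconds of trailing silence before `listen` returns, and `adjust_for_ambient_noise` samples for 1 second each time it is called. The following is a minimal sketch of tuning those settings, not the answerer's code. `tune_for_latency` and `listen_loop` are hypothetical names, and the 0.5-second values are assumptions to experiment with, not recommended settings.

```python
def tune_for_latency(recognizer, pause_threshold=0.5):
    """Shorten the trailing silence that ends a phrase.

    Works on any object exposing the Recognizer attributes used below.
    """
    recognizer.pause_threshold = pause_threshold
    # listen() asserts pause_threshold >= non_speaking_duration,
    # so lower non_speaking_duration together with it when needed.
    recognizer.non_speaking_duration = min(
        recognizer.non_speaking_duration, pause_threshold
    )
    return recognizer


def listen_loop():
    """Calibrate once, then recognize phrases repeatedly.

    Assumption: SpeechRecognition and PyAudio are installed.
    """
    import speech_recognition as sr  # third-party; imported lazily on purpose

    r = tune_for_latency(sr.Recognizer())
    with sr.Microphone() as source:
        # Calibrate a single time instead of before every phrase
        r.adjust_for_ambient_noise(source, duration=0.5)
        while True:
            print("Say something!")
            audio = r.listen(source, phrase_time_limit=4)
            try:
                print("You said: " + r.recognize_google(audio))
            except sr.UnknownValueError:
                print("Google Speech Recognition could not understand audio")
            except sr.RequestError as e:
                print("Could not request results; {0}".format(e))
```

A shorter pause threshold makes `listen` return sooner after you stop talking, at the cost of cutting off speakers who pause mid-sentence, so the value is worth adjusting by ear.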