Question

I am trying to transcribe the audio of this clip with the Bing ASR service, using the SpeechRecognition package and the following script:

#!/usr/bin/env python3

"""Recognize speech using Microsoft Bing Voice Recognition."""

import speech_recognition as sr

from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "input.wav")

# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file


# Microsoft Bing Voice Recognition API uses keys which are
# 32-character lowercase hexadecimal strings
BING_KEY = "FOOBAR - insert your key here"
try:
    print("Microsoft Bing Voice Recognition thinks you said:\n\n" +
          r.recognize_bing(audio, key=BING_KEY, language="de-DE"))
except sr.UnknownValueError:
    print("Microsoft Bing Voice Recognition could not understand audio")
except sr.RequestError as e:
    print(("Could not request results from Microsoft Bing Voice Recognition "
           "service; {0}").format(e))

Output:

Microsoft Bing Voice Recognition thinks you said:

Reaser Was ist haben sie Lust mit dem Kino zu kommen war schon dass ich könnte den Film gar nicht folgen

So it evidently runs, but it does not transcribe the whole file. Why? How can I get it to transcribe the complete file?

Answer 1

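The problem is that the SpeechRecognition package uses the REST interface rather than the WebSocket interface. The REST interface is limited to 15 seconds of audio.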

Source: https://docs.microsoft.com/de-de/azure/cognitive-services/speech/home
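One possible workaround, if you want to stay on the REST interface, is to cut the file into pieces shorter than the 15-second limit and send them one at a time. The sketch below is only an illustration under a few assumptions: the helper names (split_wav, transcribe_in_chunks, make_bing_recognizer) and the 10-second chunk length are my own choices, not part of the SpeechRecognition package, and the input is assumed to be an uncompressed PCM WAV file. Cutting at fixed intervals can split a word in half, so results at chunk boundaries may suffer.

```python
import io
import wave


def split_wav(path, chunk_seconds=10.0):
    """Yield in-memory WAV files (bytes), each at most chunk_seconds long."""
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setparams(params)    # same channels, sample width, rate
                dst.writeframes(frames)  # header frame count is fixed on close
            yield buf.getvalue()


def transcribe_in_chunks(path, recognize, chunk_seconds=10.0):
    """Run recognize(wav_bytes) -> str on every chunk and join the results."""
    parts = [recognize(chunk) for chunk in split_wav(path, chunk_seconds)]
    return " ".join(p for p in parts if p)


def make_bing_recognizer(key, language="de-DE"):
    """Hypothetical adapter around the recognize_bing call from the question."""
    import speech_recognition as sr  # imported here so the helpers above stay stdlib-only

    r = sr.Recognizer()

    def recognize(wav_bytes):
        with sr.AudioFile(io.BytesIO(wav_bytes)) as source:
            audio = r.record(source)
        try:
            return r.recognize_bing(audio, key=key, language=language)
        except sr.UnknownValueError:
            return ""  # nothing intelligible in this chunk

    return recognize
```

With the script from the question this would be called as transcribe_in_chunks(AUDIO_FILE, make_bing_recognizer(BING_KEY)). Each chunk is a separate request, so sr.RequestError should still be handled around that call as in the original script.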
