I am trying to transcribe the audio of this clip with the Bing ASR service, using the SpeechRecognition package and the following script:
#!/usr/bin/env python3
"""Recognize speech using Microsoft Bing Voice Recognition."""
import speech_recognition as sr
from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "input.wav")
# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file
# Microsoft Bing Voice Recognition API uses keys which are
# 32-character lowercase hexadecimal strings
BING_KEY = "FOOBAR - insert your key here"
try:
    print("Microsoft Bing Voice Recognition thinks you said:\n\n" +
          r.recognize_bing(audio, key=BING_KEY, language="de-DE"))
except sr.UnknownValueError:
    print("Microsoft Bing Voice Recognition could not understand audio")
except sr.RequestError as e:
    print(("Could not request results from Microsoft Bing Voice Recognition "
           "service; {0}").format(e))
Output:
Microsoft Bing Voice Recognition thinks you said:
Reaser Was ist haben sie Lust mit dem Kino zu kommen war schon dass ich könnte den Film gar nicht folgen
Clearly it runs, but it does not transcribe the whole file. Why? How can I get it to transcribe the complete file?
Answer 0 (score: 0)
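The problem is that the SpeechRecognition package uses the REST interface rather than the WebSocket interface. The REST interface is limited to 15 seconds of audio, so anything after that point is not transcribed.
Source: https://docs.microsoft.com/de-de/azure/cognitive-services/speech/home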
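
One possible workaround, given the 15-second limit described above, is to send the file in pieces that each stay under the limit and join the results. The following is only a minimal sketch under some assumptions: the input is a WAV file (so the standard `wave` module can report its length), the installed SpeechRecognition version still provides `recognize_bing` as used in the question, and a 10-second chunk length is an arbitrary choice. The helper names `chunk_spans` and `transcribe_in_chunks` are hypothetical, not part of the package.

```python
import wave


def chunk_spans(total_seconds, chunk_seconds=10.0):
    """Return (offset, duration) pairs that cover total_seconds in order."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    spans = []
    offset = 0.0
    while offset < total_seconds:
        # The last chunk may be shorter than chunk_seconds.
        duration = min(chunk_seconds, total_seconds - offset)
        spans.append((offset, duration))
        offset += chunk_seconds
    return spans


def wav_duration(audio_path):
    """Length of a WAV file in seconds, read from its header."""
    with wave.open(audio_path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def transcribe_in_chunks(audio_path, key, language="de-DE", chunk_seconds=10.0):
    """Hypothetical helper: transcribe a WAV file piece by piece."""
    # Imported here so the helpers above work without the package installed.
    import speech_recognition as sr

    r = sr.Recognizer()
    parts = []
    for offset, duration in chunk_spans(wav_duration(audio_path), chunk_seconds):
        # Reopen the file for each chunk so that offset counts from the start.
        with sr.AudioFile(audio_path) as source:
            audio = r.record(source, offset=offset, duration=duration)
        try:
            parts.append(r.recognize_bing(audio, key=key, language=language))
        except sr.UnknownValueError:
            # Nothing intelligible in this chunk; skip it and carry on.
            continue
    return " ".join(parts)
```

Calling `transcribe_in_chunks("input.wav", BING_KEY)` would then take the place of the single `recognize_bing` call in the question. Cutting at fixed offsets can split a word across two chunks, so the joined text may contain errors at the boundaries; splitting on silence would likely give cleaner results.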