I'm trying to figure out how to set up the Azure Speech to Text SDK API in Python so that it recognizes speech from an audio file.
I tried this code from the Python quickstart:
import azure.cognitiveservices.speech as speechsdk

# Configure the service credentials and use a local audio file as input
speech_config = speechsdk.SpeechConfig(subscription=cls.speech_key, region=cls.service_region)
audio_config = speechsdk.audio.AudioConfig(filename=file_name)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# Run a single recognition pass and inspect the outcome
result = speech_recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    response_str = result.text
    # print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    response_str = result.no_match_details
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    response_str = cancellation_details.reason
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        response_str = cancellation_details.error_details
        print("Error details: {}".format(cancellation_details.error_details))
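Everything works, except that only the first 15 seconds of the file are recognized. However, this page: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-to-text says that if I use the SDK API (rather than REST), I can transcribe longer audio.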
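One direction that may be relevant: `recognize_once()` returns after a single utterance, while the SDK also offers continuous recognition through `start_continuous_recognition()` and the `recognized`, `session_stopped`, and `canceled` event signals. The following is only a minimal sketch of that approach, not a confirmed fix. The helper names `transcribe_continuously` and `build_recognizer`, the `timeout` parameter, and the choice to join segments with spaces are all assumptions made for illustration.

```python
import threading


def transcribe_continuously(recognizer, timeout=None):
    """Collect the text of every finalized utterance until the session ends.

    `recognizer` is expected to behave like speechsdk.SpeechRecognizer.
    This helper is hypothetical and not part of the SDK.
    """
    done = threading.Event()
    segments = []

    def on_recognized(evt):
        # Called once per finalized utterance; skip empty (no-match) results.
        if evt.result.text:
            segments.append(evt.result.text)

    def on_stop(evt):
        # Called when the file is exhausted or recognition is canceled.
        done.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_stop)
    recognizer.canceled.connect(on_stop)

    recognizer.start_continuous_recognition()
    done.wait(timeout)  # block until a stop event fires or the timeout expires
    recognizer.stop_continuous_recognition()
    return " ".join(segments)


def build_recognizer(speech_key, service_region, file_name):
    """Hypothetical helper: same setup as the quickstart snippet above."""
    # Imported lazily so the collecting logic can be exercised without the SDK.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(filename=file_name)
    return speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
```

With this sketch, `transcribe_continuously(build_recognizer(key, region, file_name))` would replace the `recognize_once()` call. Handling of cancellation errors is left out here and would still need the `cancellation_details` checks from the quickstart code.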
My question is:
Any ideas would be appreciated.