I have tried many combinations of min_silence_len and silence_thresh, but split_on_silence always returns a list of length 1. I can't work out what I'm missing.
import speech_recognition as sr
from pydub import AudioSegment
from pydub.silence import split_on_silence
# obtain audio from the microphone
r = sr.Recognizer()
#r = sr.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")
with sr.Microphone() as source:
    print("Please wait. Calibrating microphone...")
    # listen for 5 seconds and create the ambient noise energy level
    r.adjust_for_ambient_noise(source, duration=5)
    print("Say something!")
    audio = r.listen(source)
a = r.recognize_google(audio)
audio_chunks = split_on_silence(a, min_silence_len=30, silence_thresh=-5)
print(len(audio_chunks))
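For example: when I say "what is wrong with this code" into the microphone, I expect the whole sentence to be split into chunks, so that audio_chunks is an array of words: what || is || wrong || with || this || code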
Actual result: len(audio_chunks) = 1, audio_chunks = ['what is wrong with this code']