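I am trying to convert recorded audio data to text using speech_recognition. So far I have been able to convert recorded audio successfully when it comes from a .wav file, and I am using pyaudio to record the audio into that .wav file. Since pyaudio creates an audio stream, I would like to convert the stream directly instead of writing it to a file and then loading that file in speech_recognition. Below is my implementation, which should capture the audio stream, wrap it in sr.AudioData, and convert the audio.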
import wave

import pyaudio
import speech_recognition as sr

FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024  # assumed buffer size; the value is not shown in the original snippet
RECORD_SECONDS = 5
# get_nonexistant_path is a helper defined elsewhere (not shown here)
WAVE_OUTPUT_FILENAME = get_nonexistant_path("voice.wav")

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)  # collect each chunk read from the stream
print("* done recording")

r = sr.Recognizer()
audio_bytes = b''.join(frames)  # join the list of chunks into one bytes object
audio_source = sr.AudioData(audio_bytes, RATE, CHANNELS)
try:
    print('Trying to convert')
    text = r.recognize_google(audio_data=audio_source, language='en-US', show_all=True)
    print(text)
except sr.UnknownValueError:
    print("Could not understand")

stream.stop_stream()
stream.close()
p.terminate()

# Also save the same frames to a .wav file for debugging
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
Now, every time I run this code, I get an empty list as output.
* recording
* done recording
Trying to convert
[]
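Since I also save the audio stream to a .wav file for debugging, I tried running recognition on that same .wav file, and there I get the correct transcription.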
You said "Google speech"
{'alternative': [{'transcript': 'Google speech', 'confidence': 0.98762912}], 'final': True}
Can someone tell me what is going on here?
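One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the third argument of sr.AudioData is the sample width in bytes, not the channel count, and the raw data is treated as a single channel. With CHANNELS = 2 the stream delivers interleaved left/right samples, so the recognizer may be receiving audio it cannot interpret, while the .wav path appears to work because the file header carries the channel count. The sketch below uses only the standard library to downmix interleaved 16-bit stereo bytes to mono; `stereo_to_mono_int16` is a hypothetical helper name, not part of pyaudio or speech_recognition.

```python
import array


def stereo_to_mono_int16(frame_data: bytes) -> bytes:
    """Average interleaved left/right 16-bit samples into one mono channel."""
    samples = array.array("h")  # signed 16-bit, the layout of pyaudio.paInt16
    samples.frombytes(frame_data)
    if len(samples) % 2:
        raise ValueError("expected an even number of interleaved samples")
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)),
    )
    return mono.tobytes()


# Possible use with the code above (assumption, not verified against the API here):
# mono_bytes = stereo_to_mono_int16(b"".join(frames))
# audio_source = sr.AudioData(mono_bytes, RATE, 2)  # 2 = bytes per paInt16 sample
```

A simpler alternative, if stereo is not needed, would be to record with CHANNELS = 1 and pass p.get_sample_size(FORMAT) as the third argument.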