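Google's speech recognition only works on a few seconds of audio, so I split the audio file into chunks. This is the class that splits the audio.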
import os
import shutil

from pydub import AudioSegment
from pydub.silence import split_on_silence


class Split_audio():
    def __init__(self):
        """
        Constructor
        """

    def create_folder(self, audio):
        """
        Create folder for chunks
        """
        # name of the folder: for example, audio file's name = test.wav ==> folder's name = test
        # (splitext also handles file names that contain no '.')
        folder = os.path.splitext(audio.get_nameAudioFile())[0]
        # if the folder already exists, overwrite it
        if os.path.exists(folder):
            shutil.rmtree(folder)
        # create folder
        os.makedirs(folder)
        return folder

    def split(self, audio, silence_thresh=None, min_silence_len=500):
        """
        Split audio file on silence
        """
        sound_file = AudioSegment.from_wav(audio.get_nameAudioFile())
        if silence_thresh is None:
            # default threshold: 19 dB below the file's average loudness (dBFS)
            silence_thresh = int(sound_file.dBFS) - 19
        audio_chunks = split_on_silence(sound_file, silence_thresh=silence_thresh, min_silence_len=min_silence_len)
        return audio_chunks

    def export(self, audio, path_folder=None):
        """
        Export chunks to wav files
        """
        audio_chunks = self.split(audio)
        if path_folder is None:
            path_folder = self.create_folder(audio)
        for i, chunk in enumerate(audio_chunks):
            out_file = "chunk{0}.wav".format(i)
            path = os.path.join(path_folder, out_file)
            chunk.export(path, format="wav")
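For anyone who wants to try the class: it expects an `audio` object that exposes `get_nameAudioFile()`. That object is not shown in my post, so the sketch below uses a hypothetical stand-in (`AudioFile`) and a hypothetical helper (`chunk_paths`) that only mirrors the folder and file naming used by `create_folder` and `export`. It does not touch pydub or the file system.

```python
import os


class AudioFile:
    """Hypothetical stand-in for the `audio` argument (not the real class)."""

    def __init__(self, path):
        self._path = path

    def get_nameAudioFile(self):
        # Return the path of the wav file, as the splitter expects
        return self._path


def chunk_paths(audio, n_chunks):
    """List the output paths that n_chunks exported chunks would get."""
    # Folder name = audio file name without its extension (test.wav -> test)
    folder = os.path.splitext(audio.get_nameAudioFile())[0]
    return [os.path.join(folder, "chunk{0}.wav".format(i)) for i in range(n_chunks)]


if __name__ == "__main__":
    for p in chunk_paths(AudioFile("test.wav"), 3):
        print(p)
```

With the real class, the equivalent call would be something like `Split_audio().export(AudioFile("test.wav"))`, assuming pydub is installed and `test.wav` exists.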
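My conclusion is that the quality of the google_recognize output depends on silence_thresh and min_silence_len. After testing on 3 different audio files, I set the values to silence_thresh = the audio's dBFS - 19 and min_silence_len = 500 ms. One month later, I re-ran the same code on the same audio. Oops: my transcript was completely different from the first one. Here are the two results: First result, second result. Any suggestions?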
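To make the settings explicit, here is a minimal sketch of how the threshold is derived. The function names and the loudness value of -23.7 dBFS are made up for illustration; only the rule (dBFS - 19, 500 ms) comes from my code. Since both values are computed from the file alone, the same wav file always gives the same settings.

```python
def default_silence_thresh(dbfs, offset=19):
    """Threshold rule from the post: the audio's dBFS minus 19."""
    # int() truncates toward zero, so -23.7 becomes -23 (not -24)
    return int(dbfs) - offset


def split_settings(dbfs, min_silence_len=500):
    """Collect the parameters that would be passed to split_on_silence."""
    return {
        "silence_thresh": default_silence_thresh(dbfs),  # in dBFS
        "min_silence_len": min_silence_len,              # in milliseconds
    }


if __name__ == "__main__":
    # -23.7 dBFS is an assumed example loudness, not a measured value
    print(split_settings(-23.7))
```

Saving this dictionary next to each transcript would at least show whether the splitting parameters were identical between the two runs.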