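I have looked through all the posts about measuring silence, but unfortunately I could not find a solution to my problem.
I have about 3000 audio files. Each file is 10 seconds long and is a recording of a person saying a word out loud.
I need to know how long it takes them to say the word (that is, the length of the silence before the word is spoken). I have read that I could do this with audioop.rms(fragment, width), but I cannot find instructions on how to use it.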
New error:
void@control:~/Documents$ python audio.py
Traceback (most recent call last):
  File "audio.py", line 36, in <module>
    leading_silences = {a: get_silence(a, threshold) for a in audio_files}
  File "audio.py", line 36, in <dictcomp>
    leading_silences = {a: get_silence(a, threshold) for a in audio_files}
  File "audio.py", line 7, in get_silence
    song = AudioSegment.from_wav(audio)
  File "/usr/local/lib/python2.7/dist-packages/pydub/audio_segment.py", line 471, in from_wav
    return cls.from_file(file, 'wav')
  File "/usr/local/lib/python2.7/dist-packages/pydub/audio_segment.py", line 387, in from_file
    file = _fd_or_path_or_tempfile(file, 'rb', tempfile=False)
  File "/usr/local/lib/python2.7/dist-packages/pydub/utils.py", line 59, in _fd_or_path_or_tempfile
    fd = open(fd, mode=mode)
IOError: [Errno 2] No such file or directory: 'silbato.wav'
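The traceback above ends in an IOError because os.listdir returns bare file names, so 'silbato.wav' is opened relative to the current working directory instead of the audio directory. A minimal sketch of one way around this, assuming a hypothetical helper named wav_paths and assuming only files ending in .wav should be processed:

```python
import os


def wav_paths(directory):
    """Return the full paths of the .wav files in `directory`.

    `wav_paths` is a hypothetical helper, not part of pydub.
    """
    paths = []
    for name in sorted(os.listdir(directory)):
        # listdir yields names only, so join each one with its directory
        full = os.path.join(directory, name)
        if os.path.isfile(full) and name.lower().endswith('.wav'):
            paths.append(full)
    return paths
```

Passing these full paths to AudioSegment.from_wav should then work regardless of the directory the script is started from.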
Answer 0 (score: 1)
You can use @jiaaro's lovely pydub library:
from pydub import AudioSegment
from os import listdir
from os.path import isfile, join


def get_silence(audio, threshold, interval):
    "get length of leading silence in seconds from a wav file"

    # use another AudioSegment loader (e.g. from_file) for other types of audio
    song = AudioSegment.from_wav(audio)

    # break into chunks of `interval` milliseconds
    chunks = [song[i:i+interval] for i in range(0, len(song), interval)]

    # count the leading chunks with dBFS below threshold
    silent_blocks = 0
    for c in chunks:
        if c.dBFS == float('-inf') or c.dBFS < threshold:
            silent_blocks += 1
        else:
            break

    # convert blocks into seconds (1000.0 avoids integer division on Python 2)
    return round(silent_blocks * (interval / 1000.0), 3)

# get files in a directory
audio_path = 'path/to/directory'
audio_files = [i for i in listdir(audio_path) if isfile(join(audio_path, i))]
threshold = -80 # tweak based on signal-to-noise ratio
interval = 1 # ms, increase to speed up
leading_silences = {a: get_silence(join(audio_path, a),
                                   threshold, interval) for a in audio_files}

# to get tab-separated values:
for name, leading_silence in leading_silences.items():
    print('\t'.join([name, str(leading_silence)]))
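Regarding audioop.rms(fragment, width) from the question: it returns the root-mean-square of a bytes fragment, where width is the number of bytes per sample. The audioop module was deprecated in Python 3.11 and removed in 3.13, so the sketch below computes the RMS by hand using only the standard library. It is an illustration rather than a replacement for pydub: the function name leading_silence_seconds, the -50 dBFS default and the 10 ms default are all assumptions, and it only handles 16-bit PCM WAV files.

```python
import array
import contextlib
import math
import sys
import wave


def leading_silence_seconds(path, threshold_dbfs=-50.0, interval_ms=10):
    """Seconds of audio before the first chunk whose RMS level reaches threshold_dbfs.

    Hypothetical helper; assumes a 16-bit PCM WAV file.
    Returns the full duration if no chunk reaches the threshold.
    """
    with contextlib.closing(wave.open(path, 'rb')) as wav:
        if wav.getsampwidth() != 2:
            raise ValueError('this sketch only handles 16-bit samples')
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames_per_chunk = max(1, rate * interval_ms // 1000)
        silent_frames = 0
        while True:
            raw = wav.readframes(frames_per_chunk)
            if not raw:
                break
            samples = array.array('h', raw)
            if sys.byteorder == 'big':
                samples.byteswap()  # WAV data is little-endian
            # root-mean-square over all samples in the chunk (channels interleaved)
            rms = math.sqrt(sum(s * s for s in samples) / float(len(samples)))
            # level relative to full scale (32768 for 16-bit audio)
            dbfs = 20 * math.log10(rms / 32768.0) if rms > 0 else float('-inf')
            if dbfs >= threshold_dbfs:
                break
            silent_frames += len(samples) // channels
        return silent_frames / float(rate)
```

As with the pydub version, the threshold needs tuning against the background noise in the recordings, and a larger interval_ms trades timing resolution for speed.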