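I'm extracting MFCC features from mp3 voice files, but I want to keep the source files unchanged and avoid creating any new files. My processing pipeline has the following steps:
1. Load the .mp3 file with pydub, remove the silence, and generate .wav data
2. Read the audio data and sample rate with scipy.io.wavfile.read()
3. Extract the features with python_speech_features

However, eliminate_silence() returns an AudioSegment object, while scipy.io.wavfile.read() expects a .wav filename, so I'm forced to temporarily save/export the data as a wave file to bridge the two steps. This step is memory-consuming, so my question is: how can I avoid exporting the wave file? Or is there a workaround?
Here is my code: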
import os
from pydub import AudioSegment
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc
from pydub.silence import split_on_silence

def eliminate_silence(input_path):
    """ Eliminate silent chunks from original call recording """
    # Load the input mp3 file
    sound = AudioSegment.from_mp3(input_path)
    chunks = split_on_silence(sound,
        # split on silences longer than 500 ms
        min_silence_len=500,
        # anything under -30 dBFS is considered silence
        silence_thresh=-30,
        # keep 100 ms of leading/trailing silence
        keep_silence=100)
    output_chunks = AudioSegment.empty()
    for chunk in chunks:
        output_chunks += chunk
    return output_chunks

silence_clear_data = eliminate_silence("file.mp3")
silence_clear_data.export("temp.wav", format="wav")
rate, audio = read("temp.wav")
os.remove("temp.wav")

# Extract MFCCs
mfcc_feature = mfcc(audio, rate, winlen=0.025, winstep=0.01, numcep=15,
                    nfilt=35, nfft=512, appendEnergy=True)
mfcc_feature = preprocessing.scale(mfcc_feature)
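One way to drop temp.wav without changing the rest of the pipeline, sketched under assumptions: pydub's AudioSegment.export() also accepts a file-like object instead of a path, and scipy.io.wavfile.read() accepts an open file handle, so the WAV can round-trip through an in-memory io.BytesIO buffer. The helper name `segment_to_wav_array` is hypothetical. Note that this still encodes a WAV header and a full copy of the samples in RAM; it avoids the disk, not the copy.

```python
import io

from scipy.io.wavfile import read


def segment_to_wav_array(segment):
    """Round-trip a pydub AudioSegment through an in-memory WAV buffer.

    Hypothetical helper: returns (rate, samples) exactly like
    scipy.io.wavfile.read(), but never writes a file to disk.
    """
    buffer = io.BytesIO()
    # export() writes the WAV bytes into the file-like object instead of a path
    segment.export(buffer, format="wav")
    buffer.seek(0)  # rewind so read() starts at the RIFF header
    return read(buffer)
```

With it, the export/read/remove lines above become `rate, audio = segment_to_wav_array(silence_clear_data)`.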
Answer 0 (score: 1)
You need something like AudioSegment.get_array_of_samples(). (You may need to build a numpy array from that array first, before passing it to mfcc.)
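A minimal sketch of this suggestion, assuming a pydub AudioSegment as input; the helper name `segment_to_numpy` is hypothetical. get_array_of_samples() returns a Python array.array whose type code follows the segment's sample width, so np.asarray() keeps the matching integer dtype without hard-coding int16. Multi-channel samples come back interleaved, so this sketch reshapes them and averages to mono (averaging is a choice made here, not something the answer prescribes).

```python
import numpy as np


def segment_to_numpy(segment):
    """Convert a pydub AudioSegment into (frame_rate, samples) entirely in memory.

    Hypothetical helper; the return order matches scipy.io.wavfile.read()
    so it can replace the export/read step directly.
    """
    # array.array with a type code such as 'h' (16-bit) -> numpy keeps int16
    samples = np.asarray(segment.get_array_of_samples())
    if segment.channels > 1:
        # Interleaved samples (L, R, L, R, ...): reshape to (frames, channels)
        # and average into a single mono channel (float64).
        samples = samples.reshape(-1, segment.channels).mean(axis=1)
    return segment.frame_rate, samples
```

In the question's code this would be `rate, audio = segment_to_numpy(silence_clear_data)`. pydub's own `set_channels(1)` is an alternative way to downmix before converting.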
Answer 1 (score: 1)
I'm currently working on a project that cuts audio using silences and MFCC coefficients, so I'll leave you my solution:
import pydub
import pydub.silence
import python_speech_features as p
import numpy as np

def generate_mfcc_without_silences(path):
    # get audio and change frame rate to 16 kHz
    audio_file = pydub.AudioSegment.from_wav(path)
    audio_file = audio_file.set_frame_rate(16000)
    # cut audio using silences
    chunks = pydub.silence.split_on_silence(audio_file, silence_thresh=audio_file.dBFS, min_silence_len=200)
    mfccs = []
    for chunk in chunks:
        # compute mfcc from chunk array (assumes 16-bit samples)
        np_chunk = np.frombuffer(chunk.get_array_of_samples(), dtype=np.int16)
        mfccs.append(p.mfcc(np_chunk, samplerate=audio_file.frame_rate, numcep=26))
    return mfccs
Notes:
· I resample the audio to 16 kHz, but this is optional
· I set min_silence_len to 200 because I want to try to isolate single words
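In generate_mfcc_without_silences, `silence_thresh=audio_file.dBFS` puts the threshold at the clip's own average loudness, so any stretch of at least min_silence_len ms that is quieter than average counts as silence; unlike the fixed -30 dBFS in the question, this adapts to the recording level. The sketch below shows how such a dBFS figure is derived from integer PCM samples. The function `dbfs` is hypothetical and only approximates pydub's property (pydub computes the RMS via audioop, which rounds to an integer, so values can differ slightly).

```python
import numpy as np


def dbfs(samples, sample_width=2):
    """Loudness relative to full scale: 20 * log10(rms / max_amplitude).

    max_amplitude is 2 ** (8 * sample_width - 1), i.e. 32768 for 16-bit audio.
    Returns -inf for empty input or digital silence.
    """
    x = np.asarray(samples, dtype=np.float64)
    rms = float(np.sqrt(np.mean(x ** 2))) if x.size else 0.0
    if rms == 0.0:
        return float("-inf")
    max_amplitude = 2 ** (8 * sample_width - 1)
    return float(20 * np.log10(rms / max_amplitude))
```

For example, a 16-bit square wave at half of full scale (samples of +16384 and -16384) comes out at about -6.02 dBFS.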
Combining my function with your requirements, the function you need would probably be:
import pydub
import pydub.silence
import python_speech_features as p
import numpy as np
from sklearn import preprocessing

def mfcc_from_audio_without_silences(path):
    audio_file = pydub.AudioSegment.from_mp3(path)
    chunks = pydub.silence.split_on_silence(audio_file, silence_thresh=-30, min_silence_len=500, keep_silence=100)
    output_chunks = pydub.AudioSegment.empty()
    for chunk in chunks:
        output_chunks += chunk
    # convert the joined segment to a numpy array in memory (assumes 16-bit samples)
    samples = np.frombuffer(output_chunks.get_array_of_samples(), dtype=np.int16)
    mfcc_feature = p.mfcc(samples, samplerate=audio_file.frame_rate, numcep=15, nfilt=35)
    return preprocessing.scale(mfcc_feature)
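The final preprocessing.scale() call standardizes each column of the MFCC matrix, i.e. each cepstral coefficient across all frames of this one recording, to zero mean and unit variance. As a rough numpy equivalent for readers who want to see what that step does (the name `standardize_columns` is hypothetical, and this ignores scikit-learn's NaN handling):

```python
import numpy as np


def standardize_columns(features):
    """Per-column z-scoring, the same idea as sklearn.preprocessing.scale(X).

    Each column is centered on its mean and divided by its population
    standard deviation (ddof=0). Constant columns are only centered.
    """
    x = np.asarray(features, dtype=np.float64)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (x - mean) / std
```

Because the statistics come from a single file, features from different recordings are each normalized to their own mean and variance.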