Transitioning between an AudioSegment object and wave files/data

Date: 2018-08-13 15:42:03

Tags: python scipy scikit-learn mfcc pydub

I am extracting MFCC features from mp3 speech files, but I want to keep the source files untouched and avoid writing any new files. My processing consists of the following steps:

  • Load the .mp3 file with pydub, remove silence, and produce .wav data
  • Read the audio data and sample rate with scipy.io.wavfile.read()
  • Extract features with python_speech_features

However, eliminate_silence() returns an AudioSegment object, while scipy.io.wavfile.read() expects a .wav filename, so I am forced to temporarily save/export the data as a wave file to bridge the two. This step is resource-intensive, so my question is: how can I avoid exporting a wave file? Or is there a workaround?

Here is my code:

import os
from pydub import AudioSegment
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc
from pydub.silence import split_on_silence

def eliminate_silence(input_path):
    """ Eliminate silent chunks from original call recording """
    # Import input wave file
    sound  = AudioSegment.from_mp3(input_path)
    chunks = split_on_silence(sound,
                              # split on silences longer than 1000ms (1 sec)
                              min_silence_len=500,
                              # anything under -16 dBFS is considered silence
                              silence_thresh=-30,
                              # keep 200 ms of leading/trailing silence
                              keep_silence=100)

    output_chunks = AudioSegment.empty()
    for chunk in chunks:
        output_chunks += chunk
    return output_chunks


silence_clear_data = eliminate_silence("file.mp3")
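# Export to a temporary wave file only to read it back with scipy, then delete it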
silence_clear_data.export("temp.wav", format="wav")
rate, audio = read("temp.wav")
os.remove("temp.wav")

# Extract MFCCs
mfcc_feature = mfcc(audio, rate, winlen = 0.025, winstep = 0.01, numcep = 15,
                    nfilt = 35, nfft = 512, appendEnergy = True)
mfcc_feature = preprocessing.scale(mfcc_feature)

2 Answers:

Answer 0 (score: 1)

You need something like AudioSegment.get_array_of_samples(). (You may need to construct a numpy array from that array before passing it to mfcc.)
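A minimal sketch of what this suggestion could look like in the question's pipeline; the helper name segment_to_mfcc is my own, and it assumes mono audio (for a stereo AudioSegment the interleaved channels would need to be split or averaged first):

import numpy as np
from python_speech_features import mfcc

def segment_to_mfcc(segment):
    """ Compute MFCCs directly from a pydub AudioSegment, without a temporary .wav file """
    # get_array_of_samples() exposes the raw samples; np.array() turns them into a numpy array
    samples = np.array(segment.get_array_of_samples())
    return mfcc(samples, samplerate=segment.frame_rate, winlen=0.025, winstep=0.01,
                numcep=15, nfilt=35, nfft=512, appendEnergy=True)

With this in place, the temporary-file block in the question becomes roughly mfcc_feature = preprocessing.scale(segment_to_mfcc(eliminate_silence("file.mp3"))).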

Answer 1 (score: 1)

I am currently working on a project that cuts audio using silence detection and MFCC coefficients, so I'll leave my solution here:

import pydub
import pydub.silence
import python_speech_features as p
import numpy as np

def generate_mfcc_without_silences(path):
    #get audio and change frame rate to 16KHz
    audio_file = pydub.AudioSegment.from_wav(path)
    audio_file = audio_file.set_frame_rate(16000)
    #cut audio using silences
    chunks = pydub.silence.split_on_silence(audio_file, silence_thresh=audio_file.dBFS, min_silence_len=200)
    mfccs = []
    for chunk in chunks:
        # convert the chunk's raw samples to a numpy array (assumes 16-bit samples) and compute MFCCs
        np_chunk = np.frombuffer(chunk.get_array_of_samples(), dtype=np.int16)
        mfccs.append(p.mfcc(np_chunk, samplerate=audio_file.frame_rate, numcep=26))
    return mfccs

Notes:

· I change the audio to 16 kHz, but this is optional

· My min_silence_len value is 200 because I want to try to isolate individual words

Combining what my function does with your requirements, the function you probably need is:

import pydub
import pydub.silence
import python_speech_features as p
import numpy as np
from sklearn import preprocessing

def mfcc_from_audio_without_silences(path):
    audio_file = pydub.AudioSegment.from_mp3(path)
    chunks = pydub.silence.split_on_silence(audio_file, silence_thresh=-30, min_silence_len=500, keep_silence=100)

    output_chunks = pydub.AudioSegment.empty()
    for chunk in chunks:
        output_chunks += chunk

    output_chunks = np.frombuffer(output_chunks.get_array_of_samples(), dtype=np.int16)
    mfcc_feature = p.mfcc(output_chunks, samplerate=audio_file.frame_rate, numcep=15, nfilt = 35)
    return preprocessing.scale(mfcc_feature)
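
For reference, a possible call mirroring the question's setup (file.mp3 is just a placeholder name):

mfcc_feature = mfcc_from_audio_without_silences("file.mp3")
print(mfcc_feature.shape)  # (number of frames, 15 cepstral coefficients)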