I am trying to understand SpeechPy, a Python library for extracting features from audio files. The documentation includes this test code:
import scipy.io.wavfile as wav
import numpy as np
import speechpy
import os
file_name = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                         'Alesis-Sanctuary-QCard-AcoustcBas-C2.wav')
fs, signal = wav.read(file_name)
# Keep only the first channel (this indexing assumes a multi-channel file).
signal = signal[:, 0]
# Example of pre-emphasizing.
signal_preemphasized = speechpy.processing.preemphasis(signal, cof=0.98)
# Example of stacking frames
frames = speechpy.processing.stack_frames(signal, sampling_frequency=fs,
                                          frame_length=0.020, frame_stride=0.01,
                                          filter=lambda x: np.ones((x,)),
                                          zero_padding=True)
# Example of extracting power spectrum
power_spectrum = speechpy.processing.power_spectrum(frames, fft_points=512)
print('power spectrum shape=', power_spectrum.shape)
############# Extract MFCC features #############
mfcc = speechpy.feature.mfcc(signal, sampling_frequency=fs,
                             frame_length=0.020, frame_stride=0.01,
                             num_filters=40, fft_length=512,
                             low_frequency=0, high_frequency=None)
mfcc_cmvn = speechpy.processing.cmvnw(mfcc, win_size=301,
                                      variance_normalization=True)
print('mfcc(mean + variance normalized) feature shape=', mfcc_cmvn.shape)
mfcc_feature_cube = speechpy.feature.extract_derivative_feature(mfcc)
print('mfcc feature cube shape=', mfcc_feature_cube.shape)
############# Extract logenergy features #############
logenergy = speechpy.feature.lmfe(signal, sampling_frequency=fs,
                                  frame_length=0.020, frame_stride=0.01,
                                  num_filters=40, fft_length=512,
                                  low_frequency=0, high_frequency=None)
logenergy_feature_cube = speechpy.feature.extract_derivative_feature(logenergy)
print('logenergy features=', logenergy.shape)
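So my main question is: when the code finally extracts the features, how do I know exactly which features were extracted from the audio file? When I feed an audio file into this code, it returns a matrix. But shouldn't all the features of a single file be returned as one vector?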
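To make the matrix-versus-vector point concrete: as far as I can tell from the documentation, speechpy.feature.mfcc returns an array of shape (num_frames, num_cepstral), i.e. one row per short frame of audio rather than one row per file. The sketch below is only my guess at how such a matrix could be collapsed into a single per-file vector, by taking the mean and standard deviation over frames. It is not something SpeechPy does itself. The name summarize_frames and the random stand-in matrix fake_mfcc are hypothetical and only there so the snippet runs without an audio file.

```python
import numpy as np


def summarize_frames(features):
    """Collapse a (num_frames, num_coefficients) matrix into one vector.

    Hypothetical helper, not part of SpeechPy: it concatenates the
    per-coefficient mean and standard deviation computed over all frames.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)  # average of each coefficient over frames
    std = features.std(axis=0)    # spread of each coefficient over frames
    return np.concatenate([mean, std])


# Stand-in for real MFCC output: 250 frames x 13 coefficients (assumed sizes).
rng = np.random.default_rng(0)
fake_mfcc = rng.normal(size=(250, 13))

file_vector = summarize_frames(fake_mfcc)
print('frame-level matrix shape =', fake_mfcc.shape)
print('file-level vector shape  =', file_vector.shape)
```

With a real file, the mfcc array from the code above would take the place of fake_mfcc. Whether averaging over frames is the right thing to do is part of what I am unsure about.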
I would greatly appreciate it if someone could help me understand this.