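I'm currently working on a deep-learning speech-recognition project.
I need to augment my current data by shifting or stretching the sound files,
but the problem is that the shape changes during augmentation: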
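import os
import librosa
y, sr = librosa.load(os.path.join(train_data_path, label, fname))  # resamples to 22050 Hz unless sr is given
librosa.output.write_wav('./input/train_test2/'+label+'/10000'+fname, y, sr)  # librosa.output was removed in librosa 0.8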
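Even though I didn't change anything, the shape changed.
For example, a file whose original shape was (99, 81, 1) becomes (77, 81, 1) or something else after this round trip.
That is a problem when I classify with Keras: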
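One thing worth checking first: `librosa.load` resamples to 22050 Hz by default (`sr=22050`), so a 1-second 16 kHz clip comes back with 22050 samples instead of 16000, even though the code does not explicitly modify it. The sketch below only imitates that step with `scipy.signal.resample` (librosa uses its own resampler, so the sample values differ, but the length still scales by the rate ratio); the 440 Hz test tone is an arbitrary assumption standing in for a real clip.

```python
import numpy as np
from scipy import signal

orig_sr = 16000             # native rate of the dataset files
librosa_default_sr = 22050  # librosa.load's default target rate

# Hypothetical 1-second test tone standing in for a real clip.
t = np.arange(orig_sr) / orig_sr
clip = np.sin(2 * np.pi * 440 * t).astype(np.float32)

# Imitate librosa.load(path) without sr=None: resample 16 kHz -> 22.05 kHz.
n_out = int(round(len(clip) * librosa_default_sr / orig_sr))
loaded = signal.resample(clip, n_out)

print(len(clip), len(loaded))  # 16000 22050
```

Passing `sr=None` to `librosa.load` keeps the file's native rate, and whatever writes the file afterwards must be given the rate of the array actually being written.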
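from keras import activations
from keras.layers import BatchNormalization, Convolution2D, Dropout, Input, MaxPooling2D

input_shape = (99, 81, 1)  # (time frames, frequency bins, channels) of the original files
inp = Input(shape=input_shape)  # every sample must have exactly this shape
norm_inp = BatchNormalization()(inp)
img_1 = Convolution2D(8, kernel_size=2, activation=activations.relu)(norm_inp)
img_1 = Convolution2D(8, kernel_size=2, activation=activations.relu)(img_1)
img_1 = MaxPooling2D(pool_size=(2, 2))(img_1)
img_1 = Dropout(rate=0.2)(img_1)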
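Keras does not accept inputs with varying shapes for a single input_shape, and after modifying the WAV files I'm not even sure the original shape can be preserved.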
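If the clips genuinely differ in length, a common workaround (an alternative approach, not something from the code above, and not a fix for a wrong sample rate) is to crop or pad every spectrogram along the time axis to one fixed frame count before it reaches `Input(shape=input_shape)`. A minimal sketch, assuming the target is the original 99 frames and that padding with the log floor `np.log(eps)` from `log_specgram` is an acceptable stand-in for silence; `fix_frames` is a hypothetical helper name.

```python
import numpy as np

def fix_frames(spec, n_frames=99, pad_value=np.log(1e-10)):
    """Crop or pad a (time, freq) log-spectrogram to exactly n_frames rows.

    pad_value defaults to log(eps) as used in log_specgram, i.e. 'silence'.
    """
    spec = spec[:n_frames]                 # crop if too long
    missing = n_frames - spec.shape[0]
    if missing > 0:                        # pad at the end if too short
        spec = np.pad(spec, ((0, missing), (0, 0)),
                      mode="constant", constant_values=pad_value)
    return spec

short = np.zeros((71, 81), dtype=np.float32)
long_ = np.zeros((120, 81), dtype=np.float32)
print(fix_frames(short).shape, fix_frames(long_).shape)  # (99, 81) (99, 81)

# Add the channel axis the model expects: (99, 81) -> (99, 81, 1)
x = fix_frames(short)[..., np.newaxis]
```

Padding keeps the model unchanged, but it hides the underlying problem if the length difference comes from a mislabeled sample rate rather than from the audio itself.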
===========================================
The shape changes after I run log_specgram:
import numpy as np
from scipy import signal

def log_specgram(audio, sample_rate, window_size=20,
                 step_size=10, eps=1e-10):
    # window and step are given in milliseconds; convert them to samples
    nperseg = int(round(window_size * sample_rate / 1e3))
    noverlap = int(round(step_size * sample_rate / 1e3))
    freqs, times, spec = signal.spectrogram(audio,
                                            fs=sample_rate,
                                            window='hann',
                                            nperseg=nperseg,
                                            noverlap=noverlap,
                                            detrend=False)
    # transpose to (time, freq) and take the log; eps avoids log(0)
    return freqs, times, np.log(spec.T.astype(np.float32) + eps)
The result of np.log(spec.T.astype(np.float32) + eps) has a different shape.
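The spectrogram shape follows directly from the number of samples passed in. At 8000 Hz with `window_size=20` and `step_size=10`, `nperseg = 160` and `noverlap = 80`, so the hop is 160 - 80 = 80 samples; the frame count is `1 + (N - nperseg) // hop` and the frequency count is `nperseg // 2 + 1 = 81`. (The variable named `noverlap` is computed from `step_size`; that yields a 10 ms hop only because the window is exactly twice the step.) The sketch below, assuming SciPy's default `nfft = nperseg` and using random noise as placeholder audio, checks that formula against `signal.spectrogram` for the two lengths printed in the question.

```python
import numpy as np
from scipy import signal

def expected_shape(n_samples, sample_rate=8000, window_size=20, step_size=10):
    # Same arithmetic as log_specgram in the question.
    nperseg = int(round(window_size * sample_rate / 1e3))
    noverlap = int(round(step_size * sample_rate / 1e3))
    hop = nperseg - noverlap
    n_frames = 1 + (n_samples - nperseg) // hop
    n_freqs = nperseg // 2 + 1
    return n_frames, n_freqs

for n in (8000, 5804):
    audio = np.random.default_rng(0).standard_normal(n)
    _, _, spec = signal.spectrogram(audio, fs=8000, window='hann',
                                    nperseg=160, noverlap=80, detrend=False)
    print(n, spec.T.shape, expected_shape(n))
# 8000 (99, 81) (99, 81)
# 5804 (71, 81) (71, 81)
```

So a file that reaches `log_specgram` with 5804 samples instead of 8000 produces 71 frames instead of 99; the question becomes why the resampled array is shorter.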
===========================================
Original file:
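import librosa
import librosa.display
import numpy as np
from scipy import signal
from scipy.io import wavfile

sample_rate, samples = wavfile.read('./input/train/audio/eight/012c8314_nohash_1.wav')
# sample_rate_test and resampled2 come from the modified copy of the file (that loading code is not shown)
print(sample_rate, sample_rate_test)
new_sample_rate = 8000
resampled = signal.resample(samples, int(new_sample_rate / sample_rate * samples.shape[0]))
print(resampled2.shape)
_, _, specgram = log_specgram(resampled, sample_rate=new_sample_rate)
print("specgramshape->", specgram.shape)
# recent librosa versions require floating-point audio, so convert the int16 samples first
S = librosa.feature.melspectrogram(y=samples.astype(np.float32), sr=sample_rate, n_mels=128, fmax=8000)
print("S->", S.shape)
librosa.display.specshow(librosa.power_to_db(S, ref=np.max), y_axis='mel', fmax=8000, x_axis='time')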
Output:
16000 22050
(5804,)
specgramshape-> (99, 81)
S-> (128, 32)
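The printed numbers can be reproduced by hand, assuming each file holds 16000 samples (1 s at 16 kHz), which is consistent with every shape shown. The `(5804,)` line is exactly `int(8000 / 22050 * 16000)`, which suggests the modified file still contains 16000 samples while its header reports 22050 Hz, so resampling it "to 8000 Hz" keeps only about 0.73 s worth of samples. The mel spectrogram is computed on the un-resampled `samples`, so with librosa's default `hop_length=512` and `center=True` it has `1 + 16000 // 512 = 32` frames either way.

```python
# Reproduce the printed shapes with plain arithmetic.
n_samples = 16000   # assumed samples per file (1 s at 16 kHz)
target_sr = 8000

# Resampled length, as in: int(new_sample_rate / sample_rate * samples.shape[0])
print(int(target_sr / 16000 * n_samples))  # 8000 -> header says 16000 Hz
print(int(target_sr / 22050 * n_samples))  # 5804 -> header says 22050 Hz

# librosa.feature.melspectrogram defaults: hop_length=512, center=True
hop_length = 512
print(1 + n_samples // hop_length)         # 32 mel frames
```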
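===========================================
After using this instead:
y = librosa.resample(y, orig_sr=sr, target_sr=16000)
# y is now at 16000 Hz, but sr still holds the rate librosa.load returned (22050 by default)
librosa.output.write_wav('./input/train_test/'+label+'/10000'+fname, y, sr)
Output:
(16000,) (16000,)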
(5804,)
specgramshape-> (71, 81)
S-> (128, 32)
===========================================
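Putting the numbers together: in the second experiment `y` is resampled to 16000 Hz but written with `sr`, which still holds librosa's load rate (22050 Hz by default), so the file header no longer matches the data. The later `signal.resample` call trusts that header, produces 5804 samples, and `log_specgram` then yields (71, 81). A minimal sketch of the fix, assuming the files are meant to stay at 16 kHz, is to write the array with the rate it actually has. It uses `scipy.io.wavfile` and a synthetic random clip so that it runs without librosa; `signal.resample` stands in for `librosa.resample`, and the file name `augmented.wav` is a placeholder.

```python
import os
import tempfile

import numpy as np
from scipy import signal
from scipy.io import wavfile

target_sr = 16000
load_sr = 22050  # what librosa.load uses when sr is not given

# Hypothetical clip as returned by librosa.load(path): 1 s at 22050 Hz.
rng = np.random.default_rng(0)
y = rng.uniform(-0.5, 0.5, load_sr).astype(np.float32)

# Resample to 16 kHz (stand-in for librosa.resample) ...
y16 = signal.resample(y, target_sr).astype(np.float32)

# ... and write it with the rate the data now has, not the old sr.
path = os.path.join(tempfile.mkdtemp(), "augmented.wav")
wavfile.write(path, target_sr, y16)

# Re-run the question's preprocessing on the written file.
sample_rate, samples = wavfile.read(path)
new_sample_rate = 8000
resampled = signal.resample(samples, int(new_sample_rate / sample_rate * samples.shape[0]))
_, _, spec = signal.spectrogram(resampled, fs=new_sample_rate, window='hann',
                                nperseg=160, noverlap=80, detrend=False)
print(sample_rate, samples.shape, resampled.shape, spec.T.shape)
# 16000 (16000,) (8000,) (99, 81)
```

With librosa itself, the equivalent is `librosa.load(path, sr=None)` to avoid the hidden resampling on load, and writing with the target rate; since `librosa.output.write_wav` no longer exists in librosa 0.8 and later, `soundfile.write(path, y, 16000)` plays that role there.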