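I'm working on a project that isolates the vocal part from an audio track. I'm using the DSD100 dataset, but for testing I'm using the DSD100subset dataset, from which I take only the mixtures and the vocals. I'm basing this work on this article.
First, I process the audio to extract spectrograms and put them in lists, so that all the audio ends up in four lists (trainMixed, trainVocals, testMixed, testVocals). Like this: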
import librosa
import numpy as np
from os import walk

def to_spec(wav, n_fft=1024, hop_length=256):
    # Complex STFT, shape (1 + n_fft // 2, frames)
    return librosa.stft(wav, n_fft=n_fft, hop_length=hop_length)

def prepareData(filename, sr=22050, hop_length=256, n_fft=1024):
    # Load at most 30 s of mono audio, resampled to sr
    audio_wav = librosa.load(filename, sr=sr, mono=True, duration=30)[0]
    audio_spec = to_spec(audio_wav, n_fft=n_fft, hop_length=hop_length)
    # Keep only the magnitude and return its peak for normalization
    audio_spec_mag = np.abs(audio_spec)
    maxVal = np.max(audio_spec_mag)
    return audio_spec_mag, maxVal

# FOR EVERY LIST (trainMixed, trainVocals, testMixed, testVocals)
trainMixed = []
trainMixedNum = 0
for (root, dirs, files) in walk('./Dev-subset-mix/Dev/'):
    for d in dirs:
        filenameMix = './Dev-subset-mix/Dev/'+d+'/mixture.wav'
        spec_mag, maxVal = prepareData(filenameMix, n_fft=1024, hop_length=256)
        # Normalize each spectrogram to the range [0, 1]
        trainMixed.append(spec_mag/maxVal)
Next, I build the model:
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.optimizers import SGD
from keras.layers.advanced_activations import LeakyReLU
model = Sequential()
model.add(Conv2D(16, (3,3), padding='same', input_shape=(513, 25, 1)))
model.add(LeakyReLU())
model.add(Conv2D(16, (3,3), padding='same'))
model.add(LeakyReLU())
model.add(MaxPooling2D(pool_size=(3,3)))
model.add(Dropout(0.25))
model.add(Conv2D(16, (3,3), padding='same'))
model.add(LeakyReLU())
model.add(Conv2D(16, (3,3), padding='same'))
model.add(LeakyReLU())
model.add(MaxPooling2D(pool_size=(3,3)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(64))
model.add(LeakyReLU())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss=keras.losses.binary_crossentropy, optimizer=sgd, metrics=['accuracy'])
And I run the model:
model.fit(trainMixed, trainVocals,epochs=10, validation_data=(testMixed, testVocals))
But I get this result:
ValueError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:806 train_function *
return step_function(self, iterator)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:796 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:1211 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2585 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2945 _call_for_each_replica
return fn(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:789 run_step **
outputs = model.train_step(data)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:747 train_step
y_pred = self(x, training=True)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:976 __call__
self.name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/input_spec.py:158 assert_input_compatibility
' input tensors. Inputs received: ' + str(inputs))
ValueError: Layer sequential_1 expects 1 inputs, but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, 2584) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(None, 2584) dtype=float32>]
I'm new to this topic; thanks in advance for any help.
Answer 0 (score: 5)
The problem may be in how the input data is passed to Keras's fit() function. I suggest using a tf.data.Dataset as the input to fit(), like this:
import tensorflow as tf
train_data = tf.data.Dataset.from_tensor_slices((trainMixed, trainVocals))
valid_data = tf.data.Dataset.from_tensor_slices((testMixed, testVocals))
model.fit(train_data, epochs=10, validation_data=valid_data)
You can then also use methods such as shuffle() and batch() on the TF dataset.
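As a minimal sketch of shuffle() and batch(), the snippet below builds a dataset from random stand-in arrays (the names mixed and vocals, the array sizes, and the batch size of 4 are assumptions for illustration, not the asker's real data). Note that fit() expects a dataset to yield batches already, so calling batch() before passing the dataset to fit() is generally needed.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 8 mixture/vocal pairs shaped (513, 25, 1).
rng = np.random.default_rng(0)
mixed = rng.random((8, 513, 25, 1), dtype=np.float32)
vocals = rng.random((8, 513, 25, 1), dtype=np.float32)

train_data = (
    tf.data.Dataset.from_tensor_slices((mixed, vocals))
    .shuffle(buffer_size=8)  # buffer covers the whole toy dataset
    .batch(4)                # each element now carries a leading batch axis
)

# Collect the shape of every (input, target) batch the dataset yields.
batch_shapes = [
    (tuple(x.shape.as_list()), tuple(y.shape.as_list()))
    for x, y in train_data
]
print(batch_shapes)
```

Each batch of inputs then has the four-dimensional shape that a Conv2D layer with input_shape=(513, 25, 1) expects.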
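Edit: It also looks like the shape of your input is incorrect. The input_shape you specified for the first conv layer is (513, 25, 1), so the input should be a batch tensor of shape (batch_size, 513, 25, 1), whereas you are feeding in shape (batch_size, 2584). You therefore need to reshape, and possibly cut, your inputs to the specified shape, or specify a new shape.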
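One way to do the cutting, sketched here under assumptions: the helper to_patches and its patch_width parameter are hypothetical names, and the random array stands in for one 30 s spectrogram, which with sr=22050, n_fft=1024 and hop_length=256 should have 1 + 1024 // 2 = 513 frequency bins and 2584 frames. The helper splits the time axis into non-overlapping 25-frame patches and adds a channel axis.

```python
import numpy as np

def to_patches(spec_mag, patch_width=25):
    """Cut a (freq_bins, frames) magnitude spectrogram into non-overlapping
    patches of shape (freq_bins, patch_width, 1)."""
    freq_bins, frames = spec_mag.shape
    n_patches = frames // patch_width  # frames left over at the end are dropped
    trimmed = spec_mag[:, :n_patches * patch_width]
    # (freq_bins, n_patches, patch_width) -> (n_patches, freq_bins, patch_width)
    patches = trimmed.reshape(freq_bins, n_patches, patch_width).transpose(1, 0, 2)
    return patches[..., np.newaxis]  # trailing channel axis for Conv2D

# Stand-in for one normalized spectrogram of a 30 s clip.
spec = np.random.rand(513, 2584).astype(np.float32)
patches = to_patches(spec)
print(patches.shape)  # (103, 513, 25, 1)
```

The patches from all tracks can then be joined with np.concatenate into a single array. Keras treats a Python list of arrays as several separate inputs, which is likely what the "received 2 input tensors" message refers to, so a single array is safer than a list.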