I am trying to train an autoencoder to learn feature representations from segments of audio MFCC features (my dataset is TIMIT). I cut the 13-dimensional MFCC sequences with a fixed-size window of 20 frames, so each training example is a flattened window and my input has shape (None, window_size * 13). I trained a feed-forward autoencoder with a least-squares loss (the L2 norm of the difference between the prediction and the target), but the loss stops decreasing at a high level. The model-building part of my code is here:
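As a side note, the windowing step described above can be sketched in plain NumPy; `make_windows` is a hypothetical helper, assuming `mfcc` is a `(num_frames, 13)` array and windows do not overlap:

```python
import numpy as np

def make_windows(mfcc, window_size=20):
    """Cut (num_frames, 13) MFCCs into non-overlapping windows of
    `window_size` frames and flatten each window to one vector."""
    n = (mfcc.shape[0] // window_size) * window_size  # drop the ragged tail
    windows = mfcc[:n].reshape(-1, window_size, mfcc.shape[1])
    return windows.reshape(windows.shape[0], -1)      # (None, window_size * 13)

x = make_windows(np.random.randn(105, 13))
print(x.shape)  # (5, 260)
```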
```python
# Placeholder for the flattened MFCC windows, shape (None, window_size * 13).
self.centers_placeholder = tf.placeholder(
    tf.float32, [None, self.train_centers[0].shape[1]], name='center_placeholder')
# Batch norm must know whether we are training so it updates its statistics.
self.is_training = tf.placeholder(tf.bool, name='is_training')

# Encoder: 2000 -> 1500 -> 500 -> 100 -> 50 (bottleneck).
layer = tf.layers.dense(self.centers_placeholder, 2000, activation=tf.nn.relu)
layer = tf.layers.batch_normalization(layer, training=self.is_training)
layer = tf.layers.dense(layer, 1500, activation=tf.nn.leaky_relu)
layer = tf.layers.batch_normalization(layer, training=self.is_training)
layer = tf.layers.dense(layer, 500, activation=tf.nn.leaky_relu)
layer = tf.layers.batch_normalization(layer, training=self.is_training)
layer = tf.layers.dense(layer, 100, activation=tf.nn.leaky_relu)
layer = tf.layers.batch_normalization(layer, training=self.is_training)
self.embedding = tf.layers.dense(layer, 50, activation=tf.nn.leaky_relu)

# Decoder: mirror of the encoder. Note that it must take the bottleneck
# (self.embedding) as input; feeding it the preceding 100-unit layer, as my
# original code did, bypasses the bottleneck entirely.
layer = tf.layers.dense(self.embedding, 100, activation=tf.nn.leaky_relu)
layer = tf.layers.batch_normalization(layer, training=self.is_training)
layer = tf.layers.dense(layer, 500, activation=tf.nn.leaky_relu)
layer = tf.layers.batch_normalization(layer, training=self.is_training)
layer = tf.layers.dense(layer, 1500, activation=tf.nn.leaky_relu)
layer = tf.layers.batch_normalization(layer, training=self.is_training)
layer = tf.layers.dense(layer, 2000, activation=tf.nn.leaky_relu)
layer = tf.layers.batch_normalization(layer, training=self.is_training)
self.decoded = tf.layers.dense(
    layer, self.train_centers[0].shape[1], activation=None)

# Reconstruction loss: mean per-example L2 norm of (target - reconstruction).
self.cost = tf.reduce_mean(
    tf.norm(self.centers_placeholder - self.decoded, ord=2, axis=1))
# self.cost = tf.reduce_mean(tf.losses.mean_squared_error(
#     labels=self.centers_placeholder, predictions=self.decoded))

# Relative-error metric: ||x - x_hat|| / ||x||, averaged over the batch.
target_norm = tf.norm(self.centers_placeholder, ord=2, axis=1)
difference_norm = tf.norm(self.centers_placeholder - self.decoded, ord=2, axis=1)
self.metric = tf.reduce_mean(difference_norm / target_norm)
print('model graph has been built', flush=True)

# Adam with an exponentially decayed learning rate. The batch-norm moving
# averages are only updated if the train op depends on UPDATE_OPS.
global_step = tf.Variable(0, trainable=False)
lr = tf.train.exponential_decay(
    self.learning_rate, global_step, 100000, 0.96, staircase=True)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    self.optimizer = tf.train.AdamOptimizer(lr).minimize(
        self.cost, global_step=global_step)
self.saver = tf.train.Saver()
```
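For reference, the relative-error metric above (the batch mean of ||x - x_hat||_2 / ||x||_2) can be checked in plain NumPy. This is a small sanity-check sketch, not part of the model: a value near 1.0 means the decoder is doing no better than outputting zeros, which is one way to quantify "stuck at a high level":

```python
import numpy as np

def relative_error(targets, decoded):
    # Mirrors the TF metric: per-example L2 norm of the residual divided
    # by the per-example L2 norm of the target, averaged over the batch.
    diff = np.linalg.norm(targets - decoded, ord=2, axis=1)
    norm = np.linalg.norm(targets, ord=2, axis=1)
    return float(np.mean(diff / norm))

targets = np.array([[3.0, 4.0], [0.0, 5.0]])
print(relative_error(targets, np.zeros_like(targets)))  # 1.0: all-zero output
```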
I have tried different architectures, both deeper and shallower, but the loss always plateaus at a high value. Here is an example of my loss curve: (loss curve screenshot)
Can anyone help? Has anyone experimented with autoencoders on MFCC features? Thanks!