具有MFCC音频功能的自动编码器

时间:2018-07-13 16:18:13

标签: python tensorflow deep-learning autoencoder mfcc

我试图训练一种自动编码器,以从音频MFCC功能的某些部分学习功能表示(我的数据集为TIMIT)。所以我用一个固定大小的窗口来剪切音频MFCC数据(13维MFCC),我的数据是(无,window_size * 13),这里我的窗口大小是20(帧)。我训练了一个具有feed_forward类型的自动编码器,使用损失函数作为最小平方误差(预测和标签之间的差异范数),问题是我的损失停止在较高水平上减少。我的网络部件代码在这里:

self.centers_placeholder = tf.placeholder(tf.float32,[None,self.train_centers[0].shape[1]],name = 'center_placeholder')



        layer = tf.layers.dense(self.centers_placeholder,2000,activation=tf.nn.relu)
        layer = tf.layers.batch_normalization(layer)

        layer = tf.layers.dense(layer,1500,activation=tf.nn.leaky_relu)
        layer = tf.layers.batch_normalization(layer)


        layer = tf.layers.dense(layer,500,activation=tf.nn.leaky_relu)
        layer = tf.layers.batch_normalization(layer)

        layer = tf.layers.dense(layer,100,activation=tf.nn.leaky_relu)
        layer = tf.layers.batch_normalization(layer)

        self.embedding= tf.layers.dense(layer,50,activation = tf.nn.leaky_relu)

        layer = tf.layers.dense(layer,100,activation=tf.nn.leaky_relu)
        layer = tf.layers.batch_normalization(layer)

        layer = tf.layers.dense(layer,500,activation=tf.nn.leaky_relu)
        layer = tf.layers.batch_normalization(layer)

        layer = tf.layers.dense(layer,1500,activation=tf.nn.leaky_relu)
        layer = tf.layers.batch_normalization(layer)


        layer = tf.layers.dense(layer,2000,activation=tf.nn.leaky_relu)
        layer = tf.layers.batch_normalization(layer)

        self.decoded = tf.layers.dense(layer,self.train_centers[0].shape[1],activation=None)

        #cost = tf.reduce_mean(tf.sqrt(tf.square(neighbors_placeholder-decoder4)))
        self.cost = tf.reduce_mean(tf.norm(self.centers_placeholder-self.decoded,ord=2,axis=1))
        #self.cost=tf.reduce_mean(tf.losses.mean_squared_error(labels=self.centers_placeholder,predictions=self.decoded))
        target_norm = tf.norm(self.centers_placeholder,ord=2,axis=1)
        difference_norm=tf.norm(self.centers_placeholder-self.decoded,ord=2,axis=1)
        self.metric = tf.reduce_mean(difference_norm/target_norm)
        print('model graph has built',flush=True)

        global_step=tf.Variable(0,trainable=False)
        starter_learning_rate=self.learning_rate
        lr=tf.train.exponential_decay(starter_learning_rate,global_step,100000,0.96,staircase=True)
        self.optimizer = tf.train.AdamOptimizer(lr).minimize(self.cost,global_step=global_step)
        self.saver = tf.train.Saver()

我尝试了不同的体系结构,无论是深层的还是浅层的,但是它总是在较高的层次上收敛。这是我损失的一个例子enter image description here

有人可以帮忙吗?有没有人在自动编码器中使用MFCC做过实验? 谢谢!

0 个答案:

没有答案