How to make an autoencoder for WAV audio files in Keras

Posted: 2019-05-03 15:35:47

Tags: audio keras deep-learning wav autoencoder

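I have recently been studying autoencoder neural networks and came across this tutorial, which describes how to build an autoencoder for image files in Keras: https://blog.keras.io/building-autoencoders-in-keras.html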

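After reading it, I wondered how this code could be modified to encode WAV audio files. My goal is to encode an audio file, take the encoded file, move it to another location, and then decode it to get back the original audio (or at least something fairly close to it). I did some research but could not find any autoencoder for WAV files that lets you extract the encoded file from the algorithm and later feed it back into the decoder. The code on the site above looks promising, but it is designed for image files rather than audio. This is the code I am looking at: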

    from keras.layers import Input, Dense
    from keras.models import Model

    # this is the size of our encoded representations
    encoding_dim = 32  # 32 floats -> compression factor of 24.5, assuming the input is 784 floats

    # this is our input placeholder
    input_img = Input(shape=(784,))
    # "encoded" is the encoded representation of the input
    encoded = Dense(encoding_dim, activation='relu')(input_img)
    # "decoded" is the lossy reconstruction of the input
    decoded = Dense(784, activation='sigmoid')(encoded)

    # this model maps an input to its reconstruction
    autoencoder = Model(input_img, decoded)

    # this model maps an input to its encoded representation
    encoder = Model(input_img, encoded)

    # create a placeholder for an encoded (32-dimensional) input
    encoded_input = Input(shape=(encoding_dim,))
    # retrieve the last layer of the autoencoder model
    decoder_layer = autoencoder.layers[-1]
    # create the decoder model
    decoder = Model(encoded_input, decoder_layer(encoded_input))

    autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

    from keras.datasets import mnist
    import numpy as np
    (x_train, _), (x_test, _) = mnist.load_data()
    x_train = x_train.astype('float32') / 255.
    x_test = x_test.astype('float32') / 255.
    x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
    x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
    print(x_train.shape)  # Python 3 print function
    print(x_test.shape)

    autoencoder.fit(x_train, x_train,
            epochs=50,
            batch_size=256,
            shuffle=True,
            validation_data=(x_test, x_test))
    # encode and decode some digits
    # note that we take them from the *test* set
    encoded_imgs = encoder.predict(x_test)
    decoded_imgs = decoder.predict(encoded_imgs)

    # use Matplotlib (don't ask)
    import matplotlib.pyplot as plt

    n = 10  # how many digits we will display
    plt.figure(figsize=(20, 4))
    for i in range(n):
        # display original
        ax = plt.subplot(2, n, i + 1)
        plt.imshow(x_test[i].reshape(28, 28))
        plt.gray()
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)

        # display reconstruction
        ax = plt.subplot(2, n, i + 1 + n)
        plt.imshow(decoded_imgs[i].reshape(28, 28))
        plt.gray()
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
    plt.show()
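Apart from the plotting, the image-specific part of the code above is the data handling: MNIST digits are loaded, scaled to [0, 1] and flattened into 784-value vectors. A minimal sketch of the equivalent step for audio, assuming 16-bit mono PCM WAV input and Python's standard `wave` module, cuts the waveform into 784-sample frames scaled to the same [0, 1] range, so the `Dense` model with its sigmoid output and `binary_crossentropy` loss could in principle be reused unchanged. The helper names `wav_to_frames` and `frames_to_wav`, the frame length and the padding scheme are illustrative assumptions, not part of the tutorial.

```python
import wave

import numpy as np

FRAME_LEN = 784  # assumption: same length as Input(shape=(784,)) in the tutorial model
SILENCE = 32768.0 / 65535.0  # where int16 sample 0 lands after scaling to [0, 1]


def wav_to_frames(path, frame_len=FRAME_LEN):
    """Read a 16-bit mono PCM WAV file (hypothetical helper).

    Returns (frames, sample_rate, n_samples); frames has shape
    (n_frames, frame_len) with values in [0, 1].
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32)
    n_samples = len(samples)
    # Map the int16 range [-32768, 32767] onto [0, 1].
    scaled = (samples + 32768.0) / 65535.0
    # Pad the tail with silence so the length is a multiple of frame_len.
    pad = (-n_samples) % frame_len
    scaled = np.concatenate([scaled, np.full(pad, SILENCE, dtype=np.float32)])
    return scaled.reshape(-1, frame_len), sample_rate, n_samples


def frames_to_wav(frames, path, sample_rate, n_samples):
    """Undo wav_to_frames: flatten, drop the padding, rescale to int16, write WAV."""
    flat = np.asarray(frames, dtype=np.float32).reshape(-1)[:n_samples]
    ints = np.clip(np.round(flat * 65535.0 - 32768.0), -32768, 32767).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(ints.tobytes())
```

Under these assumptions, `x_train` in the tutorial code could be replaced by the `frames` array returned by `wav_to_frames`, and `frames_to_wav(decoder.predict(encoder.predict(frames)), ...)` would write the reconstruction back to a WAV file. How close that reconstruction sounds to the original depends on the training and is not guaranteed by the framing itself.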

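Is it possible to modify this code so that it encodes WAV files? If not, is there another program that would let me encode a WAV file, extract the encoded file, and then feed it back into a decoder so that the resulting audio is very close to the original? Any help would be greatly appreciated. Thanks!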
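The encode, move, decode workflow described here needs nothing beyond the two models the tutorial code already builds: `encoder.predict` produces one `encoding_dim`-length vector per input frame, and `decoder.predict` turns such vectors back into frames. A minimal sketch of the "encoded file" in between, assuming NumPy's `.npz` format and using the hypothetical helper names `save_encoded` and `load_encoded`, stores the codes together with the sample rate and original sample count that the receiving side would need to rebuild a WAV file. The receiving side also needs the trained decoder itself (for example saved with Keras `decoder.save(...)` and copied over); the codes alone are not enough.

```python
import numpy as np


def save_encoded(path, codes, sample_rate, n_samples):
    """Store encoder output plus the metadata needed to rebuild the audio.

    codes: array of shape (n_frames, encoding_dim), e.g. encoder.predict(frames).
    Assumption: codes are stored as float16 to halve the file size, which adds
    a small rounding error on top of the autoencoder's reconstruction error.
    Note: np.savez_compressed appends ".npz" if path does not already end with it.
    """
    np.savez_compressed(
        path,
        codes=np.asarray(codes, dtype=np.float16),
        sample_rate=np.int64(sample_rate),
        n_samples=np.int64(n_samples),
    )


def load_encoded(path):
    """Return (codes as float32, sample_rate, n_samples) from a save_encoded file."""
    with np.load(path) as data:
        return (
            data["codes"].astype(np.float32),
            int(data["sample_rate"]),
            int(data["n_samples"]),
        )
```

Storing the codes as float16 is a design choice made only to keep the file small: 32 float16 values (64 bytes) per 784 int16 samples (1568 bytes) keeps the tutorial's 24.5x ratio before zlib compression and file overhead. On the receiving side, the loaded codes would go through `decoder.predict` to get frames back.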

0 answers