How do I make sure my TensorFlow Generator's upsampling process creates a seed that fully covers the random noise?

Date: 2019-06-22 03:28:46

Tags: python tensorflow keras tensorflow2.0 dcgan

I am working on adapting the code from the TensorFlow 2.0 DCGAN tutorial (https://www.tensorflow.org/beta/tutorials/generative/dcgan) to work with audio signals. I use librosa's chroma_cqt to convert the raw audio data into a WxHx2 matrix and use that as the input. When I try to create the seed matrix by upsampling random noise, what I get instead is an alternating pattern of random noise and zeros in time and space, with a thin black bar across the top (see image): bared noise

I have already adapted the original tutorial code to handle images of various dimensions, with good results for both the seed image and the final output, but the same principles do not carry over to the 3D data. How can I make sure my seed achieves proper coverage, and that I don't perpetuate this problem when actually training the model?
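For reference, the WxHx2 packing described above amounts to stacking the real and imaginary parts of a complex spectrogram into a two-channel real array. A minimal numpy-only sketch (the random complex matrix `C` is a stand-in for the CQT output; the exact preprocessing is an assumption):

```python
import numpy as np

# Stand-in for a complex spectrogram (in practice, e.g. a CQT matrix).
rng = np.random.default_rng(0)
C = rng.normal(size=(252, 361)) + 1j * rng.normal(size=(252, 361))

# Pack real and imaginary parts into a W x H x 2 real-valued array.
packed = np.stack([C.real, C.imag], axis=-1)
print(packed.shape)  # (252, 361, 2)

# The inverse: recombine the two channels into one complex matrix.
restored = packed[..., 0] + 1j * packed[..., 1]
print(np.allclose(restored, C))  # True
```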

from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf

tf.__version__

import numpy as np
import os
from tensorflow.keras import layers
import librosa
import librosa.display

import matplotlib.pyplot as plt

os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

sr = 44100/2
sample_path = os.getcwd()


def make_generator_model():
    model = tf.keras.Sequential()
    model.add(layers.Dense(2*7*19*128, use_bias=False, dtype='float32', input_shape=(361,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Reshape((2 ,7, 19, 128)))
    assert model.output_shape == (None,2, 7, 19, 128) # Note: None is the batch size

    model.add(layers.Conv3DTranspose(128, (2, 5, 5), strides=(1, 6, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 2, 42, 19, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv3DTranspose(128, (2, 5, 5), strides=(1, 3, 19), padding='same', use_bias=False))
    assert model.output_shape == (None, 2, 126, 361, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv3DTranspose(1, (2, 5, 5), strides=(1, 2, 1), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, 2, 252, 361, 1)

    return model


generator = make_generator_model()
noise = tf.random.normal([1, 361])
generated_audio = generator(noise, training=False)


# Recombine the two output channels (real, imaginary) into one complex matrix.
# Note: np.complex was removed in NumPy 1.24; the builtin complex() is equivalent.
D = []
for x in range(len(generated_audio[0][0])):
    this_line = []
    for y in range(len(generated_audio[0][0][x])):
        this_line.append(complex(generated_audio[0][0][x][y], generated_audio[0][1][x][y]))
    D.append(this_line)
D = np.asarray(D)


librosa.display.specshow(librosa.amplitude_to_db(np.abs(D), ref=np.max),
                         sr=sr, x_axis='time', y_axis='cqt_note')
plt.axis('off')
plt.savefig(os.path.join(sample_path, 'image_at_epoch_fuzz.png'))
plt.show()


print(D.shape)

I am plotting a visual representation of the audio noise, and it should look like a completely fuzzy image. Instead I get alternating bands of fuzz and black vertical bars.

Edit: the question ultimately is, what rules do I need to follow to match up the generator seed, the kernel sizes, and the strides?

2 Answers:

Answer 0 (score: 1)

This happens when your strides are too large. Try a larger Dense layer with smaller strides, or more Conv3DTranspose layers. Like this:

def make_generator_model():
    model = tf.keras.Sequential()
    model.add(layers.Dense(2*32*46*128, use_bias=False, dtype='float32', input_shape=(361,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Reshape((2, 32, 46, 128)))
    # assert model.output_shape == (None,2, 7, 19, 128) # Note: None is the batch size

    model.add(layers.Conv3DTranspose(128, (2, 3, 3), strides=(1, 2, 2), padding='same', use_bias=False))
    # assert model.output_shape == (None, 2, 42, 19, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv3DTranspose(128, (2, 3, 3), strides=(1, 2, 2), padding='same', use_bias=False))
    # assert model.output_shape == (None, 2, 126, 361, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv3DTranspose(1, (2, 3, 3), strides=(1, 2, 2), padding='same', use_bias=False, activation='tanh'))
    # assert model.output_shape == (None, 2, 252, 361, 1)
    model.add(layers.Lambda(lambda x: x[:, :, :252, :361, :]))

    return model
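As a sanity check on the architecture above, the output shapes can be computed without building the model. This sketch relies on the Keras rule that with `padding='same'` a transposed convolution's output size is input size × stride along each spatial dimension:

```python
def transpose_same_out(shape, strides):
    # With padding='same', Conv3DTranspose output size = input size * stride
    # along each spatial dimension.
    return tuple(s * st for s, st in zip(shape, strides))

shape = (2, 32, 46)  # spatial dimensions after the Reshape layer
for strides in [(1, 2, 2), (1, 2, 2), (1, 2, 2)]:  # the three transpose layers
    shape = transpose_same_out(shape, strides)

print(shape)  # (2, 256, 368), which the Lambda layer then crops to (2, 252, 361)
```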

Answer 1 (score: 0)

So the problem ended up being the relationship between the convolution kernel_size and the strides (for a better explanation of each term, see the Conv3DTranspose section at https://keras.io/layers/convolutional/). First off, the Dense layer was fine. In the original code, the kernel_size of the following Conv3DTranspose lines does not cover the stride in the height direction (5 < 6) or the width direction (5 < 19):

model.add(layers.Conv3DTranspose(128, (2, 5, 5), strides=(1, 6, 1), padding='same', use_bias=False))

model.add(layers.Conv3DTranspose(128, (2, 5, 5), strides=(1, 3, 19), padding='same', use_bias=False))

The problem is fixed by making sure the minimum kernel_size dimensions match the chosen stride dimensions. Here is the fixed code:

model.add(layers.Conv3DTranspose(128, (2, 6, 5), strides=(1, 6, 1), padding='same', use_bias=False))

model.add(layers.Conv3DTranspose(128, (2, 5, 19), strides=(1, 3, 19), padding='same', use_bias=False))

Result: properly upsampled noise image
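The rule in this answer can be expressed as a quick check (a sketch; `covers_stride` is a hypothetical helper name, not part of any library):

```python
def covers_stride(kernel_size, strides):
    # In a transposed convolution, any dimension where the kernel is smaller
    # than the stride leaves gaps of untouched zeros between output patches,
    # which show up as the black bars in the question.
    return all(k >= s for k, s in zip(kernel_size, strides))

print(covers_stride((2, 5, 5), (1, 6, 1)))    # False: 5 < 6 on the height axis
print(covers_stride((2, 5, 5), (1, 3, 19)))   # False: 5 < 19 on the width axis
print(covers_stride((2, 5, 19), (1, 3, 19)))  # True
```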