I'm hoping to find an open-source audio implementation of the Neural Discrete Representation Learning paper (VQ-VAE). I'm looking at this GitHub repository. The paper says: "The encoder consists of 6 strided convolutions with stride 2 and window-size 4." Jeremy's encoder code is:
def _encoder(self, x):
    '''
    Note that we need a pair of reversal to ensure causality.
    (i.e. no trailing pads)
    `x`: [b, T, c]
    '''
    k_init = self.arch['initial_filter_width']
    b = tf.shape(x)[0]
    o = tf.zeros([b, k_init - 1, self.arch['dim_symbol_emb']])
    x = tf.concat([o, x], 1)
    k_init = self.arch['initial_filter_width']
    x = tf.layers.conv1d(
        inputs=x,
        filters=self.arch['residual_channels'],
        kernel_size=k_init,
        kernel_regularizer=tf.keras.regularizers.l2(WEIGHT_DECAY),
        name='initial_filtering',
        kernel_initializer=tf.initializers.variance_scaling(
            scale=1.43,
            distribution='uniform'),
    )
    x = tf.nn.leaky_relu(x, 2e-2)
    x = tf.reverse(x, [1])  # paired op to enforce causality
    for i in range(self.arch['n_downsample_stack']):
        conv = tf.layers.conv1d(
            inputs=x,
            filters=(i + 1) * self.arch['encoder']['filters'],
            kernel_size=self.arch['encoder']['kernels'],
            strides=2,
            padding='same',
            # activation=tf.nn.tanh,
            kernel_initializer=tf.initializers.variance_scaling(
                scale=1.15,
                distribution='uniform'),
            kernel_regularizer=tf.keras.regularizers.l2(WEIGHT_DECAY),
        )
        gate = tf.layers.conv1d(
            inputs=x,
            filters=(i + 1) * self.arch['encoder']['filters'],
            kernel_size=self.arch['encoder']['kernels'],
            strides=2,
            padding='same',
            # activation=tf.nn.sigmoid,
            kernel_initializer=tf.initializers.variance_scaling(distribution='uniform'),
            kernel_regularizer=tf.keras.regularizers.l2(WEIGHT_DECAY),
            bias_initializer=tf.initializers.ones,
        )
        x = tf.nn.tanh(conv) * tf.nn.sigmoid(gate)
    x = tf.reverse(x, [1])  # paired op to enforce causality
    x = tf.layers.conv1d(
        inputs=x,
        filters=self.arch['dim_exemplar'],
        kernel_size=1,
        kernel_initializer=tf.initializers.variance_scaling(
            distribution='uniform'),
        kernel_regularizer=tf.keras.regularizers.l2(WEIGHT_DECAY),
    )
    return x
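For context, the conv/gate pair inside the loop is the WaveNet-style gated activation unit, tanh(conv) * sigmoid(gate), where the gate branch decides how much of the filter branch passes through. A minimal NumPy sketch of just that elementwise operation (function names are mine, not from the repository):

```python
import numpy as np

def sigmoid(z):
    """Logistic function, used for the gate branch."""
    return 1.0 / (1.0 + np.exp(-z))

def gated_activation(conv_out, gate_out):
    """WaveNet-style gated unit: tanh(filter branch) scaled by sigmoid(gate branch)."""
    return np.tanh(conv_out) * sigmoid(gate_out)

# A large positive gate logit passes tanh(conv) almost unchanged;
# a large negative gate logit drives the output toward 0.
x = np.array([1.0, 1.0])
g = np.array([10.0, -10.0])
print(gated_activation(x, g))
```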
This encoder has 1 + arch['n_downsample_stack'] + 1 = 8 convolutional layers, whereas the paper says 6. The kernel sizes are arch['initial_filter_width'] = 32, then arch['encoder']['kernels'] = 5, 5, 5, 5, 5, 5, and finally 1, whereas the paper says 4. And the paper never mentions any gates in the encoder.
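Whatever the kernel sizes, the downsampling factor is set by the strides alone: six stride-2 'same'-padded convolutions shrink the time axis by 2^6 = 64, which is also what the paper's described encoder would do. A small pure-Python sketch of that arithmetic (function names are my own, assuming ceil(T / stride) output lengths for 'same' padding):

```python
def conv1d_same_out_len(t, stride=2):
    """Output length of a 'same'-padded 1-D convolution: ceil(t / stride)."""
    return -(-t // stride)

def encoder_out_len(t, n_layers=6):
    """Time-axis length after n_layers stride-2 'same'-padded convolutions."""
    for _ in range(n_layers):
        t = conv1d_same_out_len(t)
    return t

# Six stride-2 layers downsample by 2**6 = 64:
print(encoder_out_len(16000))  # a 1-second 16 kHz waveform -> 250 frames
```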
Am I misreading the code, am I misreading the paper, or does the implemented architecture simply differ from the one described in the paper? Is there a faithful implementation of the paper anywhere?