I am trying to understand the encoder-decoder mechanism used in neural machine translation, and I am experimenting with tf.contrib.cudnn_rnn.CudnnGRU. Here is the work I am referencing. In the code, the author computes the static parameter size for the CUDA RNN as follows:
def cuda_params_size(cuda_model_builder):
    """
    Calculates static parameter size for CUDA RNN
    :param cuda_model_builder:
    :return:
    """
    with tf.Graph().as_default():
        cuda_model = cuda_model_builder()
        params_size_t = cuda_model.params_size()
        config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
        with tf.Session(config=config) as sess:
            result = sess.run(params_size_t)
            return result
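For concreteness, here is a minimal usage sketch of my own (not from the referenced code), assuming RNN is the tf.contrib.cudnn_rnn.CudnnGRU alias used in that repository; the layer/unit/input sizes are made up purely for illustration:

# Minimal usage sketch (mine, not from the repo). RNN stands for the
# tf.contrib.cudnn_rnn.CudnnGRU alias used in the referenced code; the sizes
# below are hypothetical.
def build_toy_rnn():
    return RNN(num_layers=1, num_units=16, input_size=8,
               direction='unidirectional', dropout=0.0)

static_size = cuda_params_size(build_toy_rnn)
# static_size comes back as an ordinary (NumPy) integer: the RNN is built in a
# throwaway graph, params_size() is evaluated there, and only the number is kept.
print(static_size)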
The encoder is built as follows:
def make_encoder(time_inputs, encoder_features_depth, is_train, hparams, seed, transpose_output=True):
    """
    Builds encoder, using CUDA RNN
    :param time_inputs: Input tensor, shape [batch, time, features]
    :param encoder_features_depth: Static size for features dimension
    :param is_train:
    :param hparams:
    :param seed:
    :param transpose_output: Transform RNN output to batch-first shape
    :return:
    """
    def build_rnn():
        return RNN(num_layers=hparams.encoder_rnn_layers, num_units=hparams.rnn_depth,
                   input_size=encoder_features_depth,
                   direction='unidirectional',
                   dropout=hparams.encoder_dropout if is_train else 0, seed=seed)

    static_p_size = cuda_params_size(build_rnn)
    cuda_model = build_rnn()
    params_size_t = cuda_model.params_size()
    with tf.control_dependencies([tf.assert_equal(params_size_t, [static_p_size])]):
        cuda_params = tf.get_variable("cuda_rnn_params",
                                      initializer=tf.random_uniform([static_p_size], minval=-0.05, maxval=0.05,
                                                                    dtype=tf.float32, seed=seed + 1 if seed else None))
What I do not understand is why he computes a static parameter size separately and then checks it against the dynamic one. How could the parameter size ever change? (Both are defined simply by constructing the RNN.)
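For reference, this is how I currently read the two values (my own summary sketch, not code from the repo):

# My reading of the pattern in make_encoder (sketch, not from the repo):
static_p_size = cuda_params_size(build_rnn)   # concrete integer, evaluated in a
                                              # separate throwaway graph/session
cuda_model = build_rnn()
params_size_t = cuda_model.params_size()      # tf.Tensor in the current graph,
                                              # value only known at sess.run time
# Both come from RNNs built with identical constructor arguments, so I would
# expect tf.assert_equal(params_size_t, [static_p_size]) to always pass.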