Question

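I'm trying to implement some custom GRU cells using TensorFlow. I need to stack these cells, and I wanted to inherit from tensorflow.keras.layers.GRU. However, looking at the source code, I noticed that you can only pass a units argument to the __init__ of GRU, whereas RNN accepts a list of RNN cells as its cell argument and uses it to stack those cells by calling StackedRNNCells. Meanwhile, GRU only creates a single GRUCell.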

For the paper I'm trying to implement, I actually need to stack GRUCell instances. Why are RNN and GRU implemented differently?

Answer 1

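While searching for the documentation of these classes to add links, I noticed something that may be tripping you up: there are (currently, just before the official TF 2.0 release) two GRUCell implementations in TensorFlow! There is a tf.nn.rnn_cell.GRUCell and a tf.keras.layers.GRUCell. It looks like the one in tf.nn.rnn_cell is deprecated, and the Keras one is the one you should use.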

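From what I can tell, GRUCell has the same __call__() method signature as tf.keras.layers.LSTMCell and tf.keras.layers.SimpleRNNCell, and they all inherit from Layer. The RNN documentation lists some requirements on what the __call__() method of the objects passed to its cell argument must do, but my guess is that all three of these cells meet those requirements. You should be able to use the same RNN framework and pass it a list of GRUCell objects instead of LSTMCell or SimpleRNNCell objects.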

I can't test this right now, so I'm not sure whether you should pass a list of GRUCell objects or just GRU objects into RNN, but I think one of those should work.
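For illustration, here is a minimal sketch of the stacking approach described above, assuming TensorFlow 2.x with the Keras API. The class name CustomGRUCell, the helper build_stacked_gru, the layer sizes, and the input shape are all hypothetical and not taken from the question. The subclass overrides nothing; it only marks where custom cell logic would go.

```python
import numpy as np
import tensorflow as tf


class CustomGRUCell(tf.keras.layers.GRUCell):
    """Hypothetical custom cell: inherits from GRUCell (not GRU).

    Override call() here to change the per-timestep computation.
    As written, it behaves exactly like the stock GRUCell.
    """


def build_stacked_gru(units_per_layer, return_sequences=True):
    # One cell object per layer; RNN wraps a list of cells in StackedRNNCells.
    cells = [CustomGRUCell(units) for units in units_per_layer]
    return tf.keras.layers.RNN(cells, return_sequences=return_sequences)


# Assumed sizes: two stacked layers with 32 and 16 units.
layer = build_stacked_gru([32, 16])

# Dummy batch: 4 sequences, 10 timesteps, 8 features per timestep.
x = np.random.rand(4, 10, 8).astype("float32")
y = layer(x)

# The last dimension matches the units of the final cell in the stack.
print(y.shape)
```

Passing a list of cells to RNN, rather than subclassing GRU, keeps the stacking logic in Keras and leaves the custom behavior in the cell class.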

Answer 2

import tensorflow as tf  # TensorFlow 1.x API (tf.contrib was removed in 2.x)

# rnn_size, keep_prob, num_layers, embed_dim, learning_rate and int_to_vocab
# are assumed to be defined earlier.

train_graph = tf.Graph()
with train_graph.as_default():

    # Initialize input placeholders
    input_text = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    lr = tf.placeholder(tf.float32, name='learning_rate')

    # Calculate text attributes
    vocab_size = len(int_to_vocab)
    input_text_shape = tf.shape(input_text)

    # Build the RNN cell: create a separate cell object per layer, because
    # [drop_cell] * num_layers would reuse one cell object for every layer
    def make_cell():
        lstm = tf.contrib.rnn.BasicLSTMCell(num_units=rnn_size)
        return tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

    cell = tf.contrib.rnn.MultiRNNCell([make_cell() for _ in range(num_layers)])

    # Set the initial state
    initial_state = cell.zero_state(input_text_shape[0], tf.float32)
    initial_state = tf.identity(initial_state, name='initial_state')

    # Create word embedding as input to RNN
    embed = tf.contrib.layers.embed_sequence(input_text, vocab_size, embed_dim)

    # Build RNN
    outputs, final_state = tf.nn.dynamic_rnn(cell, embed, dtype=tf.float32)
    final_state = tf.identity(final_state, name='final_state')

    # Take RNN output and make logits
    logits = tf.contrib.layers.fully_connected(outputs, vocab_size, activation_fn=None)

    # Calculate the probability of generating each word
    probs = tf.nn.softmax(logits, name='probs')

    # Define loss function
    cost = tf.contrib.seq2seq.sequence_loss(
        logits,
        targets,
        tf.ones([input_text_shape[0], input_text_shape[1]])
    )

    # Learning rate optimizer
    optimizer = tf.train.AdamOptimizer(learning_rate)

    # Gradient clipping to avoid exploding gradients
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)

Inconsistency between GRU and RNN implementations

2 answers: