How do I feed word embeddings as input to an RNN?

Asked: 2018-09-10 12:27:09

Tags: tensorflow nlp deep-learning lstm rnn

I am trying to do word prediction with a basic RNN. I need to feed inputs to the RNN cell, and I am trying to follow this code:

import numpy as np
import tensorflow as tf
from tensorflow.contrib.rnn import GRUCell

X_input = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))

tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)

x = tf.nn.embedding_lookup(tfWe, X_input)
x = tf.unstack(x, sequence_length, 1)
output, states = tf.nn.dynamic_rnn(rnn, x, dtype = tf.float32)
output = tf.transpose(output, (1,0,2))
output = tf.reshape(output, (sequence_length*num_samples,hidden_layer_size))

I get the error ValueError: layer gru_cell_2 expects 1 inputs, but it received 39 input tensors. I think this error is caused by the embedding, because it does not produce a tensor with dimensions that can be fed into the GRUCell. So, how do I feed the input to the GRU cell?

1 Answer:

Answer 0 (score: 0)

The way X_input is initialized is probably wrong; the extra dimension of size 1 is what causes the problem. If you remove it, there is no need to use unstack. The following code will work:

X_input = tf.placeholder(tf.int32, shape = (None, sequence_length))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length))

tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = tf.contrib.rnn.GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)

x = tf.nn.embedding_lookup(tfWe, X_input)
output, states = tf.nn.dynamic_rnn(rnn, x, dtype = tf.float32)
##shape of output here is (None,sequence_length,hidden_layer_size)
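The answer defines W1 and b but never uses them. As a minimal sketch (an addition for completeness, not part of the original answer, and assuming label is the number of output classes), the per-timestep logits and a cross-entropy loss could be computed from this output like so:

output_flat = tf.reshape(output, (-1, hidden_layer_size))  ## (batch*sequence_length, hidden_layer_size)
logits = tf.matmul(output_flat, W1) + b                    ## (batch*sequence_length, label)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels = tf.reshape(Y_target, [-1]), logits = logits))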

However, if you really do need that extra dimension, the unstack call needs a small change. You were unstacking along axis=1 into sequence_length tensors, which does not look right either. Do this instead:

X_input = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))

tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = tf.contrib.rnn.GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)

x = tf.nn.embedding_lookup(tfWe, X_input)
x = tf.unstack(x, 1, 2)  ## a list with one tensor of shape (None, sequence_length, embedding_dim)
output, states = tf.nn.dynamic_rnn(rnn, x[0], dtype = tf.float32)
##shape of output here is again same (None,sequence_length,hidden_layer_size)
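An equivalent and arguably clearer way to drop the size-1 axis (a suggestion, not from the original answer) is tf.squeeze:

x = tf.nn.embedding_lookup(tfWe, X_input)  ## (None, sequence_length, 1, embedding_dim)
x = tf.squeeze(x, axis = 2)                ## (None, sequence_length, embedding_dim)
output, states = tf.nn.dynamic_rnn(rnn, x, dtype = tf.float32)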

Finally, if you really need the input unstacked into sequence_length separate tensors, replace unstack with tf.map_fn() and do the following:

X_input = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))

tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = tf.contrib.rnn.GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)

x = tf.nn.embedding_lookup(tfWe, X_input)
x = tf.transpose(x,[1,0,2,3])
##tf.map_fn unstacks a tensor along the first dimension only so we need to make seq_len as first dimension by taking transpose

## dynamic_rnn returns an (outputs, state) tuple, so map_fn needs a matching dtype pair
output, states = tf.map_fn(lambda x: tf.nn.dynamic_rnn(rnn, x, dtype = tf.float32), x, dtype = (tf.float32, tf.float32))
##shape of output here is (sequence_length,None,1,hidden_layer_size)
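If the familiar batch-first layout is needed afterwards, a small post-processing sketch (an addition, not part of the original answer) would squeeze out the singleton axis and transpose:

output = tf.squeeze(output, axis = 2)     ## (sequence_length, None, hidden_layer_size)
output = tf.transpose(output, (1, 0, 2))  ## (None, sequence_length, hidden_layer_size)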

A word of warning: pay attention to the shape of output in each solution, and be clear about which shape you actually want.

EDIT:

To answer your question about when to use which kind of input:

Suppose you have 25 sentences of 15 words each, and you split them into 5 batches of size 5. Also suppose you use 50-dimensional word embeddings (say, from word2vec). Then your input shape is (batch_size=5, time_step=15, features=50). In this case you do not need unstacking or any kind of mapping.
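A minimal runnable sketch of this plain 3-D case (sizes taken from the example above; the hidden size of 64 is an assumption):

import numpy as np
import tensorflow as tf

batch_size, time_step, features, hidden = 5, 15, 50, 64  ## hidden size is assumed

inputs = tf.placeholder(tf.float32, shape = (None, time_step, features))
cell = tf.contrib.rnn.GRUCell(num_units = hidden)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype = tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.randn(batch_size, time_step, features).astype(np.float32)
    print(sess.run(outputs, {inputs: batch}).shape)  ## (5, 15, 64)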

Next, suppose you have 30 documents, each containing 25 sentences of 15 words, and you split the documents into 6 batches of size 5. Again assuming 50-dimensional word embeddings, your input now has an extra dimension: batch_size=5, time_step=15 and features=50 are as before, but what about the number of sentences? The input becomes (batch_size=5, num_sentences=25, time_step=15, features=50), which is an invalid shape for any kind of RNN. In that case you need to unstack it along the sentence dimension into 25 tensors, each of shape (5,15,50). To make this work, I used tf.map_fn.
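A hedged sketch of that 4-D document case (an addition for illustration; the placeholder name docs and the hidden size of 64 are assumptions):

docs = tf.placeholder(tf.float32, shape = (None, 25, 15, 50))  ## (batch, sentences, words, embedding)
docs = tf.transpose(docs, (1, 0, 2, 3))                        ## sentences first, since tf.map_fn maps over axis 0
cell = tf.contrib.rnn.GRUCell(num_units = 64)                  ## hidden size assumed

## each slice is a valid (batch, time, features) input; dynamic_rnn returns an
## (outputs, state) tuple, hence the dtype pair
sent_outputs, sent_states = tf.map_fn(
    lambda s: tf.nn.dynamic_rnn(cell, s, dtype = tf.float32),
    docs, dtype = (tf.float32, tf.float32))
## sent_outputs shape: (25, None, 15, 64)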