Feeding an initial state to an LSTMCell

Time: 2018-04-01 23:29:38

Tags: python tensorflow machine-learning

I am referring to the code at https://github.com/martin-gorner/tensorflow-rnn-shakespeare/blob/master/rnn_train.py and trying to convert the cells from GRUCell to LSTMCell. Here is an excerpt of the code.

# input state
Hin = tf.placeholder(tf.float32, [None, INTERNALSIZE * NLAYERS], name='Hin')  # [ BATCHSIZE, INTERNALSIZE * NLAYERS]

# using NLAYERS=3 layers of GRU cells, unrolled SEQLEN=30 times
# dynamic_rnn infers SEQLEN from the size of the inputs Xo

# How to properly apply dropout in RNNs: see README.md
cells = [rnn.GRUCell(INTERNALSIZE) for _ in range(NLAYERS)]

# "naive dropout" implementation
dropcells = [rnn.DropoutWrapper(cell, input_keep_prob=pkeep) for cell in cells]
multicell = rnn.MultiRNNCell(dropcells, state_is_tuple=False)
multicell = rnn.DropoutWrapper(multicell, output_keep_prob=pkeep)  # dropout for the softmax layer

Yr, H = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32, initial_state=Hin)
# Yr: [ BATCHSIZE, SEQLEN, INTERNALSIZE ]
# H:  [ BATCHSIZE, INTERNALSIZE*NLAYERS ] # this is the last state in the sequence

H = tf.identity(H, name='H')  # just to give it a name

I know that an LSTMCell has two states, the cell state C and the output state H. What I want to do is feed initial_state a tuple of the two states. How can I do this the right way? I have tried various approaches, but I always run into a TensorFlow error.
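For a single LSTMCell (no MultiRNNCell) the tuple can be fed directly. A minimal sketch of that case, reusing the INTERNALSIZE and Xo defined above; the placeholder names c_in and h_in are illustrative and not part of the original code:

import tensorflow as tf
from tensorflow.contrib import rnn

# one placeholder per LSTM state component
c_in = tf.placeholder(tf.float32, [None, INTERNALSIZE], name='c_in')
h_in = tf.placeholder(tf.float32, [None, INTERNALSIZE], name='h_in')

cell = rnn.LSTMCell(INTERNALSIZE)
# a single cell accepts one LSTMStateTuple(c, h) as its initial state
Yr, H = tf.nn.dynamic_rnn(cell, Xo, dtype=tf.float32,
                          initial_state=rnn.LSTMStateTuple(c_in, h_in))

The multi-layer case is the part I cannot get right.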

Edit: here is one of the attempts:

# inputs
X = tf.placeholder(tf.uint8, [None, None], name='X')  # [ BATCHSIZE, SEQLEN ]
Xo = tf.one_hot(X, ALPHASIZE, 1.0, 0.0)  # [ BATCHSIZE, SEQLEN, ALPHASIZE ]
# expected outputs = same sequence shifted by 1 since we are trying to predict the next character
Y_ = tf.placeholder(tf.uint8, [None, None], name='Y_')  # [ BATCHSIZE, SEQLEN ]
Yo_ = tf.one_hot(Y_, ALPHASIZE, 1.0, 0.0)  # [ BATCHSIZE, SEQLEN, ALPHASIZE ]
# input state
Hin = tf.placeholder(tf.float32, [None, INTERNALSIZE * NLAYERS], name='Hin')  # [ BATCHSIZE, INTERNALSIZE * NLAYERS]
Cin = tf.placeholder(tf.float32, [None, INTERNALSIZE * NLAYERS], name='Cin')
initial_state = tf.nn.rnn_cell.LSTMStateTuple(Cin, Hin)
# using NLAYERS=3 layers of LSTM cells, unrolled SEQLEN=30 times
# dynamic_rnn infers SEQLEN from the size of the inputs Xo

# How to properly apply dropout in RNNs: see README.md
cells = [rnn.LSTMCell(INTERNALSIZE) for _ in range(NLAYERS)]

# "naive dropout" implementation
dropcells = [rnn.DropoutWrapper(cell, input_keep_prob=pkeep) for cell in cells]
multicell = rnn.MultiRNNCell(dropcells, state_is_tuple=True)
multicell = rnn.DropoutWrapper(multicell, output_keep_prob=pkeep)  # dropout for the softmax layer

Yr, H = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32, initial_state=initial_state)

It says "TypeError: 'Tensor' object is not iterable."

Thanks.

1 answer:

Answer 0: (score: 1)

The error is happening because when you build the graph you have to provide a tuple (of placeholders) for each individual layer, and then feed in the state for each layer when you train.

What the error is saying is: I need to iterate over a list of (c, m) tuples, one for each cell, because you have multiple cells and I need to initialize all of their states, but all I was given is a single Tensor, and I cannot iterate over that.

This snippet shows how to set up the placeholder when building the graph:

import tensorflow as tf
from tensorflow.contrib import rnn

state_size = 10
num_layers = 3

X = tf.placeholder(tf.float32, [None, 100, 10])

# the second dimension is size 2 and represents
# c, m (the cell state and the hidden state)
# leave the batch_size dimension as None
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2,
                                    None, state_size])
# l is a list of num_layers tensors, one per layer,
# each of shape [2, batch_size, state_size]
l = tf.unstack(state_placeholder, axis=0)

# then we create an LSTMStateTuple for each layer
rnn_tuple_state = tuple(
         [rnn.LSTMStateTuple(l[idx][0],l[idx][1])
          for idx in range(num_layers)]
)

# I had to set reuse=True here (tf.__version__ 1.7.0); note that this
# shares a single LSTMCell instance (and its weights) across all layers
cells = [rnn.LSTMCell(state_size, reuse=True)] * num_layers
mc = rnn.MultiRNNCell(cells, state_is_tuple=True)

outputs, state = tf.nn.dynamic_rnn(cell=mc,
                                   inputs=X,
                                   initial_state=rnn_tuple_state,
                                   dtype=tf.float32)
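
At training time you then feed an array of the matching shape through state_placeholder: zeros for the very first batch, and the returned state afterwards. A minimal sketch of that feeding loop; the batch_size value and the batches iterable are assumptions, not part of the original answer:

import numpy as np

batch_size = 32
# zero state for the very first batch: [num_layers, 2, batch_size, state_size]
current_state = np.zeros((num_layers, 2, batch_size, state_size))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for x_batch in batches:  # batches: whatever your input pipeline yields
        out, next_state = sess.run(
            [outputs, state],
            feed_dict={X: x_batch, state_placeholder: current_state})
        # state comes back as a tuple of LSTMStateTuples; numpy coerces it
        # into the [num_layers, 2, batch_size, state_size] array the
        # placeholder expects on the next iteration
        current_state = np.array(next_state)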

Here is the relevant bit from the docs:

initial_state: (optional) An initial state for the RNN. If cell.state_size is an integer, this must be a Tensor of appropriate type and shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell.state_size.

So we end up creating a tuple of placeholders, one per cell (layer), of the required size (batch_size, state_size), where batch_size = None. I elaborated on this answer.
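
Applied back to the code in the question, a sketch of the same pattern using the question's INTERNALSIZE, NLAYERS and pkeep; this replaces the single flat Hin placeholder and is my adaptation, not code from the linked repository:

# one placeholder holding c and h for every layer,
# shape [NLAYERS, 2, BATCHSIZE, INTERNALSIZE], batch dimension left as None
Hin = tf.placeholder(tf.float32, [NLAYERS, 2, None, INTERNALSIZE], name='Hin')
layer_states = tf.unstack(Hin, axis=0)
initial_state = tuple(rnn.LSTMStateTuple(layer_states[idx][0],
                                         layer_states[idx][1])
                      for idx in range(NLAYERS))

cells = [rnn.LSTMCell(INTERNALSIZE) for _ in range(NLAYERS)]
dropcells = [rnn.DropoutWrapper(cell, input_keep_prob=pkeep) for cell in cells]
multicell = rnn.MultiRNNCell(dropcells, state_is_tuple=True)
multicell = rnn.DropoutWrapper(multicell, output_keep_prob=pkeep)

Yr, H = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32,
                          initial_state=initial_state)
# H is now a tuple of LSTMStateTuples rather than a single tensor, so the
# tf.identity(H, name='H') naming trick no longer applies as-is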