Does variable batch size not work with tf.keras.layers.RNN when using dropout (TF 2.0)?

Asked: 2019-10-30 21:53:21

Tags: python lstm recurrent-neural-network tensorflow2.0 dropout

I want to use the RNN wrapper with multiple LSTM cells that use dropout. However, if the batch size changes, I get an error.

The code works fine when I remove the dropout, so I think the problem is that the dropout mask is not being reset between batches.

import numpy as np
import tensorflow as tf

input_dim = 3
output_dim = 3
num_timesteps = 2
neurons = [32,32]

# Model
input_layer = tf.keras.Input(shape=(num_timesteps, input_dim))
cell = [tf.keras.layers.LSTMCell(n,dropout=.2) for n in neurons]
lstm = tf.keras.layers.RNN(cell,return_state=True,return_sequences=True)
lstm_out, hidden_state, cell_state = lstm(input_layer)
output = tf.keras.layers.Dense(output_dim)(lstm_out)

mdl = tf.keras.Model(
    inputs=input_layer,
    outputs=[hidden_state, cell_state, output]
)

# Run batches of different sizes
batch_1 = np.random.rand(10, num_timesteps, input_dim).astype(np.float32)
h_state, c_state, out = mdl(batch_1) # batch size is 10x2x3

batch_2 = np.random.rand(9, num_timesteps, input_dim).astype(np.float32)
h_state, c_state, out = mdl(batch_2) # batch size is 9x2x3

This code gives the error: InvalidArgumentError: Incompatible shapes: [9,3] vs. [10,3] [Op:Mul] name: model/rnn/mul/

If I remove the dropout, the code works. Can I use reset_dropout_mask somehow? It doesn't seem to be getting called.
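For illustration, the kind of manual reset I have in mind is sketched below. It assumes each LSTMCell exposes reset_dropout_mask() and reset_recurrent_dropout_mask() (inherited from DropoutRNNCellMixin); whether calling them between batches is the intended usage is exactly what I'm unsure about.

# Sketch only: manually clear the cached dropout masks before the next batch
# (assumes reset_dropout_mask / reset_recurrent_dropout_mask are available on the cells)
for c in cell:
    c.reset_dropout_mask()            # clear cached input dropout mask
    c.reset_recurrent_dropout_mask()  # clear cached recurrent dropout mask

h_state, c_state, out = mdl(batch_2)  # masks would then be rebuilt for the new batch size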

1 Answer:

Answer 0 (score: 0)

I can reproduce your error with TensorFlow version 2.0.0.

However, if I upgrade to TensorFlow version 2.2 and run the same code, there is no error.

The complete working code is shown below:

!pip install tensorflow==2.2

import numpy as np
import tensorflow as tf

print(tf.__version__) # Printing the Tensorflow Version just to be Sure

input_dim = 3
output_dim = 3
num_timesteps = 2
neurons = [32,32]

# Model
input_layer = tf.keras.Input(shape=(num_timesteps, input_dim))
cell = [tf.keras.layers.LSTMCell(n,dropout=.2) for n in neurons]
lstm = tf.keras.layers.RNN(cell,return_state=True,return_sequences=True)
lstm_out, hidden_state, cell_state = lstm(input_layer)
output = tf.keras.layers.Dense(output_dim)(lstm_out)

mdl = tf.keras.Model(
    inputs=input_layer,
    outputs=[hidden_state, cell_state, output]
)

# Run batches of different sizes
batch_1 = np.random.rand(10, num_timesteps, input_dim).astype(np.float32)
h_state, c_state, out = mdl(batch_1) # batch size is 10x2x3

batch_2 = np.random.rand(9, num_timesteps, input_dim).astype(np.float32)
h_state, c_state, out = mdl(batch_2) # batch size is 9x2x3

The output of the above code is shown below:

2.2.0

Hope this helps. Happy learning!