I built a simple stacked dynamic bidirectional LSTM with LSTMCell, DropoutWrapper and MultiRNNCell (Model_Orig) for a regression problem. The test absolute error after 20 epochs is 2.89 and the training time is 14.5 hours.
Then I tried another implementation (Model_blockfused) which has the same structure but uses the block-fused components (i.e. bidirectional_dynamic_rnn, tf.layers.dropout, LSTMBlockFusedCell). Model_blockfused trains much faster (3.6 hours), but its test absolute error after 20 epochs is about 6% higher (3.06).
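One detail worth checking in this swap: DropoutWrapper is parameterized by a keep probability, whereas tf.layers.dropout is parameterized by a drop rate and does nothing unless training=True is passed (it defaults to training=False). A minimal sketch of the correspondence, with is_training as a hypothetical placeholder tensor:

cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.90)  # keeps outputs with prob 0.90
out = tf.layers.dropout(out, rate=0.10, training=is_training)      # drops with prob 0.10 = 1 - keep_prob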
So, should I expect a 6% performance difference between TimeReversedFusedRNN and LSTMBlockFusedCell? Or did I make any mistakes when building Model_blockfused (especially with the dropout)?
Here is the simplified code for Model_Orig:
LSTM_CELL_SIZE = 200
keep_prob = 0.90
parallel_iterations = 512
dropcells = []
for iiLyr in range(3):
cell_iiLyr = tf.nn.rnn_cell.LSTMCell(num_units=LSTM_CELL_SIZE, state_is_tuple=True)
dropcells.append(tf.nn.rnn_cell.DropoutWrapper(cell=cell_iiLyr, output_keep_prob=keep_prob))
MultiLyr_cell = tf.nn.rnn_cell.MultiRNNCell(cells=dropcells, state_is_tuple=True)
outputs, states = tf.nn.bidirectional_dynamic_rnn(
cell_fw=MultiLyr_cell,
cell_bw=MultiLyr_cell,
inputs=Orig_input_TSs, #shape of Orig_input_TSs: [#batches, time_len, #input_features]
dtype=tf.float32,
sequence_length=length, # shape of length: [#batches, 1]
parallel_iterations = parallel_iterations, # default:32, Those operations which do not have any temporal dependency and can be run in parallel, will be.
scope = "BiLSTM"
)
states_fw, states_bw = states
# get the states (c and h, both directions) from the top LSTM layer for final fully connected layers.
c_fw_lstLyr, h_fw_lstLyr = states_fw[-1]
c_bw_lstLyr, h_bw_lstLyr = states_bw[-1]
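And here is the simplified code for Model_blockfused — a minimal sketch reconstructed from the fragments quoted in the answer below (the names cur_fw_BFcell_obj, cur_bw_BFcell_obj, fw_out_TM, bw_out_TM, length and the call signatures are from those fragments; the time-major transpose, the three-layer loop and the is_training placeholder are assumptions):

LSTM_CELL_SIZE = 200
keep_prob = 0.90

# LSTMBlockFusedCell expects time-major input: [time_len, #batches, #input_features]
fw_out_TM = tf.transpose(Orig_input_TSs, [1, 0, 2])
bw_out_TM = fw_out_TM

for iiLyr in range(3):
    cur_fw_BFcell_obj = tf.contrib.rnn.LSTMBlockFusedCell(num_units=LSTM_CELL_SIZE)
    cur_bw_BFcell_obj = tf.contrib.rnn.TimeReversedFusedRNN(cur_fw_BFcell_obj)

    # each direction consumes only its own previous output
    fw_out_TM, fw_state = cur_fw_BFcell_obj(fw_out_TM, dtype=tf.float32, sequence_length=length)
    bw_out_TM, bw_state = cur_bw_BFcell_obj(bw_out_TM, dtype=tf.float32, sequence_length=length)

    # dropout on the layer outputs (rate = 1 - keep_prob)
    fw_out_TM = tf.layers.dropout(fw_out_TM, rate=1.0 - keep_prob, training=is_training)
    bw_out_TM = tf.layers.dropout(bw_out_TM, rate=1.0 - keep_prob, training=is_training)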
Thanks.
Answer 0 (score: 1)
First, you should use two independent tf.contrib.rnn.LSTMBlockFusedCell instances for the fw and bw directions; wrapping the same cell object in TimeReversedFusedRNN makes the two directions share a single set of weights. Change the code below:
cur_fw_BFcell_obj = tf.contrib.rnn.LSTMBlockFusedCell(num_units=LSTM_CELL_SIZE)
cur_bw_BFcell_obj = tf.contrib.rnn.TimeReversedFusedRNN(cur_fw_BFcell_obj)
to this:
cur_fw_BFcell_obj = tf.contrib.rnn.LSTMBlockFusedCell(num_units=LSTM_CELL_SIZE)
cur_bw_BFcell_obj_cell = tf.contrib.rnn.LSTMBlockFusedCell(num_units=LSTM_CELL_SIZE)
cur_bw_BFcell_obj = tf.contrib.rnn.TimeReversedFusedRNN(cur_bw_BFcell_obj_cell)
Second, the documentation for TF's tf.contrib.rnn.stack_bidirectional_dynamic_rnn API says:

"The combined forward and backward layer outputs are used as input of the next layer."
So the following code:
fw_out_TM, fw_state = cur_fw_BFcell_obj(fw_out_TM, dtype=tf.float32, sequence_length=length)
bw_out_TM, bw_state = cur_bw_BFcell_obj(bw_out_TM, dtype=tf.float32, sequence_length=length)
should be changed to:
next_layer_input = tf.concat([fw_out_TM, bw_out_TM], axis=2)
fw_out_TM, fw_state = cur_fw_BFcell_obj(next_layer_input, dtype=tf.float32, sequence_length=length)
bw_out_TM, bw_state = cur_bw_BFcell_obj(next_layer_input, dtype=tf.float32, sequence_length=length)
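Putting both fixes together, a minimal sketch of the corrected stacking loop (same assumptions as the reconstruction above: time-major inputs and a hypothetical is_training placeholder):

next_layer_input = tf.transpose(Orig_input_TSs, [1, 0, 2])  # to time-major: [time_len, #batches, #input_features]

for iiLyr in range(3):
    cur_fw_BFcell_obj = tf.contrib.rnn.LSTMBlockFusedCell(num_units=LSTM_CELL_SIZE)
    cur_bw_BFcell_obj_cell = tf.contrib.rnn.LSTMBlockFusedCell(num_units=LSTM_CELL_SIZE)
    cur_bw_BFcell_obj = tf.contrib.rnn.TimeReversedFusedRNN(cur_bw_BFcell_obj_cell)

    fw_out_TM, fw_state = cur_fw_BFcell_obj(next_layer_input, dtype=tf.float32, sequence_length=length)
    bw_out_TM, bw_state = cur_bw_BFcell_obj(next_layer_input, dtype=tf.float32, sequence_length=length)

    # the combined fw and bw outputs feed the next layer, as in stack_bidirectional_dynamic_rnn
    next_layer_input = tf.concat([fw_out_TM, bw_out_TM], axis=2)
    next_layer_input = tf.layers.dropout(next_layer_input, rate=1.0 - keep_prob, training=is_training)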