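I'm trying to understand some surprising results I'm seeing when implementing a tf graph. The graph I'm working with is just a forest (a bunch of trees). It is a plain forward-inference graph; nothing in it is related to training. I'm sharing snippets of 2 implementations.
Code snippet 1: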
with tf.name_scope("main"):
    def get_tree_output(offset):
        loop_vars = (offset,)
        leaf_indice = tf.while_loop(cond,
                                    body,
                                    loop_vars,
                                    back_prop=False,
                                    parallel_iterations=1,
                                    name="while_loop")
        # map_fn needs the per-tree leaf index returned;
        # the scores are gathered and summed after the map below.
        return leaf_indice

    leaf_indices = tf.map_fn(get_tree_output,
                             tree_offsets_tensor,
                             dtype=INT_TYPE,
                             parallel_iterations=n_trees,
                             back_prop=False,
                             name="tree-scores")
    tree_scores = tf.gather(score_tensor, leaf_indices, name="tree-scores")
    output = tf.reduce_sum(tree_scores, name="sum-output")
    output = tf.sigmoid(output, name="sigmoid-output")
Code snippet 2:
with tf.name_scope("main"):
    tree_offsets_tensor = tf.constant(tree_offsets, dtype=INT_TYPE, name="tree_offsets_tensor")
    loop_vars = (tree_offsets_tensor,)
    leaf_indices = tf.while_loop(cond,
                                 body,
                                 loop_vars,
                                 back_prop=False,
                                 parallel_iterations=n_trees,
                                 name="while_loop")
    tree_scores = tf.gather(score_tensor, leaf_indices, name="tree-scores")
    output = tf.reduce_sum(tree_scores, name="sum-output")
    output = tf.sigmoid(output, name="sigmoid-output")
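The rest of the code is exactly the same: the constant tensors, the variables, and the condition and body of the while loop. Threading and parallelism settings were also the same in both cases.
Code snippet 2: takes about 500 microseconds to do inference.
Code snippet 1: takes about 12 milliseconds to do inference.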
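The post does not say how these timings were taken. For anyone trying to reproduce the comparison, here is a minimal sketch of one common approach: discard a few warm-up runs, then report the median over repeated calls. `run_inference` is a hypothetical zero-argument callable (for example a wrapper around `sess.run(output, ...)`); the helper name and default counts are assumptions, not something from the original code.

```python
import statistics
import time


def median_latency_us(run_inference, warmup=10, repeats=100):
    """Median wall-clock time of run_inference() in microseconds.

    run_inference is a hypothetical zero-argument callable supplied
    by the caller; it is not part of the original post.
    """
    # Warm-up calls are not timed: early runs often pay one-off setup costs.
    for _ in range(warmup):
        run_inference()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1e6)

    # The median is less sensitive to occasional slow outliers than the mean.
    return statistics.median(samples)
```

Measuring both snippets with the same helper, in the same process and session configuration, keeps the comparison like for like.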
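The difference is that in snippet 1 I use map_fn to operate on tree_offsets_tensor, whereas in snippet 2 I got rid of map_fn and use that tensor directly. As I understand it, in snippet 1 the get_tree_output method is called with one element of tree_offsets_tensor at a time, so we end up with a separate while_loop for each offset value, whereas in snippet 2 we have a single while_loop that takes all the offset values at once (basically the whole offsets tensor).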
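To make that structural difference concrete, here is a plain-Python model of the two strategies. It is not TensorFlow and it is not the `cond`/`body` from the post (those are not shown); the flat node arrays, the toy forest, and all helper names are hypothetical and chosen only for illustration.

```python
# Hypothetical flat forest encoding (an assumption, not from the post):
# node i is a leaf when LEFT[i] == -1; otherwise go to LEFT[i] if
# x[FEATURE[i]] <= THRESHOLD[i], else to RIGHT[i].
FEATURE = [0, -1, 1, -1, -1, 1, -1, -1]
THRESHOLD = [0.5, 0.0, 0.3, 0.0, 0.0, 0.7, 0.0, 0.0]
LEFT = [1, -1, 3, -1, -1, 6, -1, -1]
RIGHT = [2, -1, 4, -1, -1, 7, -1, -1]
TREE_OFFSETS = [0, 5]  # root node index of each tree


def is_leaf(node):
    return LEFT[node] == -1


def step(node, x):
    """Move one level down from an internal node."""
    if x[FEATURE[node]] <= THRESHOLD[node]:
        return LEFT[node]
    return RIGHT[node]


def per_tree_loops(x):
    """Shaped like snippet 1: a separate loop for every offset."""
    leaves, iterations = [], 0
    for node in TREE_OFFSETS:
        while not is_leaf(node):
            node = step(node, x)
            iterations += 1
        leaves.append(node)
    return leaves, iterations


def single_batched_loop(x):
    """Shaped like snippet 2: one loop that advances all trees together."""
    nodes, iterations = list(TREE_OFFSETS), 0
    while any(not is_leaf(n) for n in nodes):
        # Trees that already reached a leaf stay where they are.
        nodes = [n if is_leaf(n) else step(n, x) for n in nodes]
        iterations += 1
    return nodes, iterations
```

Both functions reach the same leaves, but the per-tree version runs a number of loop iterations equal to the sum of the path lengths, while the batched version runs only as many as the longest path. Whether that per-iteration overhead accounts for the measured gap in the real graph is an assumption this sketch does not verify.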
I also tried another variant of the snippet: instead of using map_fn, I wrote a hand-written for loop.
Code snippet 1 (for-loop variant):
output = 0
with tf.name_scope("main"):
    for offset in tree_offsets:
        loop_vars = (offset,)
        leaf_indice = tf.while_loop(cond,
                                    body,
                                    loop_vars,
                                    back_prop=False,
                                    parallel_iterations=1,
                                    name="while_loop")
        tree_score = tf.gather(score_tensor, leaf_indice, name="tree-scores")
        output = tf.add(tree_score, output)
    #leaf_indices = tf.map_fn(get_tree_output,
    #    tree_offsets_tensor, dtype=INT_TYPE,
    #    parallel_iterations=n_trees, back_prop=False,
    #    name="tree-scores")
    #tree_scores = tf.gather(score_tensor, leaf_indices, name="tree-scores")
    #output = tf.reduce_sum(tree_scores, name="sum-output")
    output = tf.sigmoid(output, name="sigmoid-output")
This gives a minor improvement: about 9 ms.