Question

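I'm trying to port the RNN model in ptb_word_lm.py to multiple GPU cards, following the multi-tower style of cifar10_multi_gpu_train.py. However, I found that the "grads" returned by tf.clip_by_global_norm(tf.gradients(cost, tvars), config.max_grad_norm) is not a list of Tensors. It is a list of tensorflow.python.framework.ops.IndexedSlices. Now I need to sum and average the "grads" returned by the multiple GPU towers into a single list of IndexedSlices or Tensors, so that I can pass it to self._train_op = optimizer.apply_gradients(zip(grads, tvars)). I tried tf.convert_to_tensor to convert the IndexedSlices to a Tensor, but it fails with the following error: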

File "ptb_word_lm.py", line 150, in __init__
    grads_0_tensor = tf.convert_to_tensor(grads[0])
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", 
    line 566, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/gradients.py",
    line 77, in _IndexedSlicesToTensor
    % str(value))
ValueError: Tensor conversion requested for IndexedSlices without dense_shape:
IndexedSlices(indices=Tensor("model/gradients/concat_1:0", shape=(400,), dtype=int32),
values=Tensor("model/clip_by_global_norm/model/clip_by_global_norm/_0:0",
shape=(?, 200), dtype=float32))

How can I merge these IndexedSlices? Or is there any sample code that parallelizes an RNN in the multi-GPU tower style?

Thanks a lot in advance!

Answer 1

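I figured out how to average several IndexedSlices. Two options:

Use from tensorflow.python.ops import gradients, then convert them to tensors with the function gradients._IndexedSlicesToTensor.
Concatenate them along the first dimension to get the sum, then use IndexedSlices.values / n to get the average.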

Answer 2

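IndexedSlices support duplicate indices. So to combine them into a single sum, it is enough to concatenate both the indices and the values along the first axis.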
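To see why concatenation amounts to a sum, here is a small plain-Python model of the semantics. It is only an illustration, not TensorFlow code: the densify helper and the two tower dictionaries are made up for this example, and rows with a repeated index are simply added together, which is the behavior described above.

```python
def densify(indices, values, num_rows):
    """Scatter-add sparse rows into a dense matrix; repeated indices accumulate."""
    width = len(values[0])
    dense = [[0.0] * width for _ in range(num_rows)]
    for row, vals in zip(indices, values):
        for col, v in enumerate(vals):
            dense[row][col] += v
    return dense


# Hypothetical sparse gradients from two towers for a 4-row embedding matrix.
tower0 = {"indices": [0, 2], "values": [[1.0, 1.0], [2.0, 2.0]]}
tower1 = {"indices": [2, 3], "values": [[10.0, 10.0], [20.0, 20.0]]}

# Concatenate along the first axis; index 2 now appears twice.
merged_indices = tower0["indices"] + tower1["indices"]
merged_values = tower0["values"] + tower1["values"]
merged_dense = densify(merged_indices, merged_values, 4)

# Reference: densify each tower separately, then add element-wise.
d0 = densify(tower0["indices"], tower0["values"], 4)
d1 = densify(tower1["indices"], tower1["values"], 4)
summed_dense = [[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(d0, d1)]

print(merged_dense == summed_dense)
```

The merged slices and the element-wise sum of the dense gradients describe the same update, so no conversion to a dense Tensor is needed just to add towers together.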

Answer 3

Following HY G's second approach, I ended up with the following code:

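# Concatenate along axis 0; rows with duplicate indices are summed when the gradient is applied.
values = tf.concat([x.values for x in grads_per_model], 0)
indices = tf.concat([x.indices for x in grads_per_model], 0)
# This is the sum over towers; divide values by len(grads_per_model) for the average.
agg_grad = tf.IndexedSlices(values, indices)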
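The snippet above gives the sum over towers. A possible way to get the average, as in the "values / n" suggestion from the first answer, is sketched below. The helper name average_indexed_slices is made up for this example, and it assumes every entry in the list is an IndexedSlices gradient for the same variable and a TensorFlow version where tf.concat takes (values, axis). The dense_shape is passed through from the first tower and may be None, as in the question.

```python
import tensorflow as tf


def average_indexed_slices(tower_grads):
    """Average a list of tf.IndexedSlices (one per tower) for a single variable."""
    n = float(len(tower_grads))
    # Concatenation keeps duplicate indices; they are summed when applied.
    values = tf.concat([g.values for g in tower_grads], 0)
    indices = tf.concat([g.indices for g in tower_grads], 0)
    # Scaling the concatenated values by 1/n turns the implied sum into a mean.
    return tf.IndexedSlices(values / n, indices,
                            dense_shape=tower_grads[0].dense_shape)
```

The result can then take the place of the per-variable gradient in the list passed to optimizer.apply_gradients(zip(grads, tvars)). Gradients that are ordinary Tensors would still need to be averaged separately.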

Unable to average IndexedSlices when porting ptb_word_lm.py to multi-GPU towers
