Question

I have implemented SSD (single shot detection) in TensorFlow.

During inference, I load the frozen graph as follows:

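    import tensorflow as tf  # TensorFlow 1.x API

    def load_graph(filename):
        # Read the serialized frozen graph from disk
        with tf.gfile.FastGFile(filename, 'rb') as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
        # Import it and fetch only the input and prediction tensors
        img, box, cls, val = tf.import_graph_def(
            graph_def, name='',
            return_elements=['input:0', 'pred/box:0', 'pred/cls:0', 'pred/val:0']
        )
        return img, box, cls, val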

That way, no unnecessary operations are pulled in along with it.

However, when I run the inference script as follows:

    _box, _cls, _val = sess.run([box, cls, val], {img: np.expand_dims(image_data, 0)})

(i.e. I only use a batch size of one)

I noticed that TensorFlow complains about memory allocation:

2017-07-08 20:38:46.389877: W 
tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator 
(GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller 
indicates that this is not a failure, but may mean that there could be 
performance gains if more memory is available.

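So I decided to profile the operations, which gave the following result:

[TensorBoard profiler screenshot]

As you can see, no single operation claims more than 75MB.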
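
For scale, here is a back-of-the-envelope sketch comparing the two figures quoted above. It assumes the profiler's "MB" means 2^20 bytes and that the data is float32; neither is confirmed by the logs, so treat the numbers as rough.

```python
# Rough comparison of the failed allocation and the largest profiled op.
GIB = 1024 ** 3  # bytes per GiB
MIB = 1024 ** 2  # bytes per MiB (assumption: the profiler's "MB" means MiB)

failed_alloc_bytes = 2.05 * GIB  # size reported in the allocator warning
largest_op_bytes = 75 * MIB      # upper bound per operation in the profile

# How many times larger the failed allocation is than any profiled op
ratio = failed_alloc_bytes / largest_op_bytes

# Number of float32 values that would fit (assumption: 4 bytes per element)
float32_elements = failed_alloc_bytes / 4

print(f"failed allocation is about {ratio:.0f}x the largest profiled op")
print(f"that is roughly {float32_elements / 1e6:.0f} million float32 values")
```

So the single request in the warning is more than an order of magnitude larger than anything shown in the profile.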

Where does this discrepancy come from?

When I limit the amount of memory allocated, as follows:

    gpu_options = tf.GPUOptions(allow_growth=True, per_process_gpu_memory_fraction=0.1)
    config = tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options)

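TensorFlow still complains about the amount of available memory, but I see no significant hit in performance (i.e. it always runs in about 8 ms, no matter how much memory is allocated).

When I let it grow until TensorFlow seems satisfied, it grows to approximately 6.7GB.

I don't understand what causes this behaviour. Where does this discrepancy come from?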

Answer 1

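The warning you report indicates that an allocation failed in device memory and was made in host memory instead; however, in the TensorBoard screenshot you only show device memory (the gpu:0 checkbox). You need to tick the cpu:0 checkbox to debug CPU operations.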

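Now, given that the computation is still fast in this case compared to when the memory is allocated on the device, given the allocation size (2GB), and given the position of the grey box in the TensorBoard screenshot, I suspect that this tensor holds the input data and that only slices of it are sent to the GPU as batches. I would say this is fine.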
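
A minimal sketch of the pattern I have in mind, using NumPy only and no GPU. The array shape and the `iter_batches` helper are hypothetical and are not taken from your graph; the point is just that a large host-side array can be consumed one small slice at a time.

```python
import numpy as np


def iter_batches(host_data, batch_size):
    """Yield consecutive slices of a host-side array, one batch at a time."""
    for start in range(0, len(host_data), batch_size):
        yield host_data[start:start + batch_size]


# Hypothetical stand-in for a large input tensor kept in host memory.
host_data = np.zeros((16, 64, 64, 3), dtype=np.float32)

batches = list(iter_batches(host_data, batch_size=1))
first = batches[0]

# Basic slicing returns a view, so each batch refers to the host buffer
# without copying it; only this small slice would need to be transferred.
print("full array bytes:", host_data.nbytes)
print("one batch bytes: ", first.nbytes)
print("shares memory:   ", np.shares_memory(first, host_data))
```

If something like this is happening, the big allocation lives on the host and each step only moves one batch worth of data to the device.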

TensorFlow tries to use more memory than the profiler indicates
