Question

Please consider this very simple minimal example:

import numpy as np
import tensorflow as tf

batch_size = 8

a = np.random.rand(batch_size, 5, 2048, 2048, 1)
b = np.random.rand(5, 8, 8, 1, 1)

with tf.device('/gpu:0'):
    at = tf.placeholder(shape=a.shape, dtype=tf.float32)
    bt = tf.placeholder(shape=b.shape, dtype=tf.float32)

    ct = tf.nn.conv3d(at, bt, strides=[1, 5, 8, 8, 1], padding='SAME')

with tf.Session() as sess:
    sess.run(ct, feed_dict={at: a.astype(np.float32), bt: b.astype(np.float32)})

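I have a batch a of tensors, each of shape (5x2048x2048), and a kernel b of shape (5x8x8). Performing the convolution with strides of [5, 8, 8] yields a tensor of shape (1, 256, 256) for each batch element. I use placeholders to feed the data in order to mimic a real-world use case where the data comes from an external source.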
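As a quick sanity check on the output shape stated above, here is a minimal sketch. It assumes the usual rule for 'SAME' padding, where each output dimension is ceil(input / stride); the helper name same_output_dims is hypothetical and not part of TensorFlow.

```python
import math


def same_output_dims(input_dims, strides):
    """Hypothetical helper: output size per dimension under 'SAME' padding.

    With 'SAME' padding the output size is ceil(input / stride),
    independent of the kernel size.
    """
    return tuple(math.ceil(i / s) for i, s in zip(input_dims, strides))


# Depth/height/width of one batch element and the matching strides.
out_dims = same_output_dims((5, 2048, 2048), (5, 8, 8))
print(out_dims)  # (1, 256, 256)
```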

When I run the code above on a GTX960 with 2GB of memory, I get the following error:

Ran out of memory trying to allocate 1.2KiB. See logs for memory state.

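How can TensorFlow fail to allocate such a small amount of memory? Could this be a bug in TensorFlow? If I use a smaller batch_size the error goes away, but I would like to understand what is happening here, since in principle the whole problem should fit comfortably in GPU memory (in fact, the problem size is only 640.00MiB).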

The log says:

Bin for 1.2KiB was 1.0KiB, Chunk State: 
Chunk at 0x703d40000 of size 1280
Chunk at 0x703d40500 of size 671088640
Chunk at 0x72bd40500 of size 1280
Chunk at 0x72bd40a00 of size 2097152
Chunk at 0x72bf40a00 of size 1168111104
     Summary of in-use Chunks by size: 
2 Chunks of size 1280 totalling 2.5KiB
1 Chunks of size 2097152 totalling 2.00MiB
1 Chunks of size 671088640 totalling 640.00MiB
1 Chunks of size 1168111104 totalling 1.09GiB
Sum Total of in-use chunks: 1.71GiB
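The figures in this log can be cross-checked with a small script. The sketch below copies the five "Chunk at" lines verbatim, checks that each chunk starts where the previous one ends, and sums the sizes; the variable names are my own and the regular expression assumes the exact line format shown above.

```python
import re

# The "Chunk at" lines, copied verbatim from the log above.
log = """\
Chunk at 0x703d40000 of size 1280
Chunk at 0x703d40500 of size 671088640
Chunk at 0x72bd40500 of size 1280
Chunk at 0x72bd40a00 of size 2097152
Chunk at 0x72bf40a00 of size 1168111104
"""

# Parse each line into (start_address, size_in_bytes).
chunks = [
    (int(addr, 16), int(size))
    for addr, size in re.findall(r"Chunk at (0x[0-9a-f]+) of size (\d+)", log)
]

# Each chunk should begin exactly where the previous one ends.
contiguous = all(
    addr + size == next_addr
    for (addr, size), (next_addr, _) in zip(chunks, chunks[1:])
)

total_bytes = sum(size for _, size in chunks)
total_gib = total_bytes / 2**30

print(contiguous)                 # True
print(total_bytes)                # 1841299456
print(f"{total_gib:.2f}GiB")      # 1.71GiB
```

The chunks are laid out back to back, and their sizes add up to the 1.71GiB reported in the summary line.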

I have identified the different chunks as follows:

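The chunks of size 1280 are the kernel b: 5 * 8 * 8 * 4 = 1280, where 4 is the size of a float32 in bytes.
The chunk of size 2097152 (2.00MiB) is the output of the convolution: 8 * 1 * 256 * 256 * 4 = 2097152, where 8 is batch_size and 4 is again the size of a float32.
The chunk of size 671088640 (640.00MiB) is the input a: 8 * 5 * 2048 * 2048 * 4 = 671088640.
The chunk of size 1168111104 (more than 1GiB) is the remainder of the GPU memory, which TensorFlow allocates by default. Indeed, if I change the problem size (i.e. batch_size), the size of this chunk changes accordingly, so that all chunks always add up to 1.71GiB on my system.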
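The arithmetic behind this identification can be reproduced directly from the tensor shapes in the example. This is only a sketch: tensor_bytes is a hypothetical helper, and it assumes dense float32 tensors with no padding or alignment overhead.

```python
import math

FLOAT32_BYTES = 4  # size of one float32 element in bytes


def tensor_bytes(shape):
    """Hypothetical helper: bytes needed for a dense float32 tensor."""
    return math.prod(shape) * FLOAT32_BYTES


batch_size = 8
sizes = {
    "kernel b": tensor_bytes((5, 8, 8, 1, 1)),
    "input a": tensor_bytes((batch_size, 5, 2048, 2048, 1)),
    "conv output": tensor_bytes((batch_size, 1, 256, 256, 1)),
}

for name, nbytes in sizes.items():
    print(f"{name}: {nbytes} bytes ({nbytes / 2**20:.2f}MiB)")
# kernel b: 1280 bytes (0.00MiB)
# input a: 671088640 bytes (640.00MiB)
# conv output: 2097152 bytes (2.00MiB)
```

These values match the 1280, 671088640 and 2097152 byte chunks in the log.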

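If I understand this correctly, there should still be enough memory available, so I cannot figure out why I get an out-of-memory error. Am I missing something, or could this be a bug? I would appreciate it if someone could shed some light on this. Thanks!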

P.S. Here is the output of nvidia-smi. No processes are using GPU memory.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960     Off  | 0000:01:00.0     Off |                  N/A |
|  0%   54C    P0    26W / 160W |      0MiB /  1996MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

conv3d out of memory (TensorFlow)
