I'm trying to get into the deep learning world with TensorFlow, but after setting it up to run on my GPU I hit an error while running a basic object detector app.
The error is CUDA_ERROR_OUT_OF_MEMORY, saying it cannot allocate 3.9 GB of memory when I actually have 16 GB. The error goes away when I use only 1 worker, but I believe that doesn't use all of the memory.
Sorry for my ignorance, but when you run TensorFlow on a GPU, does it use RAM or GPU memory? Should I run it on the CPU instead? Is that possible on AWS? What do you recommend?
Again, I don't even know whether this question makes sense, so really, thanks in advance!
OS: Ubuntu 16.04 64-bit. Log:
2017-09-16 11:39:56.458856: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 11:39:56.458878: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 11:39:56.458887: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 11:39:56.458894: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 11:39:56.458900: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 11:39:56.589540: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-09-16 11:39:56.590428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: GeForce 940MX
major: 5 minor: 0 memoryClockRate (GHz) 1.2415
pciBusID 0000:01:00.0
Total memory: 3.95GiB
Free memory: 143.25MiB
2017-09-16 11:39:56.590540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-09-16 11:39:56.590546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-09-16 11:39:56.590554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0)
2017-09-16 11:39:56.595486: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 143.25M (150208512 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Graphics card:
01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)
DeviceName: NVIDIA N16S-GTR
Subsystem: Hewlett-Packard Company GM108M [GeForce 940MX]
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at a3000000 (32-bit, non-prefetchable) [size=16M]
Memory at 90000000 (64-bit, prefetchable) [size=256M]
Memory at a0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 5000 [size=128]
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_381, nvidia_381_drm
RAM:
Handle 0x0019, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0018
Error Information Handle: No Error
Total Width: 64 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: SODIMM
Set: None
Locator: Bottom-Slot 1(left)
Bank Locator: BANK 0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MHz
Manufacturer: Samsung
Serial Number: 21152224
Asset Tag: 9876543210
Part Number: M471A1G43DB0-CPB
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: 1.5 V
Maximum Voltage: 1.5 V
Configured Voltage: 1.2 V
Handle 0x001A, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0018
Error Information Handle: No Error
Total Width: 64 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: SODIMM
Set: None
Locator: Bottom-Slot 2(right)
Bank Locator: BANK 2
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MHz
Manufacturer: Samsung
Serial Number: 21152224
Asset Tag: 9876543210
Part Number: M471A1G43DB0-CPB
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: 1.5 V
Maximum Voltage: 1.5 V
Configured Voltage: 1.2 V
Answer 0 (score: 0)
According to the TensorFlow output, your GPU only has 3.9 GB in total, but the memory available to the TF session is just 143.25 MiB.
So either another TF session using the GPU is already running, or some other GPU-enabled process is occupying it. I suspect the first, since TF usually grabs all of the GPU memory.
On your second question: TensorFlow uses so-called pinned memory to improve transfer speed, so it needs both RAM and GPU memory. As a rule of thumb, think of TF as only being able to use min(RAM, GPU mem).
I suggest you do the following:
- Run nvidia-smi in another terminal to see whether another process is sitting on that GPU and using its memory.
- Run CUDA_VISIBLE_DEVICES= python ... to start the app in CPU-only mode (if the code supports it).
The output should look something like:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI xxx.xx Driver Version: xxx.xx |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 0000:02:00.0 Off | N/A |
| 23% 34C P8 9W / 250W | 0MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 2 28289 C python 6023MiB |
+-----------------------------------------------------------------------------+
and ps aux | grep 28289
will tell you more about the process running on that GPU.
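If editing the launch command is inconvenient, the same effect as the CUDA_VISIBLE_DEVICES= prefix can be achieved from inside the script; a minimal sketch (the variable must be set before TensorFlow is imported anywhere in the process):

```python
import os

# Hide all CUDA devices so TensorFlow falls back to the CPU.
# This has to run before `import tensorflow` is executed.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
```

An empty value hides every GPU; setting it to e.g. "0" would instead restrict TensorFlow to the first device.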
Edit: it seems each worker uses its own single session, which is usually a bad idea on a single GPU and explains why there is no memory left: the first worker grabs all of the GPU memory.
But that suggests a third possible solution: hack in the following lines:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1.0 / num_workers)  # 1.0, not 1: avoids integer division on Python 2
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
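A related knob, assuming the TensorFlow 1.x API seen in the log above, is allow_growth, which makes each session allocate GPU memory on demand instead of grabbing its whole fraction upfront; the worker count here is a hypothetical placeholder:

```python
import tensorflow as tf  # TF 1.x API, matching the log above

num_workers = 4  # hypothetical; set to your actual number of workers
gpu_options = tf.GPUOptions(
    per_process_gpu_memory_fraction=1.0 / num_workers,  # even split across workers
    allow_growth=True,  # start small and grow allocations only as needed
)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```

With allow_growth alone, sessions can coexist as long as their combined working sets fit; combining it with the fraction cap keeps any one worker from starving the others.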