I'm trying to get into the deep learning world with TensorFlow, but after setting it up to run on my GPU I hit an error while running a basic object detector app.
The error is CUDA_ERROR_OUT_OF_MEMORY, saying it cannot allocate 3.9 GB of memory when I actually have 16 GB. The error goes away when I use only 1 worker, but I believe that doesn't use all of the memory.
Sorry for my ignorance, but when you run TensorFlow on a GPU, does it use RAM or GPU memory? Should I run it on the CPU instead? Is that possible on AWS? What do you recommend?
Again, I don't even know whether this question makes sense, so really, thanks in advance!
OS: Ubuntu 16.04 64-bit. Log:
2017-09-16 11:39:56.458856: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 11:39:56.458878: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 11:39:56.458887: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 11:39:56.458894: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 11:39:56.458900: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-09-16 11:39:56.589540: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-09-16 11:39:56.590428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: GeForce 940MX
major: 5 minor: 0 memoryClockRate (GHz) 1.2415
pciBusID 0000:01:00.0
Total memory: 3.95GiB
Free memory: 143.25MiB
2017-09-16 11:39:56.590540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-09-16 11:39:56.590546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2017-09-16 11:39:56.590554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0)
2017-09-16 11:39:56.595486: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 143.25M (150208512 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Graphics card:
01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)
DeviceName: NVIDIA N16S-GTR
Subsystem: Hewlett-Packard Company GM108M [GeForce 940MX]
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at a3000000 (32-bit, non-prefetchable) [size=16M]
Memory at 90000000 (64-bit, prefetchable) [size=256M]
Memory at a0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 5000 [size=128]
Capabilities: <access denied>
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_381, nvidia_381_drm
RAM:
Handle 0x0019, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0018
Error Information Handle: No Error
Total Width: 64 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: SODIMM
Set: None
Locator: Bottom-Slot 1(left)
Bank Locator: BANK 0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MHz
Manufacturer: Samsung
Serial Number: 21152224
Asset Tag: 9876543210
Part Number: M471A1G43DB0-CPB
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: 1.5 V
Maximum Voltage: 1.5 V
Configured Voltage: 1.2 V
Handle 0x001A, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0018
Error Information Handle: No Error
Total Width: 64 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: SODIMM
Set: None
Locator: Bottom-Slot 2(right)
Bank Locator: BANK 2
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MHz
Manufacturer: Samsung
Serial Number: 21152224
Asset Tag: 9876543210
Part Number: M471A1G43DB0-CPB
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: 1.5 V
Maximum Voltage: 1.5 V
Configured Voltage: 1.2 V
Answer 0 (score: 0)
According to the TensorFlow output, your GPU only has 3.9 GB in total, but the memory available to the TF session is just 143.25 MiB.
So either another TF session using the GPU is already running, or some other GPU-enabled process is occupying it. I suspect the first, since TF usually grabs all of the GPU memory.
On your second question: TensorFlow uses so-called pinned memory to improve transfer speed, so it needs both RAM and GPU memory. As a rule of thumb, think of TF as only being able to use min(RAM, GPU mem).
I suggest you do the following:
- Run nvidia-smi in another terminal to see whether another process is sitting on that GPU and using its memory.
- Run CUDA_VISIBLE_DEVICES= python ... to start the app in CPU-only mode (if the code supports it).
The output should look something like:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI xxx.xx Driver Version: xxx.xx |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 0000:02:00.0 Off | N/A |
| 23% 34C P8 9W / 250W | 0MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 2 28289 C python 6023MiB |
+-----------------------------------------------------------------------------+
and ps aux | grep 28289
will tell you more about the process running on that GPU.
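If editing the launch command is inconvenient, the same effect as the CUDA_VISIBLE_DEVICES= prefix can be achieved from inside the script; a minimal sketch (the variable must be set before TensorFlow is imported anywhere in the process):

```python
import os

# Hide all CUDA devices so TensorFlow falls back to the CPU.
# This has to run before `import tensorflow` is executed.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
```

An empty value hides every GPU; setting it to e.g. "0" would instead restrict TensorFlow to the first device.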
Edit: it seems each worker uses its own single session, which is usually a bad idea on a single GPU and explains why there is no memory left: the first worker grabs all of the GPU memory.
But that suggests a third possible solution: hack in the following lines:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1.0 / num_workers)  # 1.0, not 1: avoids integer division on Python 2
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
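A related knob, assuming the TensorFlow 1.x API seen in the log above, is allow_growth, which makes each session allocate GPU memory on demand instead of grabbing its whole fraction upfront; the worker count here is a hypothetical placeholder:

```python
import tensorflow as tf  # TF 1.x API, matching the log above

num_workers = 4  # hypothetical; set to your actual number of workers
gpu_options = tf.GPUOptions(
    per_process_gpu_memory_fraction=1.0 / num_workers,  # even split across workers
    allow_growth=True,  # start small and grow allocations only as needed
)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```

With allow_growth alone, sessions can coexist as long as their combined working sets fit; combining it with the fraction cap keeps any one worker from starving the others.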