I just built TensorFlow v1.0 and I am trying to run the MNIST test to see whether it works. It seems to, but I am observing some odd behaviour. My system has two Tesla P100s, and nvidia-smi shows the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 361.107                 Driver Version: 361.107                  |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-SXM2...  Off  | 0002:01:00.0     Off |                    0 |
| N/A   34C    P0   114W / 300W |  15063MiB / 16280MiB |     51%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-SXM2...  Off  | 0006:01:00.0     Off |                    0 |
| N/A   27C    P0    35W / 300W |  14941MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     67288    C   python3                                      15061MiB |
|    1     67288    C   python3                                      14939MiB |
+-----------------------------------------------------------------------------+
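From what I can tell, TF1 by default maps every visible GPU and reserves nearly all of the memory on each one, even when no ops are placed there, which would explain the memory numbers above. For instance, I can stop the up-front reservation with a session config like this (a sketch, assuming TensorFlow 1.x):

```python
import tensorflow as tf

# By default the TF1 runtime grabs almost all memory on every visible GPU,
# whether or not any ops run there. allow_growth makes allocation lazy,
# so nvidia-smi would then show memory only where compute actually happens.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```

But that only changes what the memory column shows; it does not by itself put compute on the second GPU.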
As shown, python3 has taken nearly all of the memory on both GPUs, but the compute load is placed only on the first one.
By exporting CUDA_VISIBLE_DEVICES I can limit which GPU is used, but it does not affect the compute time, so adding the second GPU brings no benefit at all.

Single GPU:
real 2m23.496s
user 4m26.597s
sys 0m12.587s
Two GPUs:
real 2m18.165s
user 4m18.625s
sys 0m12.958s
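For completeness, the single-GPU timing above was taken with the second device hidden before launching the script, along these lines:

```shell
# Hide GPU 1 so TensorFlow only sees device 0 for the single-GPU run.
# Note this only restricts visibility; it does not parallelize anything.
export CUDA_VISIBLE_DEVICES=0
echo "$CUDA_VISIBLE_DEVICES"
```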
So the question is: how do I get TensorFlow to put compute load on both GPUs?
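From the CIFAR-10 multi-GPU tutorial, my understanding is that TF1 never spreads a single-tower graph automatically: the graph has to build one model "tower" per device under tf.device('/gpu:i') and then average the per-tower gradients on the CPU. The averaging step is sketched below with NumPy arrays standing in for the per-tower gradient tensors (function and variable names are hypothetical, not from any TF API):

```python
import numpy as np

def average_gradients(tower_grads):
    """Average gradients across towers, variable by variable.

    tower_grads: one list per GPU tower, each containing that tower's
    gradient array for every model variable (mirroring the per-tower
    (grad, var) lists in the TF1 tutorial pattern).
    """
    # zip(*...) regroups the data by variable instead of by tower.
    return [np.mean(np.stack(grads_for_var), axis=0)
            for grads_for_var in zip(*tower_grads)]

# Two towers, each holding gradients for two variables.
tower0 = [np.array([1.0, 3.0]), np.array([2.0])]
tower1 = [np.array([3.0, 5.0]), np.array([4.0])]
avg = average_gradients([tower0, tower1])
print(avg[0])  # [2. 4.]
print(avg[1])  # [3.]
```

Is this tower pattern really required, or is there a way to make the stock MNIST test use both GPUs?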