I am learning to use TensorFlow for object detection. To speed up training, I am using an AWS g3.16xlarge instance with 4 GPUs. I start training with the following commands:
export CUDA_VISIBLE_DEVICES=0,1,2,3
python object_detection/train.py --logtostderr --pipeline_config_path=/home/ubuntu/builder/rcnn.config --train_dir=/home/ubuntu/builder/experiments/training/
Inside rcnn.config I have set batch-size = 1. During the run I get the following output:
Console output
2018-11-09 07:25:50.104310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-11-09 07:25:50.104385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 2 3
2018-11-09 07:25:50.104395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0: Y N N N
2018-11-09 07:25:50.104402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1: N Y N N
2018-11-09 07:25:50.104409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 2: N N Y N
2018-11-09 07:25:50.104416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 3: N N N Y
2018-11-09 07:25:50.104429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla M60, pci bus id: 0000:00:1b.0, compute capability: 5.2)
2018-11-09 07:25:50.104439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Tesla M60, pci bus id: 0000:00:1c.0, compute capability: 5.2)
2018-11-09 07:25:50.104446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: Tesla M60, pci bus id: 0000:00:1d.0, compute capability: 5.2)
2018-11-09 07:25:50.104455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: Tesla M60, pci bus id: 0000:00:1e.0, compute capability: 5.2)
When I run nvidia-smi, I get the following output:
nvidia-smi output
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26 Driver Version: 375.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 Off | 0000:00:1B.0 Off | 0 |
| N/A 52C P0 129W / 150W | 7382MiB / 7612MiB | 92% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 Off | 0000:00:1C.0 Off | 0 |
| N/A 33C P0 38W / 150W | 7237MiB / 7612MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M60 Off | 0000:00:1D.0 Off | 0 |
| N/A 40C P0 38W / 150W | 7237MiB / 7612MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M60 Off | 0000:00:1E.0 Off | 0 |
| N/A 34C P0 39W / 150W | 7237MiB / 7612MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 97860 C python 7378MiB |
| 1 97860 C python 7233MiB |
| 2 97860 C python 7233MiB |
| 3 97860 C python 7233MiB |
+-----------------------------------------------------------------------------+
And **nvidia-smi dmon** gives the following output:
# gpu pwr temp sm mem enc dec mclk pclk
# Idx W C % % % % MHz MHz
0 158 69 90 69 0 0 2505 1177
1 38 36 0 0 0 0 2505 556
2 38 45 0 0 0 0 2505 556
3 39 37 0 0 0 0 2505 556
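I am confused by these outputs. Reading the console output, the program recognizes that 4 different GPUs are available. In the nvidia-smi output, however, the Volatile GPU-Util percentage is non-zero only for the first GPU and is zero for the rest, even though the process table at the bottom shows memory in use on all 4 GPUs. Likewise, nvidia-smi dmon reports an sm value only for the first GPU and zero for the others. From this blog I understand that a zero in dmon means the GPU is idle.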
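To make that reading of the dmon table concrete, here is a minimal sketch that parses the sample shown above and separates GPUs with non-zero SM utilization from idle ones. The function name `parse_dmon` is made up for illustration, and the code assumes every `gpu` and `sm` field is a plain integer, as it is in this sample.

```python
# Sample copied from the `nvidia-smi dmon` output above.
DMON_OUTPUT = """\
# gpu pwr temp sm mem enc dec mclk pclk
# Idx W C % % % % MHz MHz
0 158 69 90 69 0 0 2505 1177
1 38 36 0 0 0 0 2505 556
2 38 45 0 0 0 0 2505 556
3 39 37 0 0 0 0 2505 556
"""


def parse_dmon(text):
    """Return {gpu_index: sm_percent} from `nvidia-smi dmon` text.

    Assumption: the first '#' line names the columns and every data
    row holds integer values for the 'gpu' and 'sm' columns.
    """
    header = None
    usage = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Only the first comment line carries column names;
            # the second one carries units and is skipped.
            if header is None:
                header = line.lstrip("#").split()
            continue
        if header is None:
            raise ValueError("data row seen before the header line")
        row = dict(zip(header, line.split()))
        usage[int(row["gpu"])] = int(row["sm"])
    return usage


usage = parse_dmon(DMON_OUTPUT)
busy = sorted(i for i, sm in usage.items() if sm > 0)
idle = sorted(i for i, sm in usage.items() if sm == 0)
print("busy:", busy, "idle:", idle)
```

For the sample above, only GPU 0 ends up in `busy`, which matches the GPU-Util column of the nvidia-smi table.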
What I want to understand is whether train.py is using all 4 GPUs on my instance. If it is not, how can I make sure that object_detection/train.py is optimized to use all of them?
tensorflow
Answer 0 (score: 3)
First check whether TensorFlow can see a GPU:
tf.test.gpu_device_name()
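This returns the name of a GPU device if one is available, or an empty string otherwise. Note that it returns a single device name, not a list of every GPU.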
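Because `tf.test.gpu_device_name()` reports only one device, a full listing needs a different call. The following is a minimal sketch, assuming the module `tensorflow.python.client.device_lib` is importable in your TensorFlow build. It lives outside the public `tf.*` namespace, so treat its availability as an assumption. The helper names `gpu_device_names` and `list_visible_gpus` are made up for illustration.

```python
def gpu_device_names(devices):
    """Return the names of all entries whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_visible_gpus():
    """List the GPU devices TensorFlow can see in this process."""
    # Imported lazily so gpu_device_names() stays usable (and testable)
    # on machines without TensorFlow installed.
    from tensorflow.python.client import device_lib

    return gpu_device_names(device_lib.list_local_devices())
```

On a machine with four visible GPUs, `list_visible_gpus()` would be expected to return four names such as `/device:GPU:0`. Be aware that enumerating devices this way initializes them, so it can allocate GPU memory in the calling process.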
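You can then place ops on specific GPUs explicitly. The example below uses GPU:2 and GPU:3; list every device you want to use: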
import tensorflow as tf

# Creates a graph, placing one matmul on each listed GPU.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
# Sums the per-GPU results on the CPU.
with tf.device('/cpu:0'):
    total = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(total))
You will see output like the following:
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[ 44.  56.]
 [ 98. 128.]]
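Manual device placement like this does not change what object_detection/train.py does on its own. As far as I recall, the legacy train.py script exposes a `--num_clones` flag that replicates the model across GPUs and splits the configured batch size evenly between the clones. Both points are assumptions here, so confirm them with `python object_detection/train.py --help` for your installed version. Under those assumptions, the sketch below shows why batch-size = 1 cannot be spread over 4 GPUs and how the command from the question could be extended. The helper names are hypothetical.

```python
def per_clone_batch_size(batch_size, num_clones):
    """Batch size each clone gets if the total is split evenly.

    Assumption: the trainer uses integer division of the configured
    batch size by the number of clones.
    """
    return batch_size // num_clones


def build_train_command(pipeline_config_path, train_dir, num_clones):
    """Build the argv list for a multi-GPU run of the legacy train.py."""
    return [
        "python",
        "object_detection/train.py",
        "--logtostderr",
        "--pipeline_config_path={}".format(pipeline_config_path),
        "--train_dir={}".format(train_dir),
        # Assumption: --num_clones exists in your version of train.py.
        "--num_clones={}".format(num_clones),
    ]


# With the settings from the question, each of 4 clones would get 0 images.
print(per_clone_batch_size(1, 4))
# A batch size that is a multiple of the clone count splits cleanly.
print(per_clone_batch_size(4, 4))

command = build_train_command(
    "/home/ubuntu/builder/rcnn.config",
    "/home/ubuntu/builder/experiments/training/",
    num_clones=4,
)
print(" ".join(command))
```

If the flag is available, the batch size in rcnn.config would need to be raised to at least the number of clones before all four GPUs can receive work.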
Answer 1 (score: 0)
Python code to check whether a GPU is found and can be used with TensorFlow:
## Libraries import
import tensorflow as tf

## Test GPU
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
print('')

## Allocate GPU memory on demand instead of reserving it all up front.
## Pass this config to tf.Session(config=config) for it to take effect.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True