Question

我正在学习使用Tensorflow进行对象检测。为了加快培训过程，我采用了一个具有4个GPU的AWS g3.16xlarge实例。我正在使用以下代码来运行培训过程：

export CUDA_VISIBLE_DEVICES=0,1,2,3
 python object_detection/train.py --logtostderr --pipeline_config_path=/home/ubuntu/builder/rcnn.config --train_dir=/home/ubuntu/builder/experiments/training/

在rcnn.config内部-我已经设置了batch-size = 1。在运行期间，我得到以下输出：

控制台输出

2018-11-09 07:25:50.104310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-11-09 07:25:50.104385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 2 3 
2018-11-09 07:25:50.104395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y N N N 
2018-11-09 07:25:50.104402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   N Y N N 
2018-11-09 07:25:50.104409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 2:   N N Y N 
2018-11-09 07:25:50.104416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 3:   N N N Y 
2018-11-09 07:25:50.104429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla M60, pci bus id: 0000:00:1b.0, compute capability: 5.2)
2018-11-09 07:25:50.104439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Tesla M60, pci bus id: 0000:00:1c.0, compute capability: 5.2)
2018-11-09 07:25:50.104446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: Tesla M60, pci bus id: 0000:00:1d.0, compute capability: 5.2)
2018-11-09 07:25:50.104455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: Tesla M60, pci bus id: 0000:00:1e.0, compute capability: 5.2)

运行nvidia-smi时，得到以下输出： nvidia-smi输出

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 0000:00:1B.0     Off |                    0 |
| N/A   52C    P0   129W / 150W |   7382MiB /  7612MiB |     92%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           Off  | 0000:00:1C.0     Off |                    0 |
| N/A   33C    P0    38W / 150W |   7237MiB /  7612MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           Off  | 0000:00:1D.0     Off |                    0 |
| N/A   40C    P0    38W / 150W |   7237MiB /  7612MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           Off  | 0000:00:1E.0     Off |                    0 |
| N/A   34C    P0    39W / 150W |   7237MiB /  7612MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     97860    C   python                                        7378MiB |
|    1     97860    C   python                                        7233MiB |
|    2     97860    C   python                                        7233MiB |
|    3     97860    C   python                                        7233MiB |
+-----------------------------------------------------------------------------+

和**nvidia-smi dmon**提供以下输出：

# gpu   pwr  temp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     %     %     %     %   MHz   MHz
    0   158    69    90    69     0     0  2505  1177
    1    38    36     0     0     0     0  2505   556
    2    38    45     0     0     0     0  2505   556
    3    39    37     0     0     0     0  2505   556

我对每个输出都感到困惑。当程序识别出4种不同GPU的可用性时，我读取了控制台输出，而在nvidia-smi输出中，仅对第一个GPU显示了易失GPU-Util百分比，对于其余GPU，它显示为零。但是，同一张表在底部显示了所有4 gpu的内存使用情况。而nvidia-smi dmon仅在第一个gpu上打印sm值，而对于其他gpu则为零。从这个blog中，我了解到dmon中的零表示GPU是免费的。

我想了解的是，train.py是否利用了我在实例中拥有的所有4个GPU。如果未充分利用所有GPU，如何确保针对所有GPU优化了object_detection/train.py张量流。

Answer 1

检查是否返回所有GPU的列表。

tf.test.gpu_device_name()

返回GPU设备的名称（如果有）或空字符串。

然后您可以执行以下操作来使用所有可用的GPU。

# Creates a graph.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  sum = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(sum))

您将看到以下输出：

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]

Answer 2

用于检查是否找到GPU并将其与tensorflow配合使用的Python代码：

## Libraries import
import tensorflow as tf

## Test GPU
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
print('')
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

如何检查tensorflow是否正在使用所有可用的GPU

2 个答案: