In a multi-GPU setup, can CPU code go under "with tf.device(gpu_id):"?

Date: 2018-02-05 06:25:34

Tags: python tensorflow deep-learning gpu

Hi, I'm new to TensorFlow. I've been given the task of modifying "Demo.py" in the GitHub project "tf-faster-rcnn" so that it runs inference on multiple GPUs.

This is roughly what I plan to do (assuming I have as many images as GPUs; in practice I will use a queue, which is omitted here for simplicity):

for idx, gpu in gpu_dict.items():
    with tf.device(gpu):
        im_detect(sess, net, images[idx])
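If the number of images does not match the number of GPUs, one simple alternative to a queue is to cycle through the devices. A minimal sketch in plain Python (no TensorFlow involved); `assign_round_robin` and the image names are hypothetical:

```python
def assign_round_robin(images, devices):
    """Pair each image with a device string, cycling through the devices.

    Hypothetical helper: `devices` would be strings such as '/gpu:0'.
    """
    if not devices:
        raise ValueError("need at least one device")
    return [(devices[i % len(devices)], img) for i, img in enumerate(images)]


# Build device strings for two GPUs and spread three images over them.
devices = ['/gpu:%d' % i for i in range(2)]
pairs = assign_round_robin(['a.jpg', 'b.jpg', 'c.jpg'], devices)
# pairs == [('/gpu:0', 'a.jpg'), ('/gpu:1', 'b.jpg'), ('/gpu:0', 'c.jpg')]
```

Each pair could then drive one iteration of the loop above.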

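The "im_detect" function is provided in the source (I can call it directly), and it contains some non-GPU code (such as conditionals and data preparation):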

def im_detect(sess, net, im):
  blobs, im_scales = _get_blobs(im)
  assert len(im_scales) == 1, "Only single-image batch implemented"

  im_blob = blobs['data']
  blobs['im_info'] = np.array([im_blob.shape[1], im_blob.shape[2], im_scales[0]], dtype=np.float32)

  _, scores, bbox_pred, rois = net.test_image(sess, blobs['data'], blobs['im_info'])

  boxes = rois[:, 1:5] / im_scales[0]
  scores = np.reshape(scores, [scores.shape[0], -1])
  bbox_pred = np.reshape(bbox_pred, [bbox_pred.shape[0], -1])
  if cfg.TEST.BBOX_REG:
    # Apply bounding-box regression deltas
    box_deltas = bbox_pred
    pred_boxes = bbox_transform_inv(boxes, box_deltas)
    pred_boxes = _clip_boxes(pred_boxes, im.shape)
  else:
    # Simply repeat the boxes, once for each class
    pred_boxes = np.tile(boxes, (1, scores.shape[1]))

  return scores, pred_boxes
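Apart from `net.test_image`, everything in `im_detect` is ordinary NumPy, so it runs in Python on the host CPU regardless of which `tf.device` block it is called from. The sketch below isolates the `else` branch to make that concrete; `repeat_boxes` is a hypothetical name and the toy arrays are made up:

```python
import numpy as np


def repeat_boxes(rois, scores, im_scale):
    """Mirror the non-regression branch of im_detect (NumPy only, runs on CPU).

    rois: (N, 5) array whose columns 1..4 are box coordinates in the
    rescaled image; scores: (N, num_classes).
    """
    boxes = rois[:, 1:5] / im_scale            # back to original image scale
    scores = np.reshape(scores, [scores.shape[0], -1])
    # One copy of each box per class -> shape (N, 4 * num_classes).
    return np.tile(boxes, (1, scores.shape[1]))


rois = np.array([[0, 10, 20, 30, 40]], dtype=np.float32)
scores = np.array([[0.1, 0.9]], dtype=np.float32)
pred = repeat_boxes(rois, scores, 2.0)
# pred.shape == (1, 8); pred[0] == [5, 10, 15, 20, 5, 10, 15, 20]
```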

Since I have never worked with GPUs before and I'm new to TensorFlow, I'd like to ask: is it feasible in TensorFlow to assign function calls like this to each GPU?

---------------- Update below ----------------

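I know TensorFlow has an "allow_soft_placement" option that assigns this non-GPU code to the CPU, but when there are multiple GPUs, how does a single CPU handle the requests from all of them? Should I create one CPU thread per GPU?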
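If you do go the thread-per-GPU route, one common pattern is a worker thread per device pulling from a shared queue. The sketch below uses only the standard library; `run_per_device` and `detect_fn` are hypothetical, with `detect_fn` standing in for running detection on a network whose ops were placed on that device. Keep in mind that `tf.device` only affects ops created inside the block, so the placement has to happen when the network's graph is built, not when `im_detect` is called.

```python
import queue
import threading


def run_per_device(images, devices, detect_fn):
    """Run detect_fn(device, image) with one worker thread per device.

    detect_fn is a hypothetical stand-in for calling im_detect on a
    network built under tf.device(device).
    """
    work = queue.Queue()
    for idx, img in enumerate(images):
        work.put((idx, img))

    results = [None] * len(images)

    def worker(device):
        while True:
            try:
                idx, img = work.get_nowait()
            except queue.Empty:
                return  # no work left for this device
            # Each index is taken from the queue once, so only one
            # thread ever writes a given slot.
            results[idx] = detect_fn(device, img)

    threads = [threading.Thread(target=worker, args=(d,)) for d in devices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


out = run_per_device([1, 2, 3, 4], ['/gpu:0', '/gpu:1'],
                     lambda device, img: (device, img * 10))
```

Results come back in input order, whichever device processed each image.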

1 answer:

Answer 0 (score: 1)

Yes. From https://www.tensorflow.org/programmers_guide/using_gpu: if an op has no CUDA kernel, the allow_soft_placement option in the session config lets TensorFlow fall back to the CPU.

import tensorflow as tf  # TensorFlow 1.x API

myConf = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
sess = tf.Session(config=myConf)

Sometimes you won't want this, e.g. if you are trying to verify that every op you expect to run on the GPU actually does.

You can also explicitly assign ops to the CPU with a with tf.device('/cpu:0'): block nested inside a with tf.device('/gpu:0'): block.

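I tend to use strict placement and then explicitly assign incompatible ops to the CPU whenever TensorFlow complains. That way I can be sure that every op that can run on the GPU does.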

Update:

Here is some schematic code that should outline how to run computations in parallel on the GPUs.

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

graph = tf.Graph()

with graph.as_default():
    gpus = ['/gpu:0', '/gpu:1']
    results = []
    datasets = []

    for idx, gpu in enumerate(gpus):
        with tf.device(gpu):
            # Assign data-prep ops to the CPU
            # (or use soft placement and leave out the next line).
            with tf.device('/cpu:0'):
                datasets.append(tf.placeholder(tf.float32, name='Features' + str(idx)))

            # Computationally expensive ops are assigned to the GPU, but they
            # reference the non-GPU ops placed on the CPU above.
            results.append(tf.reduce_sum(datasets[idx]))

myConf = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)

with tf.Session(graph=graph, config=myConf) as session:
    # Using the graph built above, evaluate both results on the two GPUs
    # (each of these ops depends on its own, independent CPU op).
    # The placeholders must be fed; these inputs are illustrative dummies.
    feed = {datasets[0]: np.ones((4, 4)), datasets[1]: np.ones((8, 8))}
    res0, res1 = session.run([results[0], results[1]], feed_dict=feed)