How to run multi-threaded inference when there is more than one GPU?

Date: 2018-08-03 14:15:13

Tags: python tensorflow

Update: solved by passing x, y, and sess as arguments to the workers.
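
In other words, each worker must bind its own session and tensors at the moment it is defined, rather than closing over the loop variables. A minimal sketch of that fix (the update itself shows no code), assuming the sessions, xs, and ys lists defined in the question below:

# Bind sess, x, y eagerly via default arguments so each worker keeps its
# own session and tensors rather than sharing the last iteration's values.
workers = [
    (lambda x_val, sess=sess, x=x, y=y: sess.run(y, feed_dict={x: x_val}))
    for sess, x, y in zip(sessions, xs, ys)
]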

Original question

I have two GPUs and plan to create one thread per GPU to achieve (data) parallelism. Here is my attempt:

import tensorflow as tf

gpus = ['/device:GPU:0', '/device:GPU:1']
n_devices = len(gpus)
graphs = []

for gpu in gpus:
    with tf.device(gpu):
        with tf.gfile.GFile(graph_def_pb_path, "rb") as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
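        # pin every op of the frozen graph onto this GPU by rewriting its device field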
        for node in graph_def.node:
            node.device = gpu
        with tf.Graph().as_default() as g:
            tf.import_graph_def(graph_def)
            graphs.append(g)

xs = [g.get_tensor_by_name(<input_tensor_name>) for g in graphs]
ys = [g.get_tensor_by_name(<output_tensor_name>) for g in graphs]
sessions = [tf.Session(graph=g, 
    config=tf.ConfigProto(log_device_placement=True)) for g in graphs]
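# NOTE (the bug fixed in the update above): the lambdas below close over
# the comprehension variables sess, x, y, so once the list is built every
# worker refers to the last iteration's values, i.e. only the last GPU.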
workers = [lambda x_val: sess.run(y, feed_dict={x: x_val}) 
    for sess, x, y in zip(sessions, xs, ys)]
n_devices = len(graphs)
results = []
threads = [None] * n_devices

for i, image in enumerate(data):
    t_idx = i % n_devices
    thread = threads[t_idx]
    if thread is not None:
        output = thread.join()
        results.append(output)

    # see https://stackoverflow.com/q/6893968/688080
    thread = ThreadWithReturnValue(target=workers[t_idx], args=(image,))

    thread.start()
    threads[t_idx] = thread

for thread in threads:
    output = thread.join()
    results.append(output)

for sess in sessions:
    sess.close()
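
The ThreadWithReturnValue helper used above comes from the linked Stack Overflow question and is not shown here. A minimal sketch of such a helper, assuming the usual pattern of a Thread subclass whose join() returns the target's result:

import threading

class ThreadWithReturnValue(threading.Thread):
    """A Thread whose join() returns the target function's return value."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._return = None

    def run(self):
        # Store the target's result instead of discarding it.
        if self._target is not None:
            self._return = self._target(*self._args, **self._kwargs)

    def join(self, timeout=None):
        super().join(timeout)
        return self._return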

However, it seems that only the last device is actually doing the work. Moreover, if I change the code to be single-threaded as shown below, I can see the GPUs being occupied alternately, which indicates that the device placement is correct:

# after importing the graph into the devices
# xs: input tensors, ys: output tensors (one per graph, as above)
sessions = [tf.Session(graph=g,
    config=tf.ConfigProto(log_device_placement=True)) for g in graphs]
result = [None] * len(data)
for i, image in enumerate(data):
    idx = i % n_devices
    result[i] = sessions[idx].run(ys[idx], feed_dict={xs[idx]: image})

So how can I fix the code, and what is the correct way to do data parallelism at the inference stage with a frozen graph?

0 Answers:

No answers yet.