Update: solved by adding x, y, and sess to the workers' arguments.
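For reference, a minimal sketch of that fix (using the same variable names as the code below): binding sess, x, and y as default arguments evaluates them at definition time, so each worker keeps its own session and tensors instead of the ones left over from the last loop iteration.

workers = [lambda x_val, sess=sess, x=x, y=y: sess.run(y, feed_dict={x: x_val})
           for sess, x, y in zip(sessions, xs, ys)]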
Original question:
I have two GPUs and plan to create one thread per GPU to achieve (data) parallelism. My attempt:
import tensorflow as tf

gpus = ['/device:GPU:0', '/device:GPU:1']
n_devices = len(gpus)

graphs = []
for gpu in gpus:
    with tf.device(gpu):
        with tf.gfile.GFile(graph_def_pb_path, "rb") as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
        for node in graph_def.node:
            node.device = gpu
        with tf.Graph().as_default() as g:
            tf.import_graph_def(graph_def)
        graphs.append(g)

xs = [g.get_tensor_by_name(<input_tensor_name>) for g in graphs]
ys = [g.get_tensor_by_name(<output_tensor_name>) for g in graphs]

sessions = [tf.Session(graph=g,
                       config=tf.ConfigProto(log_device_placement=True))
            for g in graphs]

workers = [lambda x_val: sess.run(y, feed_dict={x: x_val})
           for sess, x, y in zip(sessions, xs, ys)]
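# NOTE: these lambdas close over sess, x, and y by reference, not by value,
# so once the comprehension finishes every worker sees the last session and
# tensors -- this is the bug the update above fixes with default arguments.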
n_devices = len(graphs)
results = []
threads = [None] * n_devices
for i, image in enumerate(data):
    t_idx = i % n_devices
    thread = threads[t_idx]
    if thread is not None:
        output = thread.join()
        results.append(output)
    # see https://stackoverflow.com/q/6893968/688080
    thread = ThreadWithReturnValue(target=workers[t_idx], args=(image,))
    thread.start()
    threads[t_idx] = thread

for thread in threads:
    output = thread.join()
    results.append(output)

for sess in sessions:
    sess.close()
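(ThreadWithReturnValue is not part of the standard library; a minimal sketch of such a helper, following the pattern from the Stack Overflow answer linked in the comment above, might look like this:)

from threading import Thread

class ThreadWithReturnValue(Thread):
    """A Thread whose join() returns the target's return value."""
    def __init__(self, target=None, args=()):
        super().__init__(target=target, args=args)
        self._return = None

    def run(self):
        # store the target's result instead of discarding it
        if self._target is not None:
            self._return = self._target(*self._args, **self._kwargs)

    def join(self, timeout=None):
        super().join(timeout)
        return self._return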
However, when I run the threaded version above, it seems that only the last device is doing the work. Moreover, if I change the code to run single-threaded as shown below, I can see the GPUs being occupied alternately, which indicates that the devices are assigned correctly:
# after importing the graph into the devices
# x: input tensor, y: output tensor
sessions = [tf.Session(graph=g,
                       config=tf.ConfigProto(log_device_placement=True))
            for g in graphs]
result = [None] * len(data)
for i, image in enumerate(data):
    sess = sessions[i % n_devices]
    result[i] = sess.run(y, feed_dict={x: image})
So how can I correct the code, or what is the proper way to do data parallelism during inference on a frozen graph?