I wrote a small piece of code in TensorFlow to test its multi-GPU performance with the Dataset API.
import tensorflow as tf
import numpy as np
dataset = tf.data.Dataset.from_tensor_slices([[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4]])
print(dataset.output_types)
print(dataset.output_shapes)
iterator = dataset.make_initializable_iterator()
#next_element = iterator.get_next()
tensor_results = []
for i in range(2):
    for j in range(2):
        with tf.device("/gpu:%d" % j):
            with tf.name_scope("Tower_%d" % j) as scope:
                operand = iterator.get_next()
                tensor_result = tf.matmul(tf.reshape(operand, shape=[1,4]), tf.reshape(operand, shape=[4,1]))
                tensor_results.append(tensor_result)
tfconfig = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
tfconfig.gpu_options.allow_growth=True
sess = tf.Session(config=tfconfig)
sess.run(iterator.initializer)
results = sess.run(tensor_results)
results
If you run it, the output is
[array([[16]], dtype=int32), array([[4]], dtype=int32), array([[36]], dtype=int32), array([[64]], dtype=int32)]
which is out of order, and the order differs every time I run it. It should be:
[array([[4]], dtype=int32), array([[16]], dtype=int32), array([[36]], dtype=int32), array([[64]], dtype=int32)]
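(For reference, the values themselves are easy to verify: each row is constant, so reshaping it to [1,4] and [4,1] and multiplying gives the row's dot product with itself, i.e. 4·v² for a row filled with v. A quick NumPy check of the in-order results:)

```python
import numpy as np

rows = np.array([[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4]])
# [1,4] @ [4,1] is just the row's dot product with itself
expected = [(r.reshape(1,4) @ r.reshape(4,1)).item() for r in rows]
print(expected)  # [4, 16, 36, 64]
```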
I suspect the cause is GPU parallelism (asynchronous computation), but I'm not sure... Can anyone tell me the reason, and how to fix this (while still keeping GPU-level parallelism)? Thanks!