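I am trying to use TensorFlow's map_fn for parallel computation, but as far as I can tell the performance gain is negligible.
Below is sample code run on Ubuntu 14.04 LTS with Python 3.6.5 and TensorFlow 1.12.0, on a machine with 28 dual-core CPUs (Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz) = 56 processors.
The same code took even longer on Amazon AWS SageMaker ml-p3-xlarge: 227 seconds.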
import numpy as np
import tensorflow as tf
import time
# version 1
tic = time.time()
elems = np.array(range(1,1000000), dtype=np.float64)
output = tf.map_fn(lambda x: x**6 , elems, dtype=tf.float64, parallel_iterations=56)
sess = tf.Session()
res = sess.run(output)
toc = time.time() - tic
print("elapsed=", toc) # 29.47 (seconds)
# version 2
tic = time.time()
elems = np.array(range(1,1000000), dtype=np.float64)
output = tf.map_fn(lambda x: x**6 , elems, dtype=tf.float64, parallel_iterations=56)
n_cpus=28
with tf.Session(
        config=tf.ConfigProto(
            log_device_placement=True,
            device_count={"CPU": n_cpus},
            inter_op_parallelism_threads=n_cpus,
            intra_op_parallelism_threads=1,
        )) as sess:
    res = sess.run(output)
toc = time.time() - tic
print("elapsed=", toc) # 29.26 (seconds)
# version 3
tic = time.time()
elems = np.array(range(1,1000000), dtype=np.float64)
x6 = [ x**6 for x in elems]
toc = time.time() - tic
print("elapsed time=", toc) # 0.5 seconds
What is wrong with the code above? Without map_fn, the sequential version 3 takes only 0.5 seconds.
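
For reference, one more baseline that might be worth timing next to version 3 is a plain vectorized NumPy power, which applies x**6 to the whole array in a single call instead of once per element. This is only a sketch and not part of the measurements above: the helper time_call and the names looped and vectorized are made up for illustration, and no timings are claimed for it.

```python
import time
import numpy as np

def time_call(fn):
    """Hypothetical helper: run a zero-argument callable, return (result, seconds)."""
    tic = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - tic

elems = np.arange(1, 1000000, dtype=np.float64)

# Same computation as version 3: a Python-level loop, one element at a time.
looped, t_loop = time_call(lambda: np.array([x**6 for x in elems]))

# Vectorized form: one call that raises every element to the 6th power.
vectorized, t_vec = time_call(lambda: elems**6)

print("loop elapsed=", t_loop)
print("vectorized elapsed=", t_vec)
```

If the vectorized form turns out to be much faster than the loop, that would suggest the per-element overhead, rather than the arithmetic itself, dominates in all three versions.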