I am trying to learn TensorFlow by coding up some simple problems: I am trying to find the value of pi using a direct-sampling Monte Carlo method.
When I do this with a for loop, the runtime is much longer than I expected. I have looked at other posts about similar things and tried to follow the solutions, but I think I must still be doing something wrong.
Here is my code:
import tensorflow as tf
import numpy as np
import time
n_trials = 50000
tf.reset_default_graph()
x = tf.random_uniform(shape=(), name='x')
y = tf.random_uniform(shape=(), name='y')
r = tf.sqrt(x**2 + y**2)
hit = tf.Variable(0, name='hit')
# perform the monte carlo step
is_inside = tf.cast(tf.less(r, 1), tf.int32)
hit_op = hit.assign_add(is_inside)
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    # Make sure no new nodes are added to the graph
    sess.graph.finalize()
    start = time.time()
    # Run monte carlo trials -- This is very slow
    for _ in range(n_trials):
        sess.run(hit_op)
    hits = hit.eval()
    print("Pi is {}".format(4*hits/n_trials))
    print("Tensorflow operation took {:.2f} s".format((time.time()-start)))
>>> Pi is 3.15208
>>> Tensorflow operation took 8.98 s
By comparison, a for-loop-style solution in numpy is an order of magnitude faster:
start = time.time()
hits = [ 1 if np.sqrt(np.sum(np.square(np.random.uniform(size=2)))) < 1 else 0 for _ in range(n_trials) ]
a = 0
for hit in hits:
    a += hit
print("numpy operation took {:.2f} s".format((time.time()-start)))
print("Pi is {}".format(4*a/n_trials))
>>> Pi is 3.14032
>>> numpy operation took 0.75 s
Attached below is a plot of the overall execution-time difference for various numbers of trials.
Please note: my question is not "how can I perform this task the fastest"; I recognize there are far more efficient ways to calculate pi. I am only using this as a benchmarking tool to check the performance of TensorFlow against something I am familiar with (numpy).
Answer 0 (score: 1)
Simply put, session.run has a lot of overhead, and it is not designed to be used this way. Normally, with a neural network for example, you would call session.run once for a dozen multiplications of large matrices, and then this 0.2 ms would be completely irrelevant. For your case you probably want something like the code below; it runs about 5 times faster than the numpy version on my machine.
Incidentally, the same point applies to your numpy code: reducing the hits with a Python loop instead of np.sum is much slower (a vectorized numpy sketch follows the code below).
import tensorflow as tf
import numpy as np
import time
n_trials = 50000
tf.reset_default_graph()
x = tf.random_uniform(shape=(n_trials,), name='x')
y = tf.random_uniform(shape=(n_trials,), name='y')  # same shape as x, so each trial draws its own y
r = tf.sqrt(x**2 + y**2)
hit = tf.Variable(0, name='hit')
# perform the monte carlo step
is_inside = tf.cast(tf.less(r, 1), tf.int32)
hit2= tf.reduce_sum(is_inside)
#hit_op = hit.assign_add(is_inside)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Make sure no new nodes are added to the graph
    sess.graph.finalize()
    start = time.time()
    # Run all monte carlo trials in a single call -- this is fast
    hits = sess.run(hit2)
    print("Pi is {}".format(4*hits/n_trials))
    print("Tensorflow operation took {:.2f} s".format((time.time()-start)))
Answer 1 (score: 1)
The slowness is related to the communication overhead between Python and TensorFlow in sess.run, which is executed many times inside your loop. I would suggest using tf.while_loop to perform the computation entirely within TensorFlow, which makes for a better comparison against numpy.
import tensorflow as tf
import numpy as np
import time
n_trials = 50000
tf.reset_default_graph()
hit = tf.Variable(0, name='hit')
def body(ctr):
    x = tf.random_uniform(shape=[2], name='x')
    r = tf.sqrt(tf.reduce_sum(tf.square(x)))
    is_inside = tf.cond(tf.less(r, 1), lambda: tf.constant(1), lambda: tf.constant(0))
    hit_op = hit.assign_add(is_inside)
    with tf.control_dependencies([hit_op]):
        return ctr + 1

def condition(ctr):
    return ctr < n_trials

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    result = tf.while_loop(condition, body, [tf.constant(0)])
    start = time.time()
    sess.run(result)
    hits = hit.eval()
    print("Pi is {}".format(4.*hits/n_trials))
    print("Tensorflow operation took {:.2f} s".format((time.time()-start)))