For efficiency, I want to compute the square root only for the values of a tensor that are below a threshold.
For example, in NumPy I have
import numpy as np
x = np.random.random(size=int(10e6))
%timeit np.sqrt(x)
-> 10 ms ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
If I use a mask
x_m = x[x < 1e-3]
%timeit np.sqrt(x_m)
-> 8.94 µs ± 20.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
the computation is faster, as expected, since NumPy seems to compute sqrt only for the elements with x < 1e-3.
However, I cannot get this to work in TensorFlow:
import tensorflow as tf
tf.InteractiveSession()
x_tf = tf.constant(x)
%timeit tf.sqrt(x_tf).eval()
-> 314 ms ± 1.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If I now try to use boolean_mask, there is no speedup like in the NumPy version. It seems that sqrt in TensorFlow is still computed for all values of the original tensor x_tf.
Is there a way to run an operation (such as sqrt) only on the masked values? Or to extract a shorter tensor from the masked tensor?
Answer 0 (score: 0)
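There are two problems with your measurements. The NumPy masked timing does not include the cost of the masking itself (`x[x < 1e-3]`), and the TensorFlow timing calls `tf.sqrt` inside the timed statement, so it also measures adding a new op to the graph on every run.
These should be more representative timings: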
import numpy as np
import tensorflow as tf
np.random.seed(0)
x = np.random.random(size=int(10e6))
%timeit np.sqrt(x)
# 20.4 ms ± 581 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.sqrt(x[x < 1e-3])
# 9.96 ms ± 91 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
with tf.Graph().as_default(), tf.Session():
    x_tf = tf.constant(x)
    # Build the op once, outside the timed statement.
    x_tf_sqrt = tf.sqrt(x_tf)
    %timeit x_tf_sqrt.eval()
    # 16.8 ms ± 685 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
    mask = tf.boolean_mask(x_tf, x_tf < 1e-3)
    mask_sqrt = tf.sqrt(mask)
    %timeit mask_sqrt.eval()
    # 103 µs ± 43.2 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
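The gap between the two NumPy timings can also be checked outside IPython with the standard `timeit` module. The following is only a minimal sketch and not part of the original answer: the helper name `sqrt_masked`, the smaller array size, and the repeat count are assumptions chosen for illustration, and the absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000_000)  # assumption: smaller than in the post so it runs quickly
threshold = 1e-3
repeats = 20  # assumption: arbitrary repeat count for the sketch


def sqrt_masked(values, limit):
    """Select the entries below `limit`, then take their square root."""
    return np.sqrt(values[values < limit])


# Entries extracted ahead of time: timing np.sqrt on this array
# leaves out the cost of building and applying the mask.
x_m = x[x < threshold]

t_sqrt_only = timeit.timeit(lambda: np.sqrt(x_m), number=repeats)
t_mask_and_sqrt = timeit.timeit(lambda: sqrt_masked(x, threshold), number=repeats)

print(f"sqrt only:   {t_sqrt_only / repeats * 1e6:.1f} µs per call")
print(f"mask + sqrt: {t_mask_and_sqrt / repeats * 1e6:.1f} µs per call")
```

Both variants return the same values; only the second one accounts for the comparison `values < limit` and the boolean indexing, which have to visit every element of the full array.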