The following code compares the computation time of three syntax forms.
import numpy as np
a = np.random.randn(10000000)
b = np.zeros(a.shape)
np.sin(a, out=b, where=False)
# 100 loops, best of 3: 6.61 ms per loop
b = np.sin(a)
# 10 loops, best of 3: 162 ms per loop
np.sin(a, out=b)
# 10 loops, best of 3: 146 ms per loop
I would like to use the syntax that provides the minimal computation time.
My question is: why, when I define out=b, is the default value for the where argument still True? Is there a way to avoid it? It really makes the code more complicated.
Answer 0 (score: 3)
Have you looked at the output of np.sin(a, out=b, where=False)?
a = np.linspace(0, 2*np.pi, 10)
# a: array([0. , 0.6981317 , 1.3962634 , 2.0943951 , 2.7925268 ,
# 3.4906585 , 4.1887902 , 4.88692191, 5.58505361, 6.28318531])
b = np.zeros_like(a) # [0, 0, 0, ...]
np.sin(a, out=b, where=False)
# --> b: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
It's all zero because a value of False in where means "don't calculate here". A scalar False applies to the entire array, so nothing is calculated and b keeps the zeros it was initialised with. That's why it's so fast!
https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.sin.html
where works like this:
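x = np.ones(3) # [1,1,1]
# out supplies the values kept where the mask is False;
# without out, those entries are left uninitialised
np.divide(x, 2, out=x.copy(), where=[True, False, True])
# --> array([0.5, 1. , 0.5])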
so we can say we only want to apply the function in some places.
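As a rough sketch of a more typical use of where (the array values and the names mask and result are illustrative, not taken from the question), one can take a square root only at the non-negative entries and leave a preset fill value everywhere else:

```python
import numpy as np

x = np.array([4.0, -1.0, 9.0, -16.0])

# Pre-fill the output: positions where the mask is False keep this value.
result = np.full_like(x, np.nan)

# Boolean mask with the same shape as x: True where sqrt is defined.
mask = x >= 0

# Only the masked positions are computed and written into result.
np.sqrt(x, out=result, where=mask)
# result is now [2., nan, 3., nan]
```

Pairing where with an explicitly initialised out is the safer pattern, since the skipped positions then hold a known value.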
out simply says that we will store the result in a pre-allocated array. This allows us to save memory, e.g. np.log(x, out=x) uses only x, or to save on array creation time (let's say we're doing many calculations in a loop).
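The loop case might look something like the sketch below. The buffer name buf, the array size and the loop body are assumptions made for illustration; the point is only that the scratch array is allocated once and reused on every iteration:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(1_000)

# Allocate the scratch buffer once, outside the loop.
buf = np.empty_like(a)

total = 0.0
for k in range(1, 4):
    np.multiply(a, k, out=buf)  # buf = a * k, no new array allocated
    np.sin(buf, out=buf)        # overwrite buf in place with sin(buf)
    total += buf.sum()
```

Whether this is measurably faster depends on the array size and how many iterations run, so it is worth timing on the real workload.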
The difference is that b = np.log(a) is effectively:
__temp = np.empty(a.shape) # temporary hidden array
np.log(a, out=__temp)
b = __temp # includes dereferencing old b -- no advantage to initialising b before
del __temp
whereas using out skips creating the temporary array, so it is slightly faster.
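One way to see the difference is to check object identity. This is a small sketch with made-up names (b_before, returned): with out, the ufunc writes into and returns the array it was given, while plain assignment rebinds the name to a freshly allocated array.

```python
import numpy as np

a = np.linspace(0.0, 1.0, 5)
b = np.zeros_like(a)
b_before = b  # keep a second reference to the original buffer

# With out=, the result is written into b and b itself is returned.
returned = np.sin(a, out=b)
same_buffer = returned is b  # True

# Without out=, a new array is created and the name b is rebound to it.
b = np.sin(a)
rebound = b is not b_before  # True
```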
On a side note, I think allowing a scalar False as a value is a bit silly: why would you ever want to not calculate the function anywhere?