Question

The following code shows how the computation time differs between three ways of calling np.sin.

import numpy as np

a = np.random.randn(10000000)
b = np.zeros(a.shape)

np.sin(a, out=b, where=False)
# 100 loops, best of 3: 6.61 ms per loop
b = np.sin(a)
# 10 loops, best of 3: 162 ms per loop
np.sin(a, out=b)
# 10 loops, best of 3: 146 ms per loop

I would like to use the syntax that gives the minimal computation time. My question is: why, when I define out=b, is the default value for the where argument still True? Is there a way to avoid it? It really makes the code more complicated.

Answer 1

Have you looked at the output of np.sin(a, out=b, where=False)?

a = np.linspace(0, 2*np.pi, 10)
# a: array([0.        , 0.6981317 , 1.3962634 , 2.0943951 , 2.7925268 ,
#           3.4906585 , 4.1887902 , 4.88692191, 5.58505361, 6.28318531])

b = np.zeros_like(a)  # [0, 0, 0, ...]

np.sin(a, out=b, where=False)
# --> b: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

It's all zeros because a False value in where means "don't calculate here", and a scalar False applies to the entire array. Nothing is computed, so b simply keeps the zeros it was initialised with. That's why it's so fast!
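A quick way to check this is to fill the output array with a sentinel value before the call. The sketch below assumes a sentinel of -1.0 (an arbitrary choice for illustration, not from the question); if every entry still holds it after the call, nothing was computed.

```python
import numpy as np

a = np.linspace(0, 2 * np.pi, 10)

# Fill the output with a sentinel (-1.0 is an arbitrary illustrative value).
b = np.full_like(a, -1.0)

# A scalar where=False broadcasts to "skip every element", so b is left untouched.
np.sin(a, out=b, where=False)
untouched = bool(np.all(b == -1.0))

# With the default where=True, every element is actually computed.
np.sin(a, out=b)
computed = bool(np.allclose(b, np.sin(a)))

print(untouched, computed)
```

So the 6.61 ms timing in the question measures a call that does no work, not a faster way of computing the sine.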

https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.sin.html

where works like this:

x = np.ones(3)  # [1,1,1]

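# Pass out= so the skipped position keeps a defined value;
# without out=, entries where the mask is False are left uninitialised.
np.divide(x, 2, out=x, where=[True, False, True])
# --> array([0.5, 1. , 0.5])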

so we can say we only want to apply the function in some places.
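A common use of a mask is skipping inputs where the function would be undefined. The following is a minimal sketch, assuming made-up arrays `num` and `den` and a fill value of 0.0 for the skipped positions; none of these names come from the question.

```python
import numpy as np

# Hypothetical inputs; two of the denominators are zero.
num = np.array([1.0, 2.0, 3.0, 4.0])
den = np.array([2.0, 0.0, 4.0, 0.0])

# Pre-fill the output so the skipped positions have a defined value (0.0 here).
result = np.zeros_like(num)

# Divide only where the denominator is nonzero; the other entries keep 0.0.
np.divide(num, den, out=result, where=den != 0)

print(result)
```

Pre-filling `result` matters: where the mask is False the ufunc writes nothing, so those entries hold whatever was in the output array beforehand.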

out simply says that we will store the result in a pre-allocated array. This lets us save memory, e.g. np.log(x, out=x) overwrites x in place, or save on array creation time (let's say we're doing many calculations in a loop).

The difference is that b = np.log(a) is effectively:

__temp = np.empty(a.shape)  # temporary hidden array
np.log(a, out=__temp)
b = __temp  # includes dereferencing old b -- no advantage to initialising b before
del __temp

whereas using out skips creating the temporary array, so is slightly faster.
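The saving is easiest to see when one buffer is reused across many calls. Here is a minimal sketch, assuming an illustrative buffer name `buf` and a small loop over a scale factor `k` (both hypothetical, not part of the question):

```python
import numpy as np

a = np.random.randn(1000)
buf = np.empty_like(a)  # allocated once, reused on every iteration

total = 0.0
for k in range(1, 4):
    # Both calls write into buf, so no new array is allocated per iteration.
    np.multiply(a, k, out=buf)
    np.sin(buf, out=buf)  # in-place: input and output are the same array
    total += buf.sum()

# With out=, the ufunc returns the very same array object it was given.
same_object = np.sin(a, out=buf) is buf
print(same_object)
```

How much time this saves depends on array size and on the allocator, so it is worth timing on your own data before restructuring code around it.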

On a side note, I think that allowing a scalar False as a value is a bit silly: why would you ever want to not calculate the function anywhere?
