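This question stems from looking at the answers to this question about counting the number of zero crossings. Several answers solved the problem, but the NumPy approach destroyed the others in terms of speed.
When I compared four of the answers, I noticed that the NumPy solution gives different results for larger sequences. The four answers in question are the loop and simple generator, the better generator expression, and the NumPy solution.
Question: why does the NumPy solution give a different result from the other three? (And which one is correct?)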
Here are the results of counting the zero crossings:
Blazing fast NumPy solution
total time: 0.303605794907 sec
Zero Crossings Small: 8
Zero Crossings Med: 54464
Zero Crossings Big: 5449071
Loop solution
total time: 15.6818780899 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847
Simple generator expression solution
total time: 16.3374049664 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847
Modified generator expression solution
total time: 13.6596589088 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847
Code used to obtain the results:
import time
import numpy as np
def zero_crossings_loop(sequence):
    s = 0
    for ind, _ in enumerate(sequence):
        if ind+1 < len(sequence):
            if sequence[ind]*sequence[ind+1] < 0:
                s += 1
    return s
def print_three_results(r1, r2, r3):
    print 'Zero Crossings Small:', r1
    print 'Zero Crossings Med:', r2
    print 'Zero Crossings Big:', r3
    print '\n'
small = [80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3, -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2]
med = np.random.randint(-10, 10, size=100000)
big = np.random.randint(-10, 10, size=10000000)
print 'Blazing fast NumPy solution'
tic = time.time()
z1 = (np.diff(np.sign(small)) != 0).sum()
z2 = (np.diff(np.sign(med)) != 0).sum()
z3 = (np.diff(np.sign(big)) != 0).sum()
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)
print 'Loop solution'
tic = time.time()
z1 = zero_crossings_loop(small)
z2 = zero_crossings_loop(med)
z3 = zero_crossings_loop(big)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)
print 'Simple generator expression solution'
tic = time.time()
z1 = sum(1 for i, _ in enumerate(small) if (i+1 < len(small)) if small[i]*small[i+1] < 0)
z2 = sum(1 for i, _ in enumerate(med) if (i+1 < len(med)) if med[i]*med[i+1] < 0)
z3 = sum(1 for i, _ in enumerate(big) if (i+1 < len(big)) if big[i]*big[i+1] < 0)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)
print 'Modified generator expression solution'
tic = time.time()
z1 = sum(1 for i in xrange(1, len(small)) if small[i-1]*small[i] < 0)
z2 = sum(1 for i in xrange(1, len(med)) if med[i-1]*med[i] < 0)
z3 = sum(1 for i in xrange(1, len(big)) if big[i-1]*big[i] < 0)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)
Answer 0 (score: 5)
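Your solutions treat zeros differently. The numpy.diff solution still returns a nonzero difference for a step from -1 to 0 or from 1 to 0 and counts it as a zero crossing, whereas your iterative solutions do not, because their criterion is a product less than zero. Test for <= 0 instead and the numbers will be equivalent, as long as the data never contains two zeros in a row (a 0, 0 pair has a product of 0 but no change of sign, so <= 0 counts it while the numpy.diff version does not).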
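A minimal sketch comparing the three criteria on two tiny arrays. The helper names are hypothetical, chosen only for illustration. A single zero between opposite signs makes the suggested <= 0 test agree with np.diff(np.sign(...)); two adjacent zeros make them disagree.

```python
import numpy as np

def count_sign_changes(a):
    # NumPy answer: any change of np.sign counts, including steps to/from 0
    a = np.asarray(a)
    return int((np.diff(np.sign(a)) != 0).sum())

def count_negative_products(a):
    # Loop/generator answers: only strictly opposite signs count
    a = np.asarray(a)
    return int((a[:-1] * a[1:] < 0).sum())

def count_nonpositive_products(a):
    # Suggested fix: a product of exactly 0 also counts
    a = np.asarray(a)
    return int((a[:-1] * a[1:] <= 0).sum())

for data in ([1, 0, -1], [1, 0, 0, -1]):
    print(data,
          count_sign_changes(data),
          count_negative_products(data),
          count_nonpositive_products(data))
# [1, 0, -1] 2 0 2
# [1, 0, 0, -1] 2 0 3
```

In the second array the middle 0, 0 pair is counted by the <= 0 test but not by the sign difference, which is the only case where the two disagree.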
Answer 1 (score: 4)
I get the same result as the loop with:
((array[:-1] * array[1:]) < 0).sum()
This code:
import numpy as np
# zero_crossings_loop is the function defined in the question
small = np.array([80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3,
                  -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2])
med = np.random.randint(-10, 10, size=100000)
big = np.random.randint(-10, 10, size=10000000)
for name, array in [('small', small), ('med', med), ('big', big)]:
    print('loop ', name, zero_crossings_loop(array))
    print('Numpy', name, ((array[:-1] * array[1:]) < 0).sum())
prints:
loop small 8
Numpy small 8
loop med 44901
Numpy med 44901
loop big 4496911
Numpy big 4496911
UPDATE
This version avoids the zero problem:
def numpy_zero_crossings2(array):
    nonzero_array = array[np.nonzero(array)]
    return ((nonzero_array[:-1] * nonzero_array[1:]) < 0).sum()
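A quick check of numpy_zero_crossings2 on a few tiny inputs (the inputs are illustrative, not taken from the answer). Once the zeros are removed, a positive-zero-negative run counts as one crossing and a positive-zero-positive run as none, however many zeros sit in between.

```python
import numpy as np

def numpy_zero_crossings2(array):
    # Drop exact zeros so that 1, 0, -1 behaves like 1, -1
    nonzero_array = array[np.nonzero(array)]
    # Count adjacent pairs of the remaining values with opposite signs
    return ((nonzero_array[:-1] * nonzero_array[1:]) < 0).sum()

for data in ([1, 0, -1], [1, 0, 0, -1], [1, 0, 1]):
    print(data, numpy_zero_crossings2(np.array(data)))
# [1, 0, -1] 1
# [1, 0, 0, -1] 1
# [1, 0, 1] 0
```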
It gives the same result as @djsutton's answer:
>>> numpy_zero_crossings2(big) == numpy_zero_crossings(big)
True
but it seems to be a bit faster:
%timeit numpy_zero_crossings2(big)
1 loops, best of 3: 194 ms per loop
vs.
%timeit numpy_zero_crossings(big)
1 loops, best of 3: 227 ms per loop
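Answer 2 (score: 3)
Both the iterative and the numpy solutions do a poor job of counting crossings when data elements are equal to zero. For the data [1, 0, -1], the iterative solutions give 0 crossings and the numpy solution gives 2, and neither seems correct.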
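One solution is to drop the data elements that are zero. In NumPy you might try something like this:
def numpy_zero_crossings(data):
    # Keep only the nonzero signs, then count the changes between neighbours
    return (np.diff(np.sign(data)[np.nonzero(data)]) != 0).sum()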
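However, this introduces another iteration over the array, so it adds another O(n) pass to the running time (the total is still O(n)).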
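If the extra pass matters, the same zero-skipping rule can in principle be applied in a single pass by remembering the sign of the most recent nonzero element. A minimal pure-Python sketch, assuming a plain sequence of numbers; the function name is hypothetical. As an interpreted loop it will be far slower than the vectorized NumPy versions, so it only illustrates the single-pass idea.

```python
def zero_crossings_single_pass(sequence):
    count = 0
    last_sign = 0  # sign of the most recent nonzero value; 0 means none seen yet
    for x in sequence:
        if x == 0:
            continue  # zeros neither start nor end a crossing
        sign = 1 if x > 0 else -1
        if last_sign and sign != last_sign:
            count += 1
        last_sign = sign
    return count

print(zero_crossings_single_pass([1, 0, -1]))     # 1
print(zero_crossings_single_pass([1, 0, 0, -1]))  # 1
small = [80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3,
         -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2]
print(zero_crossings_single_pass(small))          # 8
```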