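I'm doing something like the code below, and I'm not happy with the performance of np.roll(). I am summing baseArray and otherArray, where baseArray is rolled by one element on each iteration. I don't need a copy of baseArray when I roll it; I would rather have a view. For example, when I sum baseArray with the other array and baseArray has been rolled twice, the 2nd element of baseArray is summed with the 0th element of otherArray, the 3rd element of baseArray with the 1st element of otherArray, and so on.
In other words, I want the same result as np.roll(), but without copying the array.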
import numpy as np
from numpy import random
import cProfile

def profile():
    baseArray = np.zeros(1000000)
    for i in range(1000):
        baseArray = np.roll(baseArray, 1)
        otherArray = np.random.rand(1000000)
        baseArray = baseArray + otherArray

cProfile.run('profile()')
Output (note the third line, the roll function):
         9005 function calls in 26.741 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    5.123    5.123   26.740   26.740 <ipython-input-101-9006a6c0d2e3>:5(profile)
        1    0.001    0.001   26.741   26.741 <string>:1(<module>)
     1000    0.237    0.000    8.966    0.009 numeric.py:1327(roll)
     1000    0.004    0.000    0.005    0.000 numeric.py:476(asanyarray)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1000   12.650    0.013   12.650    0.013 {method 'rand' of 'mtrand.RandomState' objects}
     1000    0.005    0.000    0.005    0.000 {method 'reshape' of 'numpy.ndarray' objects}
     1000    6.390    0.006    6.390    0.006 {method 'take' of 'numpy.ndarray' objects}
     2000    1.345    0.001    1.345    0.001 {numpy.core.multiarray.arange}
     1000    0.001    0.000    0.001    0.000 {numpy.core.multiarray.array}
     1000    0.985    0.001    0.985    0.001 {numpy.core.multiarray.concatenate}
        1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
        1    0.000    0.000    0.000    0.000 {range}
Answer 0 (score: 2)
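I'm pretty sure it is impossible to avoid a copy, due to the way in which numpy arrays are represented internally. An array consists of a contiguous block of memory addresses plus some metadata, which includes the array dimensions, the item size, and the separation between elements along each dimension (the "stride"). "Rolling" each element forward or backward would require strides of different lengths along the same dimension, which is not possible.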
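The point about strides can be illustrated with a small sketch. The array `a` and the variable names here are illustrative only and not part of the question's code; the sketch assumes an 8-byte float64 dtype.

```python
import numpy as np

a = np.arange(6, dtype=np.float64)

# A slice is a view: same memory block, different offset and shape metadata.
view = a[1:]
print(a.strides, view.strides)      # (8,) (8,)
print(np.shares_memory(a, view))    # True

# np.roll returns a new array, because the wrapped-around layout cannot be
# described by a single offset plus one stride per dimension.
rolled = np.roll(a, 1)
print(np.shares_memory(a, rolled))  # False
```

A plain slice only changes the starting offset and the length, so it can share memory with the original. A rolled array would need one "jump" in the middle of the dimension, which the stride metadata cannot express.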
That said, you can use slice indexing to avoid copying all but one element of baseArray:
import numpy as np

def profile1(seed=0):
    gen = np.random.RandomState(seed)
    baseArray = np.zeros(1000000)
    for i in range(1000):
        baseArray = np.roll(baseArray, 1)
        otherArray = gen.rand(1000000)
        baseArray = baseArray + otherArray
    return baseArray

def profile2(seed=0):
    gen = np.random.RandomState(seed)
    baseArray = np.zeros(1000000)
    for i in range(1000):
        otherArray = gen.rand(1000000)
        tmp1 = baseArray[:-1]                    # view of the first n-1 elements
        tmp2 = baseArray[-1]                     # copy of the last element
        baseArray[1:] = tmp1 + otherArray[1:]    # write the last n-1 elements
        baseArray[0] = tmp2 + otherArray[0]      # write the first element
    return baseArray
These give the same result:
In [1]: x1 = profile1()
In [2]: x2 = profile2()
In [3]: np.allclose(x1, x2)
Out[3]: True
In practice, there is not much difference in performance:
In [4]: %timeit profile1()
1 loop, best of 3: 23.4 s per loop
In [5]: %timeit profile2()
1 loop, best of 3: 17.3 s per loop
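The single update step used in profile2() can be checked on a tiny array. This is a minimal sketch; the helper name `roll_add_inplace` is hypothetical and introduced only for illustration.

```python
import numpy as np

def roll_add_inplace(base, other):
    """Hypothetical helper: update base in place to np.roll(base, 1) + other."""
    last = base[-1]                     # scalar copy of the element that wraps around
    base[1:] = base[:-1] + other[1:]    # right-hand side is a temporary of n-1 elements
    base[0] = last + other[0]           # the wrapped element lands in slot 0
    return base

rng = np.random.RandomState(0)
base = rng.rand(8)
other = rng.rand(8)

expected = np.roll(base, 1) + other
result = roll_add_inplace(base.copy(), other)
print(np.allclose(result, expected))    # True
```

Note that `base[:-1] + other[1:]` still allocates a temporary array for the sum before it is written back, which may be part of why the speedup over np.roll() is modest.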
Answer 1 (score: 0)
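My function profile3() is about four times faster than profile2() in the timings below. During accumulation it uses slice indexing with an incrementing shift instead of any rolling. After the loop, a single roll by 1000 elements produces the same alignment as the other functions.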
import numpy as np
from timeit import timeit

def profile1(seed=0):
    gen = np.random.RandomState(seed)
    otherArray = gen.rand(1000000)  # outside the loop after Marcel's comment above
    baseArray = np.zeros(1000000)
    for i in range(1000):
        baseArray = np.roll(baseArray, 1)
        baseArray = baseArray + otherArray
    return baseArray

def profile2(seed=0):
    gen = np.random.RandomState(seed)
    otherArray = gen.rand(1000000)
    baseArray = np.zeros(1000000)
    for i in range(1000):
        tmp1 = baseArray[:-1]                    # view of the first n-1 elements
        tmp2 = baseArray[-1]                     # copy of the last element
        baseArray[1:] = tmp1 + otherArray[1:]    # write the last n-1 elements
        baseArray[0] = tmp2 + otherArray[0]      # write the first element
    return baseArray

def profile3(seed=0):
    gen = np.random.RandomState(seed)
    otherArray = gen.rand(1000000)
    baseArray = np.zeros(1000000)
    for i in range(1, 1001):  # use % or itertools.cycle if range > shape
        baseArray[:-i] += otherArray[i:]
        baseArray[-i:] += otherArray[:i]
    return np.roll(baseArray, 1000)

print(timeit(profile1, number=1))  # 7.0
print(timeit(profile2, number=1))  # 4.7
print(timeit(profile3, number=1))  # 1.2

x2 = profile2()
x3 = profile3()
print(np.allclose(x2, x3))  # True
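The comment in profile3() mentions using % when the number of iterations exceeds the array length. A minimal sketch of that variant follows, checked against the straightforward np.roll() loop on a small array. The function names `accumulate_rolled` and `accumulate_reference` are hypothetical, and this is one possible reading of the comment rather than the answer's own code.

```python
import numpy as np

def accumulate_rolled(other, steps):
    """Hypothetical generalization of profile3(): works for any steps >= 0."""
    n = other.shape[0]
    base = np.zeros(n)
    for i in range(1, steps + 1):
        k = i % n                   # effective shift once i exceeds the length
        if k == 0:
            base += other           # a full-cycle shift is no shift at all
        else:
            base[:-k] += other[k:]  # add otherArray shifted left by k ...
            base[-k:] += other[:k]  # ... including the part that wraps around
    return np.roll(base, steps % n) # one roll at the end restores the alignment

def accumulate_reference(other, steps):
    """Roll-and-add loop as in profile1(), used only as a reference."""
    base = np.zeros(other.shape[0])
    for _ in range(steps):
        base = np.roll(base, 1) + other
    return base

other = np.random.RandomState(0).rand(7)
print(np.allclose(accumulate_rolled(other, 20), accumulate_reference(other, 20)))  # True
```

The `k == 0` branch is needed because `base[:-0]` is an empty slice, so the two-slice update would fail with a shape mismatch on a full-cycle shift.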