I'm calling a numpy function many times inside several for loops, but it has become much too slow. Is there a way to speed this up? I've read that you should try to in-line loops, and create local variables for functions before the for loops, but nothing seems to improve the speed (<1%). len(UNIQ_IDS) is ~800. emiss_data and obj_data are numpy ndarrays with shape = (2600, 5200). I've used import profile to get a handle on where the bottlenecks are, and where inside the for loop is a big one.
import numpy as np
max = np.max
where = np.where
MAX_EMISS = [max(emiss_data[where(obj_data == i)]) for i in UNIQ_IDS]
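For reference, profiling the comprehension looks roughly like this (a minimal sketch using the standard-library cProfile module, assuming the variables from the snippet above are defined; the compute_max_emiss wrapper is a hypothetical name, not from the question):
import cProfile

def compute_max_emiss():
    # Same list comprehension as above, wrapped so the profiler can attribute time to it.
    return [max(emiss_data[where(obj_data == i)]) for i in UNIQ_IDS]

cProfile.run('compute_max_emiss()', sort='cumulative')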
Answer 0 (Score: 7)
It turns out that, in this case, a pure Python loop can be much faster than NumPy indexing (or calling np.where).
Consider the following alternatives:
import numpy as np
import collections
import itertools as IT
shape = (2600,5200)
# shape = (26,52)
emiss_data = np.random.random(shape)
obj_data = np.random.random_integers(1, 800, size=shape)
UNIQ_IDS = np.unique(obj_data)
def using_where():
    max = np.max
    where = np.where
    MAX_EMISS = [max(emiss_data[where(obj_data == i)]) for i in UNIQ_IDS]
    return MAX_EMISS
def using_index():
    max = np.max
    MAX_EMISS = [max(emiss_data[obj_data == i]) for i in UNIQ_IDS]
    return MAX_EMISS
def using_max():
    MAX_EMISS = [(emiss_data[obj_data == i]).max() for i in UNIQ_IDS]
    return MAX_EMISS
def using_loop():
    # Pure-Python single pass: bucket every value by its id, then take the max per id.
    # (itertools.izip is Python 2; on Python 3 use the builtin zip.)
    result = collections.defaultdict(list)
    for val, idx in IT.izip(emiss_data.ravel(), obj_data.ravel()):
        result[idx].append(val)
    return [max(result[idx]) for idx in UNIQ_IDS]
def using_sort():
    # Map each pixel to the position of its id in UNIQ_IDS, then order pixels by group.
    uind = np.digitize(obj_data.ravel(), UNIQ_IDS) - 1
    vals = uind.argsort()
    count = np.bincount(uind)
    start = 0
    end = 0
    out = np.empty(count.shape[0])
    for ind, x in np.ndenumerate(count):
        end += x
        out[ind] = np.max(np.take(emiss_data, vals[start:end]))
        start += x
    return out
def using_split():
    # Same grouping as using_sort, but slice the sorted indices with np.split.
    uind = np.digitize(obj_data.ravel(), UNIQ_IDS) - 1
    vals = uind.argsort()
    count = np.bincount(uind)
    return [np.take(emiss_data, item).max()
            for item in np.split(vals, count.cumsum())[:-1]]
for func in (using_index, using_max, using_loop, using_sort, using_split):
    assert np.allclose(using_where(), func())
Here are the benchmarks with shape = (2600, 5200):
In [57]: %timeit using_loop()
1 loops, best of 3: 9.15 s per loop
In [90]: %timeit using_sort()
1 loops, best of 3: 9.33 s per loop
In [91]: %timeit using_split()
1 loops, best of 3: 9.33 s per loop
In [61]: %timeit using_index()
1 loops, best of 3: 63.2 s per loop
In [62]: %timeit using_max()
1 loops, best of 3: 64.4 s per loop
In [58]: %timeit using_where()
1 loops, best of 3: 112 s per loop
So using_loop (pure Python) turns out to be about 11x faster than using_where.
I'm not entirely sure why pure Python beats NumPy here. My guess is that the pure-Python version zips (yes, pun intended) through both arrays in a single pass. It takes advantage of the fact that, despite all the fancy indexing, we really just want to visit each value once, so it sidesteps having to determine exactly which group each value in emiss_data falls into. But that is just vague speculation; I didn't know it would be faster until I benchmarked it.
Answer 1 (Score: 7)
You can use np.unique with return_inverse:
def using_sort():
    #UNIQ_IDS, uind = np.unique(obj_data, return_inverse=True)
    uind = np.digitize(obj_data.ravel(), UNIQ_IDS) - 1
    vals = uind.argsort()
    count = np.bincount(uind)
    start = 0
    end = 0
    out = np.empty(count.shape[0])
    for ind, x in np.ndenumerate(count):
        end += x
        out[ind] = np.max(np.take(emiss_data, vals[start:end]))
        start += x
    return out
Using @unutbu's answer as a baseline, with shape = (2600, 5200):
np.allclose(using_loop(),using_sort())
True
%timeit using_loop()
1 loops, best of 3: 12.3 s per loop
#With np.unique inside the definition
%timeit using_sort()
1 loops, best of 3: 9.06 s per loop
#With np.unique outside the definition
%timeit using_sort()
1 loops, best of 3: 2.75 s per loop
#Using @Jamie's suggestion for uind
%timeit using_sort()
1 loops, best of 3: 6.74 s per loop
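For clarity, here is a minimal sketch of the "np.unique outside the definition" variant timed above (my reconstruction of the setup, not the answerer's exact code); the per-pixel grouping work is identical, only the inverse index uind is precomputed once:
UNIQ_IDS, uind_precomputed = np.unique(obj_data, return_inverse=True)

def using_sort_precomputed(uind=uind_precomputed):
    vals = uind.argsort()              # flat pixel positions ordered by group
    count = np.bincount(uind)          # number of pixels in each group
    out = np.empty(count.shape[0])
    start = 0
    for ind, x in np.ndenumerate(count):
        end = start + x
        out[ind] = np.max(np.take(emiss_data, vals[start:end]))
        start = end
    return out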
Answer 2 (Score: 5)
I think the fastest way to accomplish this is to use the groupby() operation from the pandas package. Compared with @Ophion's using_sort() function, pandas is about a factor of 10 faster:
import numpy as np
import pandas as pd
shape = (2600,5200)
emiss_data = np.random.random(shape)
obj_data = np.random.random_integers(1, 800, size=shape)
UNIQ_IDS = np.unique(obj_data)
def using_sort():
    #UNIQ_IDS, uind = np.unique(obj_data, return_inverse=True)
    uind = np.digitize(obj_data.ravel(), UNIQ_IDS) - 1
    vals = uind.argsort()
    count = np.bincount(uind)
    start = 0
    end = 0
    out = np.empty(count.shape[0])
    for ind, x in np.ndenumerate(count):
        end += x
        out[ind] = np.max(np.take(emiss_data, vals[start:end]))
        start += x
    return out
def using_pandas():
    return pd.Series(emiss_data.ravel()).groupby(obj_data.ravel()).max()
print('same results:', np.allclose(using_pandas(), using_sort()))
# same results: True
%timeit using_sort()
# 1 loops, best of 3: 3.39 s per loop
%timeit using_pandas()
# 1 loops, best of 3: 397 ms per loop
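One note to add (not part of the original answer): groupby returns a pandas Series indexed by the sorted unique ids, i.e. the same order as UNIQ_IDS, so turning it into a plain array for further use is straightforward:
max_per_id = using_pandas().values  # ndarray ordered like UNIQ_IDS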
Answer 3 (Score: 2)
Can't you just do emiss_data[obj_data == i]? I'm not sure why you're using where.
Answer 4 (Score: 0)
According to Are tuples more efficient than lists in Python?, allocating a tuple is much faster than allocating a list, so simply building a tuple instead of a list may improve efficiency.
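For illustration (my addition, not from the answer), the same comprehension built as a tuple; any gain from this is likely marginal compared with the indexing cost itself:
MAX_EMISS = tuple(emiss_data[obj_data == i].max() for i in UNIQ_IDS)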
Answer 5 (Score: 0)
If obj_data consists of relatively small integers, you can use numpy.maximum.at (available since v1.8.0):
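The code for this answer did not survive on this page, so here is a minimal sketch of the idea (my reconstruction, not the answerer's exact code, assuming the ids are small non-negative integers):
def using_maximum_at():
    # Accumulator indexed directly by id; -inf so any real value replaces it.
    out = np.full(UNIQ_IDS.max() + 1, -np.inf)
    # Unbuffered ufunc: for every pixel, out[id] = max(out[id], value).
    np.maximum.at(out, obj_data.ravel(), emiss_data.ravel())
    return out[UNIQ_IDS]  # keep only the ids that actually occur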