I have two lists: one is huge (millions of elements) and the other has only a few. I want to do the following:
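```python
import numpy as np

bigArray = np.array([0, 1, 0, 2, 3, 2])  # ..... millions of elements in practice
smallArray = [0, 1, 2, 3, 4]
for i in range(len(smallArray)):
    pts = np.where(bigArray == smallArray[i])
    # Do stuff with pts...
```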
The above works, but it is slow. Is there a way to do this more efficiently without resorting to C?
Answer 0 (score: 8)
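In your case you can benefit from presorting your big array. The example below shows how to cut the time from about 45 seconds to about 2 seconds on my laptop, for one particular pair of array lengths (5e6 vs. 1e3). Obviously, for very different array sizes this solution may not be the best choice: with the default solution the complexity is O(bigN * smallN), while for the solution I suggest it is O((bigN + smallN) * log(bigN)). So, ignoring constant factors, when smallN is smaller than roughly log(bigN), the one-off sort costs more than the brute-force scans it replaces.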
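As a rough back-of-the-envelope check (ignoring the constant factors that big-O hides), plugging the benchmark sizes used below into the two bounds gives:

```latex
\begin{aligned}
\text{brute force:}\quad & \text{bigN}\cdot\text{smallN} = (5\times10^{6})(10^{3}) = 5\times10^{9} \\
\text{presorted:}\quad & (\text{bigN}+\text{smallN})\log_2\text{bigN} \approx (5.001\times10^{6})(22.25) \approx 1.11\times10^{8}
\end{aligned}
```

That is roughly a 45-fold reduction in operation count; the measured timings (about 42.5 s vs. 1.6 s, i.e. about 27x) are of the same order of magnitude, which is about as close as a constant-free estimate can be expected to get.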
```python
import bisect
import time

import numpy as np
import numpy.random as nprand

bigN = int(5e6)   # randint needs an integer size
smallN = 1000
maxn = int(1e7)
nprand.seed(1)
bigArr = nprand.randint(0, maxn, size=bigN)
smallArr = nprand.randint(0, maxn, size=smallN)

# brute force: one full scan of bigArr per element of smallArr
t1 = time.time()
for i in range(len(smallArr)):
    inds = np.where(bigArr == smallArr[i])[0]
t2 = time.time()
print("Brute", t2 - t1)

# not brute force (like a nested loop with an index scan):
# sort once, then binary-search each value of smallArr
t1 = time.time()
sortedind = np.argsort(bigArr)
sortedbigArr = bigArr[sortedind]
for i in range(len(smallArr)):
    i1 = bisect.bisect_left(sortedbigArr, smallArr[i])
    i2 = bisect.bisect_right(sortedbigArr, smallArr[i])
    inds = sortedind[i1:i2]  # positions in bigArr equal to smallArr[i]
t2 = time.time()
print("Non-brute", t2 - t1)
```
Output:

```
Brute 42.5278530121
Non-brute 1.57193303108
```
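To check that the presorted approach returns the same positions as the brute-force `np.where` loop, here is a minimal sketch on a much smaller random input (the sizes, seed and value range are arbitrary choices for a quick check). Because `np.argsort` is not stable by default, the indices inside each group may come back in any order, so the sketch sorts each slice before comparing.

```python
import bisect

import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed; small sizes keep the check fast
big_arr = rng.integers(0, 50, size=10_000)
small_arr = rng.integers(0, 60, size=20)  # some values may not occur in big_arr

order = np.argsort(big_arr)
sorted_big = big_arr[order]

mismatches = 0
for value in small_arr:
    brute = np.where(big_arr == value)[0]          # ascending indices
    lo = bisect.bisect_left(sorted_big, value)
    hi = bisect.bisect_right(sorted_big, value)
    presorted = np.sort(order[lo:hi])              # argsort is not stable: re-sort slice
    if not np.array_equal(brute, presorted):
        mismatches += 1

print("mismatches:", mismatches)
```

Values absent from `big_arr` give an empty slice on both sides, so they count as matches too.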
Answer 1 (score: 3)
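NumPy provides the function numpy.searchsorted, which performs the binary searches for all values of the small list in a single vectorized call: http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.searchsorted.html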
Example:
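```python
>>> import numpy as np
>>> big_list = np.array([0, 1, 0, 2, 3, 2, 5, 6, 7, 5, 6, 4, 5, 3, 4, 3, 5, 6, 5])
>>> small_list = np.array([0, 1, 2, 3, 4])
>>> order = np.argsort(big_list)  # named "order" to avoid shadowing the built-in sorted()
>>> r = np.searchsorted(big_list, small_list, side='right', sorter=order)
>>> l = np.searchsorted(big_list, small_list, side='left', sorter=order)
>>> for b, e in zip(l, r):
...     inds = order[b:e]  # positions in big_list equal to this small_list value
...
```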
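The loop above can be wrapped in a reusable helper. This is a sketch: the name `find_indices` is hypothetical, and `kind="stable"` is an addition not in the original answer, used so that the indices within each group come out in ascending order, matching what `np.where` would return.

```python
import numpy as np


def find_indices(big, small):
    """Hypothetical helper: map each value in `small` to its positions in `big`."""
    big = np.asarray(big)
    small = np.asarray(small)
    # A stable sort keeps equal elements in their original (ascending index) order.
    order = np.argsort(big, kind="stable")
    # Vectorized binary search: [left, right) bounds of each value in sorted order.
    left = np.searchsorted(big, small, side="left", sorter=order)
    right = np.searchsorted(big, small, side="right", sorter=order)
    return {int(v): order[b:e] for v, b, e in zip(small, left, right)}


big_list = [0, 1, 0, 2, 3, 2, 5, 6, 7, 5, 6, 4, 5, 3, 4, 3, 5, 6, 5]
small_list = [0, 1, 2, 3, 4]
result = find_indices(big_list, small_list)
for value in small_list:
    print(value, result[value].tolist())
```

Only the sort is O(bigN log bigN); the two `searchsorted` calls together cost O(smallN log bigN), and there is no Python-level binary search.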
Answer 2 (score: 2)
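So far I don't see a need for numpy here; you can use a defaultdict instead, as long as you have enough memory, which should be the case if the number of observations is not too large.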
```python
from collections import defaultdict

big_list = [0, 1, 0, 2, 3, 2, 5, 6, 7, 5, 6, 4, 5, 3, 4, 3, 5, 6, 5]
small_list = [0, 1, 2, 3, 4]

# The dictionary stores all the relevant positions,
# so you don't have to search for them later.
dicto = defaultdict(list)
for ind, ele in enumerate(big_list):
    dicto[ele].append(ind)
```
Result:
```python
>>> for ele in small_list:
...     print(dicto[ele])
...
[0, 2]
[1]
[3, 5]
[4, 13, 15]
[11, 14]
```
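This should give you some speedup: building the dictionary takes a single pass over big_list, and each lookup afterwards is an average O(1) dictionary access instead of a full scan of big_list per element of small_list.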
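If memory is the concern, one possible variant (a sketch, not part of the original answer) records positions only for values that actually occur in `small_list`, so values nobody will look up never get a list of their own. The names `wanted` and `positions` are illustrative.

```python
from collections import defaultdict

big_list = [0, 1, 0, 2, 3, 2, 5, 6, 7, 5, 6, 4, 5, 3, 4, 3, 5, 6, 5]
small_list = [0, 1, 2, 3, 4]

wanted = set(small_list)  # average O(1) membership test
positions = defaultdict(list)
for ind, ele in enumerate(big_list):
    if ele in wanted:  # skip values that will never be looked up
        positions[ele].append(ind)

for ele in small_list:
    print(ele, positions[ele])
```

It still makes one pass over `big_list`, but the dictionary only ever holds the keys in `small_list` (5, 6 and 7 are never stored here).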