I am using Python 2.7. I have two arrays, A and B. To find the indices of the elements of A that are present in B, I can do
A_inds = np.in1d(A,B)
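I would also like the indices of the elements of B that are present in A, i.e. the indices in B of the same overlapping elements found by the code above.
Currently I run the same line again with the arguments swapped: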
B_inds = np.in1d(B,A)
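But this extra computation seems like it should be unnecessary. Is there a more computationally efficient way to obtain both A_inds and B_inds?
I am open to using either list or array methods.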
Answer 0 (score: 3)
np.unique and np.searchsorted can be used together to solve it:
def unq_searchsorted(A,B):

    # Get unique elements of A and B and the indices based on the uniqueness
    unqA,idx1 = np.unique(A,return_inverse=True)
    unqB,idx2 = np.unique(B,return_inverse=True)

    # Create mask equivalent to np.in1d(A,B) and np.in1d(B,A) for unique elements
    mask1 = (np.searchsorted(unqB,unqA,'right') - np.searchsorted(unqB,unqA,'left'))==1
    mask2 = (np.searchsorted(unqA,unqB,'right') - np.searchsorted(unqA,unqB,'left'))==1

    # Map back to all non-unique indices to get equivalent of np.in1d(A,B),
    # np.in1d(B,A) results for non-unique elements
    return mask1[idx1],mask2[idx2]
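To see why the 'right' minus 'left' difference works as a membership test, here is a small worked sketch on toy arrays. The values are made up for illustration, and np.isin is used as the reference only because newer NumPy releases deprecate np.in1d; it is not part of the answer above.

```python
import numpy as np

# Toy inputs, chosen only for illustration.
A = np.array([5, 1, 5, 7, 3])
B = np.array([3, 3, 9, 5])

unqA, idx1 = np.unique(A, return_inverse=True)  # sorted unique values of A
unqB = np.unique(B)                             # sorted unique values of B

# For each value v in unqA, 'left' is the first position at which v could be
# inserted into unqB and 'right' is the last. Because unqB has no duplicates,
# the two differ by exactly 1 only when v is already present in unqB.
left = np.searchsorted(unqB, unqA, 'left')
right = np.searchsorted(unqB, unqA, 'right')
mask_unique = (right - left) == 1

# idx1 maps every element of A to its slot in unqA, so indexing with it
# expands the per-unique-value mask back to the shape of A.
mask_A = mask_unique[idx1]

print(mask_A)
```

The same steps with A and B swapped give the mask for B, which is what the second searchsorted pair in the function computes.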
Runtime test and verification of results:
In [233]: def org_app(A,B):
     ...:     return np.in1d(A,B), np.in1d(B,A)
     ...:

In [234]: A = np.random.randint(0,10000,(10000))
     ...: B = np.random.randint(0,10000,(10000))
     ...:
In [235]: np.allclose(org_app(A,B)[0],unq_searchsorted(A,B)[0])
Out[235]: True
In [236]: np.allclose(org_app(A,B)[1],unq_searchsorted(A,B)[1])
Out[236]: True
In [237]: %timeit org_app(A,B)
100 loops, best of 3: 7.69 ms per loop
In [238]: %timeit unq_searchsorted(A,B)
100 loops, best of 3: 5.56 ms per loop
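If both input arrays are already sorted and unique, the performance boost would be substantial, because the np.unique calls can be skipped. The solution function then simplifies to: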
def unq_searchsorted_v1(A,B):
    out1 = (np.searchsorted(B,A,'right') - np.searchsorted(B,A,'left'))==1
    out2 = (np.searchsorted(A,B,'right') - np.searchsorted(A,B,'left'))==1
    return out1,out2
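The simplified version silently gives wrong masks if its inputs are not sorted and unique, so a cheap guard may be worth having. The helper below is a hypothetical sketch (the name is_sorted_unique is not from the answer); it relies on the fact that a strictly increasing 1-D array is necessarily both sorted and free of duplicates.

```python
import numpy as np

def is_sorted_unique(arr):
    # Hypothetical helper, not part of the original answer.
    # True when arr is 1-D and strictly increasing, which implies
    # that it is both sorted and contains no repeated values.
    arr = np.asarray(arr)
    return arr.ndim == 1 and bool(np.all(arr[1:] > arr[:-1]))
```

One possible use is to call it on both inputs first and fall back to the general unq_searchsorted when either check fails.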
Follow-up runtime test:
In [275]: A = np.random.randint(0,100000,(20000))
     ...: B = np.random.randint(0,100000,(20000))
     ...: A = np.unique(A)
     ...: B = np.unique(B)
     ...:
In [276]: np.allclose(org_app(A,B)[0],unq_searchsorted_v1(A,B)[0])
Out[276]: True
In [277]: np.allclose(org_app(A,B)[1],unq_searchsorted_v1(A,B)[1])
Out[277]: True
In [278]: %timeit org_app(A,B)
100 loops, best of 3: 8.83 ms per loop
In [279]: %timeit unq_searchsorted_v1(A,B)
100 loops, best of 3: 4.94 ms per loop
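If integer positions are wanted rather than boolean masks, and both arrays contain unique values, np.intersect1d with return_indices=True (available in NumPy 1.15 and later) returns both index arrays in one call. This is a different technique from the searchsorted approach above, shown only as a sketch on made-up data. For inputs with duplicates it reports only the first occurrence of each common value, so it is not a drop-in replacement for the masks.

```python
import numpy as np

# Toy inputs with unique values, chosen only for illustration.
A = np.array([10, 20, 30, 40])
B = np.array([40, 15, 20])

# common: sorted values present in both arrays
# A_idx:  positions of those values in A
# B_idx:  positions of those values in B
common, A_idx, B_idx = np.intersect1d(
    A, B, assume_unique=True, return_indices=True
)

print(common, A_idx, B_idx)
```

Indexing A with A_idx and B with B_idx yields the same common values in the same order.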
Answer 1 (score: 1)
A simple multiprocessing implementation can get you a bit more speed:
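import time
import numpy as np

from multiprocessing import Process, Queue

a = np.random.randint(0, 20, 1000000)
b = np.random.randint(0, 20, 1000000)

def original(a, b, q):
    # Runs in a child process and sends the boolean mask back through the queue
    q.put( np.in1d(a, b) )

if __name__ == '__main__':
    t0 = time.time()
    q = Queue()
    q2 = Queue()
    p = Process(target=original, args=(a, b, q,))
    p2 = Process(target=original, args=(b, a, q2))
    p.start()
    p2.start()
    # Read the results before joining so the children are not blocked on a full pipe
    res = q.get()
    res2 = q2.get()

    print(time.time() - t0)

    # Wait for both child processes to exit
    p.join()
    p2.join()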
>>> 0.21398806572
Divakar's unq_searchsorted(A,B) method took 0.271834135056 seconds on my machine.
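A single wall-clock reading like the two above is sensitive to process start-up and to whatever else the machine is doing. A rough way to compare approaches on identical inputs is the standard timeit module. The sketch below assumes the function from the first answer, uses np.isin in place of np.in1d (which newer NumPy releases deprecate), and uses smaller arrays so that it runs quickly. The helper names two_calls and best_time are made up for illustration, and no particular result is implied.

```python
import timeit
import numpy as np

def two_calls(A, B):
    # Baseline: two independent membership tests.
    return np.isin(A, B), np.isin(B, A)

def unq_searchsorted(A, B):
    # Same logic as the function in the first answer.
    unqA, idx1 = np.unique(A, return_inverse=True)
    unqB, idx2 = np.unique(B, return_inverse=True)
    mask1 = (np.searchsorted(unqB, unqA, 'right') - np.searchsorted(unqB, unqA, 'left')) == 1
    mask2 = (np.searchsorted(unqA, unqB, 'right') - np.searchsorted(unqA, unqB, 'left')) == 1
    return mask1[idx1], mask2[idx2]

def best_time(func, A, B, repeat=3, number=5):
    # Best of `repeat` runs, reported as average seconds per call.
    runs = timeit.repeat(lambda: func(A, B), repeat=repeat, number=number)
    return min(runs) / number

rng = np.random.RandomState(0)  # fixed seed so both functions see the same data
A = rng.randint(0, 20, 100000)
B = rng.randint(0, 20, 100000)

t_two = best_time(two_calls, A, B)
t_unq = best_time(unq_searchsorted, A, B)
print("two calls: %.6f s, unq_searchsorted: %.6f s" % (t_two, t_unq))
```

Timing the multiprocessing version the same way would need the process creation included in the timed callable, since that overhead is part of its real cost.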