我不仅需要值,还需要一个numpy数组中元素的位置,这些元素也出现在第二个numpy数组中,我也需要第二个数组中的位置。
以下是我能够做到的最好的例子:
>>> a=np.arange(0.,15.)
>>> a
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.,
11., 12., 13., 14.])
>>> b=np.arange(4.,8.,.5)
>>> b
array([ 4. , 4.5, 5. , 5.5, 6. , 6.5, 7. , 7.5])
>>> [ (i,j) for (i,alem) in enumerate(a) for (j,blem) in enumerate(b) if alem==blem]
[(4, 0), (5, 2), (6, 4), (7, 6)]
任何人都有更快,更具特色或更“pythonic”的东西吗?
答案 0 :(得分:3)
这是一个O((n + k)log(n + k))(朴素算法是O(nk))解np.unique
uniq, inv = np.unique(np.r_[a, b], return_inverse=True)
map = -np.ones((len(uniq),), dtype=int)
map[inv[:len(a)]] = np.arange(len(a))
bina = map[inv[len(a):]]
inds_in_b = np.where(bina != -1)[0]
elements, inds_in_a = b[inds_in_b], bina[inds_in_b]
或者您可以简单地将a
排序为O((n + k)log(k))
inds = np.argsort(a)
aso = a[inds]
bina = np.searchsorted(aso[:-1], b)
inds_in_b = np.where(b == aso[bina])[0]
elements, inds_in_a = b[inds_in_b], inds[bina[inds_in_b]]
答案 1 :(得分:3)
对于排序数组a
,这是另一种方法np.searchsorted
使用其可选参数 - side
设置为left
和right
-
lidx = np.searchsorted(a,b,'left')
ridx = np.searchsorted(a,b,'right')
mask = lidx != ridx
out = lidx[mask], np.flatnonzero(mask)
# for zipped o/p : zip(lidx[mask], np.flatnonzero(mask))
运行时测试
方法 -
def searchsorted_where(a,b): # @Paul Panzer's soln
inds = np.argsort(a)
aso = a[inds]
bina = np.searchsorted(aso[:-1], b)
inds_in_b = np.where(b == aso[bina])[0]
return b[inds_in_b], inds_in_b
def in1d_masking(a,b): # @Psidom's soln
logic = np.in1d(b, a)
return b[logic], np.where(logic)[0]
def searchsorted_twice(a,b): # Proposed in this post
lidx = np.searchsorted(a,b,'left')
ridx = np.searchsorted(a,b,'right')
mask = lidx != ridx
return lidx[mask], np.flatnonzero(mask)
计时 -
案例#1(使用来自问题的样本数据并进行扩展):
In [2]: a=np.arange(0.,15000.)
...: b=np.arange(4.,15000.,0.5)
...:
In [3]: %timeit searchsorted_where(a,b)
...: %timeit in1d_masking(a,b)
...: %timeit searchsorted_twice(a,b)
...:
1000 loops, best of 3: 721 µs per loop
1000 loops, best of 3: 1.76 ms per loop
1000 loops, best of 3: 1.28 ms per loop
案例#2(与案例#1相同,b
中的元素数量比a
中的元素数量少):
In [4]: a=np.arange(0.,15000.)
...: b=np.arange(4.,15000.,5)
...:
In [5]: %timeit searchsorted_where(a,b)
...: %timeit in1d_masking(a,b)
...: %timeit searchsorted_twice(a,b)
...:
10000 loops, best of 3: 77.4 µs per loop
1000 loops, best of 3: 428 µs per loop
10000 loops, best of 3: 128 µs per loop
案例#3(b
中相对较小的元素):
In [6]: a=np.arange(0.,15000.)
...: b=np.arange(4.,15000.,10)
...:
In [7]: %timeit searchsorted_where(a,b)
...: %timeit in1d_masking(a,b)
...: %timeit searchsorted_twice(a,b)
...:
10000 loops, best of 3: 42.8 µs per loop
1000 loops, best of 3: 392 µs per loop
10000 loops, best of 3: 71.9 µs per loop
答案 2 :(得分:1)
您可以使用numpy.in1d
在b
中查找a
的元素,逻辑索引和numpy.where
可以相应地获取元素和索引:
logic = np.in1d(b, a)
list(zip(b[logic], np.where(logic)[0]))
# [(4.0, 0), (5.0, 2), (6.0, 4), (7.0, 6)]
b[logic], np.where(logic)[0]
# (array([ 4., 5., 6., 7.]), array([0, 2, 4, 6]))