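Python: positions of elements of one NumPy array that have equal elements in another array

Posted: 2017-02-24 15:19:31

Tags: python arrays list numpy

I need the values of the elements of one NumPy array that also appear in a second NumPy array, and I also need their positions, both in the first array and in the second array.

Here is the best I have been able to come up with: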

>>> a=np.arange(0.,15.)
>>> a
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,
        11.,  12.,  13.,  14.])
>>> b=np.arange(4.,8.,.5)
>>> b
array([ 4. ,  4.5,  5. ,  5.5,  6. ,  6.5,  7. ,  7.5])
>>> [ (i,j) for (i,alem) in enumerate(a) for (j,blem) in enumerate(b) if alem==blem]
[(4, 0), (5, 2), (6, 4), (7, 6)]
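
For reference only (this is not taken from any of the answers below): the nested comprehension compares every element of a with every element of b, which is O(nk). The same pairwise comparison can be vectorized with NumPy broadcasting. It still does O(nk) work and allocates a len(a) × len(b) boolean matrix, so it is a sketch that only suits small inputs. The helper name pairs_by_broadcasting is hypothetical.

```python
import numpy as np

def pairs_by_broadcasting(a, b):
    """Return (i, j) pairs with a[i] == b[j], in the same order as the
    nested comprehension (i in the outer loop, j in the inner loop)."""
    # a[:, None] has shape (len(a), 1); comparing it with b broadcasts
    # to a (len(a), len(b)) boolean matrix of pairwise equality
    equal = a[:, None] == b
    # np.nonzero yields row indices (into a) and column indices (into b)
    # in row-major order, matching the loop order above
    ia, ib = np.nonzero(equal)
    return list(zip(ia.tolist(), ib.tolist()))

a = np.arange(0., 15.)
b = np.arange(4., 8., .5)
print(pairs_by_broadcasting(a, b))  # [(4, 0), (5, 2), (6, 4), (7, 6)]
```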

Does anyone have something faster, more NumPy-idiomatic, or more "pythonic"?

3 answers:

Answer 0 (score: 3)

Here is an O((n + k) log(n + k)) solution using np.unique (the naive algorithm is O(nk)):

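import numpy as np

uniq, inv = np.unique(np.r_[a, b], return_inverse=True)   # one label per distinct value
lookup = -np.ones((len(uniq),), dtype=int)                 # label -> position in a, -1 if absent
lookup[inv[:len(a)]] = np.arange(len(a))                   # ("lookup" rather than "map", to avoid shadowing the built-in)
bina = lookup[inv[len(a):]]                                # position in a for each element of b
inds_in_b = np.where(bina != -1)[0]
elements, inds_in_a = b[inds_in_b], bina[inds_in_b]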
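
Wrapped in a function and run on the question's sample data, the np.unique approach can be checked directly. This is a sketch: the function name match_via_unique is hypothetical. The idea is that return_inverse gives every element of the concatenated array a label that is shared by all equal values, so a table indexed by label can map labels coming from b back to positions in a.

```python
import numpy as np

def match_via_unique(a, b):
    """Return (elements, inds_in_a, inds_in_b) for the elements of b that occur in a."""
    # inv assigns one integer label per distinct value across a and b together
    uniq, inv = np.unique(np.r_[a, b], return_inverse=True)
    # lookup[label] = a position in a holding that value, or -1 if a lacks it
    lookup = np.full(len(uniq), -1, dtype=np.intp)
    # if a contains duplicates, only one of their positions is kept
    lookup[inv[:len(a)]] = np.arange(len(a))
    bina = lookup[inv[len(a):]]           # one entry per element of b
    inds_in_b = np.flatnonzero(bina != -1)
    return b[inds_in_b], bina[inds_in_b], inds_in_b

a = np.arange(0., 15.)
b = np.arange(4., 8., .5)
elements, inds_in_a, inds_in_b = match_via_unique(a, b)
print(elements)   # [4. 5. 6. 7.]
print(inds_in_a)  # [4 5 6 7]
print(inds_in_b)  # [0 2 4 6]
```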

Or you can simply sort a, which gives O((n + k) log(k)) with k = len(a):

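import numpy as np

inds = np.argsort(a)                          # positions that sort a
aso = a[inds]                                 # a in sorted order
bina = np.searchsorted(aso[:-1], b)           # searching aso[:-1] keeps every index < len(a)
inds_in_b = np.where(b == aso[bina])[0]       # keep exact matches only
elements, inds_in_a = b[inds_in_b], inds[bina[inds_in_b]]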
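
The aso[:-1] in the sorted variant is what keeps aso[bina] in bounds: searching the full sorted array returns len(a) for any value of b larger than every element of a, and indexing with that position raises an IndexError. Searching all but the last element caps the result at len(a) - 1, and the equality test afterwards discards any value that is not actually present. A small demonstration, with made-up data chosen so that one value of b exceeds max(a):

```python
import numpy as np

a = np.array([3., 1., 2.])     # deliberately unsorted
b = np.array([2., 5., 1.])     # 5. is larger than every element of a

inds = np.argsort(a)           # [1 2 0]
aso = a[inds]                  # [1. 2. 3.]

print(np.searchsorted(aso, b))       # [1 3 0]: 3 would be out of bounds for aso
bina = np.searchsorted(aso[:-1], b)  # [1 2 0]: always a valid index into aso
print(bina)

inds_in_b = np.flatnonzero(b == aso[bina])  # keep only exact matches
elements, inds_in_a = b[inds_in_b], inds[bina[inds_in_b]]
print(elements, inds_in_a, inds_in_b)  # [2. 1.] [2 1] [0 2]
```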

Answer 1 (score: 3)

For a sorted array a, here is another approach that calls np.searchsorted twice, with its optional side argument set to 'left' and then 'right':

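import numpy as np

lidx = np.searchsorted(a, b, side='left')    # leftmost insertion point for each element of b
ridx = np.searchsorted(a, b, side='right')   # rightmost insertion point
mask = lidx != ridx                           # non-empty run of equal values -> present in a
out = lidx[mask], np.flatnonzero(mask)        # (positions in a, positions in b)
# for zipped output: list(zip(lidx[mask], np.flatnonzero(mask)))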
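
The two-sided search relies on a being sorted: for each value of b, side='left' and side='right' bracket the run of equal elements in a, and that run is empty (lidx == ridx) exactly when the value is absent. For an unsorted a, one possible extension (not part of the answer) is to pass np.argsort(a) through the sorter parameter of np.searchsorted and then map the sorted positions back to original positions. The helper name searchsorted_twice_unsorted is hypothetical.

```python
import numpy as np

def searchsorted_twice_unsorted(a, b):
    """Return (inds_in_a, inds_in_b) for the elements of b present in a; a need not be sorted."""
    order = np.argsort(a, kind='stable')  # positions that would sort a
    lidx = np.searchsorted(a, b, side='left', sorter=order)
    ridx = np.searchsorted(a, b, side='right', sorter=order)
    mask = lidx != ridx                   # non-empty run -> value present in a
    # lidx indexes the sorted view of a; order maps it back to a position in a
    # (with a stable sort this is the first occurrence of the value in a)
    return order[lidx[mask]], np.flatnonzero(mask)

a = np.array([7., 4., 9., 5., 4.])
b = np.array([4., 4.5, 5., 9.])
inds_in_a, inds_in_b = searchsorted_twice_unsorted(a, b)
print(inds_in_a, inds_in_b)  # [1 3 2] [0 2 3]
```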

Runtime test

Approaches:

import numpy as np

def searchsorted_where(a, b):  # @Paul Panzer's solution
    inds = np.argsort(a)
    aso = a[inds]
    bina = np.searchsorted(aso[:-1], b)
    inds_in_b = np.where(b == aso[bina])[0]
    return b[inds_in_b], inds_in_b             # (matching values, positions in b)

def in1d_masking(a, b):  # @Psidom's solution
    logic = np.in1d(b, a)  # np.isin(b, a) is the non-deprecated equivalent in NumPy >= 2.0
    return b[logic], np.where(logic)[0]        # (matching values, positions in b)

def searchsorted_twice(a, b):  # proposed in this post; requires a sorted
    lidx = np.searchsorted(a, b, 'left')
    ridx = np.searchsorted(a, b, 'right')
    mask = lidx != ridx
    return lidx[mask], np.flatnonzero(mask)    # (positions in a, positions in b)

Timings:

Case #1 (sample data from the question, scaled up):

In [2]: a=np.arange(0.,15000.)
   ...: b=np.arange(4.,15000.,0.5)
   ...: 

In [3]: %timeit searchsorted_where(a,b)
   ...: %timeit in1d_masking(a,b)
   ...: %timeit searchsorted_twice(a,b)
   ...: 
1000 loops, best of 3: 721 µs per loop
1000 loops, best of 3: 1.76 ms per loop
1000 loops, best of 3: 1.28 ms per loop

Case #2 (same as case #1, but with fewer elements in b than in a):

In [4]: a=np.arange(0.,15000.)
   ...: b=np.arange(4.,15000.,5)
   ...: 

In [5]: %timeit searchsorted_where(a,b)
   ...: %timeit in1d_masking(a,b)
   ...: %timeit searchsorted_twice(a,b)
   ...: 
10000 loops, best of 3: 77.4 µs per loop
1000 loops, best of 3: 428 µs per loop
10000 loops, best of 3: 128 µs per loop

Case #3 (even fewer elements in b):

In [6]: a=np.arange(0.,15000.)
   ...: b=np.arange(4.,15000.,10)
   ...: 

In [7]: %timeit searchsorted_where(a,b)
   ...: %timeit in1d_masking(a,b)
   ...: %timeit searchsorted_twice(a,b)
   ...: 
10000 loops, best of 3: 42.8 µs per loop
1000 loops, best of 3: 392 µs per loop
10000 loops, best of 3: 71.9 µs per loop
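
Outside IPython, a rough equivalent of these measurements can be made with the standard-library timeit module. The sketch below restates the three functions so it runs on its own, substitutes np.isin for np.in1d (same result for 1-D inputs, not deprecated), and first checks that all three report the same positions in b before timing them. run_case is a hypothetical helper, and the absolute numbers it prints depend on the machine and NumPy version, so they will not match the figures above.

```python
import timeit
import numpy as np

def searchsorted_where(a, b):
    inds = np.argsort(a)
    aso = a[inds]
    bina = np.searchsorted(aso[:-1], b)
    inds_in_b = np.where(b == aso[bina])[0]
    return b[inds_in_b], inds_in_b

def isin_masking(a, b):
    logic = np.isin(b, a)  # stands in for np.in1d(b, a)
    return b[logic], np.where(logic)[0]

def searchsorted_twice(a, b):  # assumes a is sorted
    lidx = np.searchsorted(a, b, 'left')
    ridx = np.searchsorted(a, b, 'right')
    mask = lidx != ridx
    return lidx[mask], np.flatnonzero(mask)

def run_case(a, b, number=100):
    funcs = (searchsorted_where, isin_masking, searchsorted_twice)
    # The second element of every result is the positions in b; they must agree
    ref = funcs[0](a, b)[1]
    for f in funcs[1:]:
        assert np.array_equal(f(a, b)[1], ref), f.__name__
    # Best of 3 repeats, reported per call, similar to what %timeit prints
    return {f.__name__: min(timeit.repeat(lambda: f(a, b), number=number, repeat=3)) / number
            for f in funcs}

a = np.arange(0., 15000.)
for step in (0.5, 5, 10):  # b for cases #1, #2 and #3
    b = np.arange(4., 15000., step)
    for name, secs in run_case(a, b).items():
        print(f"step={step:<4} {name:<20} {secs * 1e6:8.1f} µs per call")
```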

Answer 2 (score: 1)

You can use numpy.in1d to find which elements of b are in a; boolean indexing and numpy.where then give the elements and their indices:

logic = np.in1d(b, a)  # True where an element of b also occurs in a
list(zip(b[logic], np.where(logic)[0]))
# [(4.0, 0), (5.0, 2), (6.0, 4), (7.0, 6)]

b[logic], np.where(logic)[0]
# (array([ 4.,  5.,  6.,  7.]), array([0, 2, 4, 6]))
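
np.in1d (for which np.isin is the non-deprecated equivalent in current NumPy) only reports positions in b. If positions in both arrays are needed, as the question asks, one alternative not used in any of the answers is np.intersect1d with return_indices=True (available since NumPy 1.15). It returns the sorted common values together with the index of the first occurrence of each value in a and in b; because it works on unique values, a value repeated in either array is reported only once.

```python
import numpy as np

a = np.arange(0., 15.)
b = np.arange(4., 8., .5)

# common: sorted unique values present in both arrays
# inds_in_a / inds_in_b: index of the first occurrence of each common value
common, inds_in_a, inds_in_b = np.intersect1d(a, b, return_indices=True)
print(list(zip(inds_in_a.tolist(), inds_in_b.tolist())))  # [(4, 0), (5, 2), (6, 4), (7, 6)]
```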