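Similar questions have been asked, but none of the answers does quite what I need: some support multidimensional search (the 'rows' option in MATLAB) but do not return the index, while others return the index but do not support rows. My arrays are very large (1M x 2). I have managed to write a loop that works, but it is obviously very slow. In MATLAB, the built-in ismember function takes about 10 seconds.
Here is what I am looking for: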
a=np.array([[4, 6],[2, 6],[5, 2]])
b=np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
The exact MATLAB call that does the trick is:
[~,index] = ismember(a,b,'rows')
which gives
index = [6, 3, 9]
Answer 0 (score: 1)
import numpy as np

def asvoid(arr):
    """
    View the array as dtype np.void (bytes).
    This views the last axis of ND-arrays as bytes so you can perform comparisons on
    the entire row.
    http://stackoverflow.com/a/16840350/190597 (Jaime, 2013-05)
    Warning: When using asvoid for comparison, note that float zeros may compare UNEQUALLY
    >>> asvoid([-0.]) == asvoid([0.])
    array([False], dtype=bool)
    """
    arr = np.ascontiguousarray(arr)
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))

def in1d_index(a, b):
    """Return the indices (in increasing order) of the rows of b that also occur in a."""
    voida, voidb = map(asvoid, (a, b))
    # np.in1d is deprecated since NumPy 2.0; np.isin is its replacement.
    return np.where(np.in1d(voidb, voida))[0]

a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
print(in1d_index(a, b))
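prints
[2 5 8]
This corresponds to MATLAB's [3, 6, 9], since Python uses 0-based indexing.
Some caveats:
- The indices are returned in increasing order. They do not follow the order in which the rows of a are listed, so they do not tell you where each item of a sits in b.
- asvoid compares raw bytes, so be careful with float dtypes: asvoid([-0.]) == asvoid([0.]) returns array([False]).
Despite these caveats, one might still choose to use in1d_index for the sake of speed: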
def ismember_rows(a, b):
    # http://stackoverflow.com/a/22705773/190597 (ashg)
    return np.nonzero(np.all(b == a[:,np.newaxis], axis=2))[1]
In [41]: a2 = np.tile(a,(2000,1))
In [42]: b2 = np.tile(b,(2000,1))
In [46]: %timeit in1d_index(a2, b2)
100 loops, best of 3: 8.49 ms per loop
In [47]: %timeit ismember_rows(a2, b2)
1 loops, best of 3: 5.55 s per loop
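So in1d_index is about 650x faster (for arrays with thousands of rows). Note, however, that the comparison is not exactly apples-to-apples: in1d_index returns the indices in increasing order, whereas ismember_rows returns, for each row of a in turn, the index at which that row appears in b.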
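If the MATLAB ordering matters (one index per row of a, in a's order), a sort-based variant can keep most of the speed without the large broadcast intermediate. The following is only a sketch under assumptions: the function name `ismember_rows_ordered` is hypothetical, rows missing from b are reported as -1 (MATLAB would report 0), and it relies on `np.unique` with `axis=0` rather than on the `asvoid` view used above.

```python
import numpy as np

def ismember_rows_ordered(a, b):
    """For each row of a, return the 0-based index of its first match in b, or -1."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Give every distinct row of (b stacked on a) an integer id.
    _, inv = np.unique(np.vstack((b, a)), axis=0, return_inverse=True)
    inv = inv.ravel()  # the inverse's shape differs between NumPy versions
    ids_b, ids_a = inv[:len(b)], inv[len(b):]
    # first[id] = position of the first row of b carrying that id, else -1.
    first = np.full(inv.max() + 1, -1, dtype=np.intp)
    uniq_ids, first_pos = np.unique(ids_b, return_index=True)
    first[uniq_ids] = first_pos
    return first[ids_a]

a = np.array([[4, 6], [2, 6], [5, 2]])
b = np.array([[1, 7], [1, 8], [2, 6], [2, 1], [2, 4],
              [4, 6], [4, 7], [5, 9], [5, 2], [5, 1]])
print(ismember_rows_ordered(a, b))  # [5 2 8]
```

Adding 1 to the non-negative entries gives MATLAB's [6, 3, 9].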
Answer 1 (score: 0)
import numpy as np

def ismember_rows(a, b):
    '''Equivalent of 'ismember' from Matlab
    a.shape = (nRows_a, nCol)
    b.shape = (nRows_b, nCol)
    return the idx where b[idx] == a
    '''
    return np.nonzero(np.all(b == a[:,np.newaxis], axis=2))[1]

a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
idx = ismember_rows(a, b)
print(idx)
print(np.all(b[idx] == a))
prints
[5 2 8]
True
Eh... I used broadcasting.
-------------------------- [Update] --------------------------
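To make the broadcasting step concrete, here is a small sketch that breaks the one-liner in ismember_rows into its intermediate arrays. The variable names (`eq`, `row_match`, `rows_of_a`, `rows_of_b`) are introduced here only for illustration.

```python
import numpy as np

a = np.array([[4, 6], [2, 6], [5, 2]])
b = np.array([[1, 7], [1, 8], [2, 6], [2, 1], [2, 4],
              [4, 6], [4, 7], [5, 9], [5, 2], [5, 1]])

# a[:, np.newaxis] has shape (3, 1, 2); comparing it with b (10, 2)
# broadcasts to one element-wise comparison per (row of a, row of b) pair.
eq = b == a[:, np.newaxis]
# A pair matches only if every column matches.
row_match = eq.all(axis=2)
# np.nonzero scans row_match in row-major order, i.e. row of a by row of a.
rows_of_a, rows_of_b = np.nonzero(row_match)

print(eq.shape, row_match.shape)  # (3, 10, 2) (3, 10)
print(rows_of_b)                  # [5 2 8]
```

The intermediate array holds len(a) * len(b) * ncols booleans, which is why this approach becomes slow and memory-hungry as the arrays grow.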
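def ismember(a, b):
    # Note: each column is tested independently, which is necessary but not
    # sufficient for a whole row of b to equal a whole row of a.
    return np.flatnonzero(np.in1d(b[:,0], a[:,0]) & np.in1d(b[:,1], a[:,1]))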
a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
a2 = np.tile(a,(2000,1))
b2 = np.tile(b,(2000,1))
%timeit in1d_index(a2, b2)
# 100 loops, best of 3: 8.74 ms per loop
%timeit ismember(a2, b2)
# 100 loops, best of 3: 8.5 ms per loop
np.all(in1d_index(a2, b2) == ismember(a2, b2))
# True
As unutbu said, the indices are returned in increasing order.
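One thing to watch with the column-wise `ismember` from the update: it agrees with `in1d_index` on the test data, but it checks each column separately, so a row of b can be flagged when its first value comes from one row of a and its second value from another. A minimal sketch of such a case follows; the function names `ismember_columnwise` and `ismember_exact` are hypothetical, and `np.isin` is used in place of `np.in1d`.

```python
import numpy as np

def ismember_columnwise(a, b):
    # Same test as the column-wise ismember, written with np.isin.
    return np.flatnonzero(np.isin(b[:, 0], a[:, 0]) & np.isin(b[:, 1], a[:, 1]))

def ismember_exact(a, b):
    # Broadcasting check: a row of b counts only if it equals some whole row of a.
    return np.flatnonzero((b[:, None, :] == a[None, :, :]).all(axis=2).any(axis=1))

a = np.array([[4, 6], [5, 2]])
b = np.array([[4, 2], [4, 6]])

print(ismember_columnwise(a, b))  # [0 1]  ([4, 2] is a false positive)
print(ismember_exact(a, b))       # [1]
```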
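Answer 2 (score: 0)
This function first converts each multi-column row into a single element of a one-column array, after which numpy.in1d can be used to find the answer. Try the following code: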
import numpy as np

def ismemberRow(A, B):
    '''
    Find which rows of A can also be found in B.
    Each row is first turned into a single string key (its elements joined by '-'),
    so that a 1-D membership test can be used.
    Input: m x n numpy array (A), and p x n numpy array (B)
    Output: boolean numpy array of length m, True for rows found in both A and B
    '''
    # Build one string key per row, e.g. [1, 3, 4] -> '1-3-4'
    sa = np.array(['-'.join(map(str, row)) for row in A])
    sb = np.array(['-'.join(map(str, row)) for row in B])
    # np.in1d in the original; np.isin is its current name
    return np.isin(sa, sb)

A = np.array([[1, 3, 4],[2, 4, 3],[7, 4, 3],[1, 1, 1],[1, 3, 4],[5, 3, 4],[1, 1, 1],[2, 4, 3]])
B = np.array([[1, 3, 4],[1, 1, 1]])
d = ismemberRow(A, B)
print(A[np.where(d)[0], :])
#results:
#[[1 3 4]
# [1 1 1]
# [1 3 4]
# [1 1 1]]
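Note that this returns a boolean mask over the rows of A rather than the indices the question asks for. As a rough sketch of how the same key idea can yield indices, the rows can be used as tuple keys in a dictionary instead of strings. The function name `row_index_lookup` and the -1 marker for "not found" are assumptions made for this example, and the Python-level loop will be slower than the vectorized approaches on very large arrays.

```python
import numpy as np

def row_index_lookup(A, B):
    """For each row of A, return the 0-based index of its first match in B, or -1."""
    first_seen = {}
    for i, row in enumerate(np.asarray(B).tolist()):
        # setdefault keeps the first index when B contains duplicate rows
        first_seen.setdefault(tuple(row), i)
    return np.array([first_seen.get(tuple(row), -1)
                     for row in np.asarray(A).tolist()])

A = np.array([[1, 3, 4], [2, 4, 3], [7, 4, 3], [1, 1, 1],
              [1, 3, 4], [5, 3, 4], [1, 1, 1], [2, 4, 3]])
B = np.array([[1, 3, 4], [1, 1, 1]])

idx = row_index_lookup(A, B)
print(idx)       # [ 0 -1 -1  1  0 -1  1 -1]
print(idx >= 0)  # the same mask that ismemberRow returns
```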