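Given two numpy arrays of shape nx3 and mx3, what is an efficient way to determine the row indices (counter) at which rows are common to both arrays? For instance, I have the following solution, which is noticeably slow even for arrays that are not much larger: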
import numpy as np

def arrangment(arr1,arr2):
    hits = []
    for i in range(arr2.shape[0]):
        # Tile the current row of arr2 so it can be subtracted from every row of arr1
        current_row = np.repeat(arr2[i,:][None,:],arr1.shape[0],axis=0)
        x = current_row - arr1
        for j in range(arr1.shape[0]):
            if np.isclose(x[j,0],0.0) and np.isclose(x[j,1],0.0) and np.isclose(x[j,2],0.0):
                hits.append(j)
    return hits
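It checks whether the rows of arr2 exist in arr1 and returns the row indices of arr1 where the rows match. I need this arrangement to always follow the row order of arr2. For instance, given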
arr1 = np.array([[-1., -1., -1.],
                 [ 1., -1., -1.],
                 [ 1.,  1., -1.],
                 [-1.,  1., -1.],
                 [-1., -1.,  1.],
                 [ 1., -1.,  1.],
                 [ 1.,  1.,  1.],
                 [-1.,  1.,  1.]])
arr2 = np.array([[-1.,  1., -1.],
                 [ 1.,  1., -1.],
                 [ 1.,  1.,  1.],
                 [-1.,  1.,  1.]])
the function should return:
[3, 2, 6, 7]
Answer 0 (score: 3)
Quick and dirty answer
(arr1[:, None] == arr2).all(-1).argmax(0)
array([3, 2, 6, 7])
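Better answer
This takes care of the chance that a row in arr2 does not match anything in arr1; such rows are reported as nan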
t = (arr1[:, None] == arr2).all(-1)
np.where(t.any(0), t.argmax(0), np.nan)
array([ 3., 2., 6., 7.])
As pointed out by @Divakar, comparing floats with == is vulnerable to rounding error; np.isclose accounts for it
t = np.isclose(arr1[:, None], arr2).all(-1)
np.where(t.any(0), t.argmax(0), np.nan)
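To make the broadcasting step explicit, here is a minimal sketch of the same idea wrapped in a function. The name match_rows and the use of -1 as the "no match" marker are my own assumptions (np.nan, as used above, forces a float result); the rest follows the expressions in this answer.

```python
import numpy as np

def match_rows(arr1, arr2):
    """For each row of arr2, return the index of the first matching row of arr1, or -1."""
    # arr1[:, None] has shape (n, 1, 3); comparing it with arr2 of shape (m, 3)
    # broadcasts to an (n, m, 3) boolean array, reduced to (n, m) by all(-1).
    t = np.isclose(arr1[:, None], arr2).all(-1)
    # argmax returns 0 for an all-False column, so those columns are masked via any(0).
    return np.where(t.any(0), t.argmax(0), -1)

arr1 = np.array([[-1., -1., -1.],
                 [ 1., -1., -1.],
                 [ 1.,  1., -1.],
                 [-1.,  1., -1.]])
arr2 = np.array([[-1.,  1., -1.],
                 [ 5.,  5.,  5.],   # deliberately absent from arr1
                 [ 1.,  1., -1.]])

result = match_rows(arr1, arr2)
print(result.tolist())  # [3, -1, 2]
```

Note that the intermediate array holds n*m*3 comparisons, so memory use grows with the product of the two row counts.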
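Answer 1 (score: 0)
I had a similar problem in the past, and I came up with a fairly optimised solution for it.
First, you need a generalisation of numpy.unique for multi-dimensional arrays, which for the sake of completeness I will copy here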
def unique2d(arr,consider_sort=False,return_index=False,return_inverse=False):
    """Get unique values along an axis for 2D arrays.

        input:
            arr:
                2D array
            consider_sort:
                Does permutation of the values within the axis matter?
                Two rows can contain the same values but with
                different arrangements. If consider_sort
                is True then those rows would be considered equal
            return_index:
                Similar to numpy unique
            return_inverse:
                Similar to numpy unique
        returns:
            2D array of unique rows
            If return_index is True also returns indices
            If return_inverse is True also returns the inverse array
            """

    if consider_sort is True:
        a = np.sort(arr,axis=1)
    else:
        a = arr
    # View each row as a single opaque item so np.unique compares whole rows
    b = np.ascontiguousarray(a).view(np.dtype((np.void,
            a.dtype.itemsize * a.shape[1])))

    if return_inverse is False:
        _, idx = np.unique(b, return_index=True)
    else:
        _, idx, inv = np.unique(b, return_index=True, return_inverse=True)

    if return_index == False and return_inverse == False:
        return arr[idx]
    elif return_index == True and return_inverse == False:
        return arr[idx], idx
    elif return_index == False and return_inverse == True:
        return arr[idx], inv
    else:
        return arr[idx], idx, inv
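As a side note, and as far as I know, np.unique in NumPy 1.13 and later accepts an axis argument, which finds unique rows without the void-view trick. The sketch below is not part of the function above; the array rows is made up for illustration, and np.ravel is there only because the shape of the returned inverse has not been identical across NumPy releases.

```python
import numpy as np

rows = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [1, 2, 3],
                 [0, 0, 0]])

# axis=0 treats every row as one item; unique rows come back in lexicographic order.
uniq, idx, inv = np.unique(rows, axis=0, return_index=True, return_inverse=True)
inv = np.ravel(inv)  # force a 1-D inverse regardless of NumPy version

print(uniq.tolist())  # [[0, 0, 0], [1, 2, 3], [4, 5, 6]]
print(idx.tolist())   # first occurrence of each unique row: [3, 0, 1]
print(inv.tolist())   # label of each original row: [1, 2, 1, 0]
```

Indexing uniq with inv rebuilds the original array, which is the property the inverse mapping relies on.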
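Now all you need to do is concatenate (np.vstack) your arrays and find the unique rows. The inverse mapping, together with np.searchsorted, will then give you the indices you need. So let's write another function similar to numpy.in1d, but for multi-dimensional (2D) arrays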
def in2d_unsorted(arr1, arr2, axis=1, consider_sort=False):
    """Find the elements in arr1 which are also in
       arr2 and sort them as they appear in arr2"""

    assert arr1.dtype == arr2.dtype

    if axis == 0:
        arr1 = np.copy(arr1.T,order='C')
        arr2 = np.copy(arr2.T,order='C')

    if consider_sort is True:
        sorter_arr1 = np.argsort(arr1)
        arr1 = arr1[np.arange(arr1.shape[0])[:,None],sorter_arr1]
        sorter_arr2 = np.argsort(arr2)
        arr2 = arr2[np.arange(arr2.shape[0])[:,None],sorter_arr2]

    # Rows that are equal receive the same label in the inverse mapping
    arr = np.vstack((arr1,arr2))
    _, inv = unique2d(arr, return_inverse=True)

    size1 = arr1.shape[0]
    size2 = arr2.shape[0]

    arr3 = inv[:size1]
    arr4 = inv[-size2:]

    # Sort the indices as they appear in arr2
    sorter = np.argsort(arr3)
    idx = sorter[arr3.searchsorted(arr4, sorter=sorter)]

    return idx
Now all you need to do is call in2d_unsorted with your input parameters
>>> in2d_unsorted(arr1,arr2)
array([ 3, 2, 6, 7])
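To see why the inverse mapping plus searchsorted recovers the indices, here is a step-by-step sketch on the cube example from the question. It swaps in np.unique with axis=0 for unique2d so that the snippet is self-contained (an assumption on my part, not the function above), and the names labels1, labels2 and pos are illustrative. It assumes the rows of arr1 are distinct and that every row of arr2 occurs in arr1; rows are compared for exact equality, not with np.isclose.

```python
import numpy as np

arr1 = np.array([[-1., -1., -1.], [ 1., -1., -1.], [ 1.,  1., -1.], [-1.,  1., -1.],
                 [-1., -1.,  1.], [ 1., -1.,  1.], [ 1.,  1.,  1.], [-1.,  1.,  1.]])
arr2 = np.array([[-1.,  1., -1.], [ 1.,  1., -1.], [ 1.,  1.,  1.], [-1.,  1.,  1.]])

# 1. Stack both arrays and give every distinct row an integer label.
stacked = np.vstack((arr1, arr2))
_, inv = np.unique(stacked, axis=0, return_inverse=True)
inv = np.ravel(inv)  # force a 1-D inverse regardless of NumPy version

# 2. Split the labels back into the part for arr1 and the part for arr2.
labels1 = inv[:len(arr1)]   # [0, 4, 6, 2, 1, 5, 7, 3]
labels2 = inv[len(arr1):]   # [2, 6, 7, 3]

# 3. Locate each arr2 label among the arr1 labels via a sorted lookup.
sorter = np.argsort(labels1)
pos = np.searchsorted(labels1, labels2, sorter=sorter)
idx = sorter[pos]

print(idx.tolist())  # [3, 2, 6, 7]
```

If a row of arr2 were missing from arr1, pos would point at a neighbouring label (or past the end), so a real implementation would want to check labels1[idx] against labels2 before trusting the result.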
While it may not be fully optimised, this approach is much faster. Let's benchmark it against @piRSquared's solution
def indices_piR(arr1,arr2):
    t = np.isclose(arr1[:, None], arr2).all(-1)
    return np.where(t.any(0), t.argmax(0), np.nan)
with the following arrays
n=150
arr1 = np.random.permutation(n).reshape(n//3, 3)
idx = np.random.permutation(n//3)
arr2 = arr1[idx]
In [13]: np.allclose(in2d_unsorted(arr1,arr2),indices_piR(arr1,arr2))
True
In [14]: %timeit indices_piR(arr1,arr2)
10000 loops, best of 3: 181 µs per loop
In [15]: %timeit in2d_unsorted(arr1,arr2)
10000 loops, best of 3: 85.7 µs per loop
Now for n=1500
In [24]: %timeit indices_piR(arr1,arr2)
100 loops, best of 3: 10.3 ms per loop
In [25]: %timeit in2d_unsorted(arr1,arr2)
1000 loops, best of 3: 403 µs per loop
and for n=15000
In [28]: %timeit indices_piR(A,B)
1 loop, best of 3: 1.02 s per loop
In [29]: %timeit in2d_unsorted(arr1,arr2)
100 loops, best of 3: 4.65 ms per loop
So for largish arrays, it is over 200X faster than @piRSquared's vectorised solution.
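One plausible reason for the widening gap, offered as a rough sketch and not as a profile of either function: the broadcasting approach compares every row of arr1 with every row of arr2, while the unique-based approach only sorts the stacked rows. The snippet below counts the element comparisons implied by the benchmark shapes; the dictionary name counts is illustrative, and np.broadcast_shapes needs NumPy 1.20 or later.

```python
import math
import numpy as np

counts = {}
for n in (150, 1500, 15000):
    rows = n // 3  # both benchmark arrays have n // 3 rows of 3 values
    # Shape of the intermediate created by comparing arr1[:, None] with arr2
    pairwise_shape = np.broadcast_shapes((rows, 1, 3), (rows, 3))
    pairwise_elems = math.prod(pairwise_shape)
    stacked_rows = 2 * rows  # rows handed to the sort inside np.unique
    counts[n] = (pairwise_shape, pairwise_elems, stacked_rows)
    print(n, pairwise_shape, pairwise_elems, stacked_rows)
```

For n=15000 the pairwise intermediate has 75,000,000 elements, against 10,000 rows to sort in the stacked array.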