I have an array of coordinate arrays (rectangles given as [left, top, right, bottom]), like this:
a = [[0,0,300,400],[1,1,15,59],[5,5,300,400]]
Now I want to get the overlap ratio of each rectangle with the other rectangles:
def bool_rect_intersect(A, B):
    # Rectangles are [left, top, right, bottom]; True if A and B overlap or touch.
    return not (B[0] > A[2] or B[2] < A[0] or B[3] < A[1] or B[1] > A[3])

def get_overlap_ratio(A, B):
    # Fraction of A's area that is covered by B.
    if not bool_rect_intersect(A, B):
        return 0
    left = max(A[0], B[0])
    top = max(A[1], B[1])
    right = min(A[2], B[2])
    bottom = min(A[3], B[3])
    surface_intersection = (right - left) * (bottom - top)
    surface_A = (A[2] - A[0]) * (A[3] - A[1]) + 0.0  # + 0.0 forces float division
    return surface_intersection / surface_A
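Now I am looking for the fastest way to compute the overlap grid for 2000+ rectangles. If I loop over them, it takes more than a minute. I tried np.vectorize, but I don't think it applies to multi-dimensional arrays.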
Answer 0 (score: 1)
Approach #1: Here is a vectorized approach:
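import numpy as np

def pairwise_overlaps(a):
    # Index pairs (r, c) with r < c, i.e. the upper triangle without the diagonal
    r, c = np.triu_indices(a.shape[0], 1)
    lt = np.maximum(a[r, :2], a[c, :2])   # left/top corner of each intersection
    tb = np.minimum(a[r, 2:], a[c, 2:])   # right/bottom corner of each intersection
    # Note: there is no intersection test here, so this assumes every pair overlaps
    si_vectorized = (tb[:, 0] - lt[:, 0]) * (tb[:, 1] - lt[:, 1])
    slicedA_comps = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1]) + 0.0
    sA_vectorized = np.take(slicedA_comps, r)
    return si_vectorized / sA_vectorized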
Sample run:
In [48]: a
Out[48]:
array([[  0,   0, 300, 400],
       [  1,   1,  15,  59],
       [  5,   5, 300, 400]])
In [49]: print get_overlap_ratio(a[0], a[1]) # Looping thru pairs
...: print get_overlap_ratio(a[0], a[2])
...: print get_overlap_ratio(a[1], a[2])
...:
0.00676666666667
0.971041666667
0.665024630542
In [50]: pairwise_overlaps(a) # Proposed app to get all those in one-go
Out[50]: array([ 0.00676667, 0.97104167, 0.66502463])
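One caveat: `pairwise_overlaps` does not repeat the `bool_rect_intersect` check from the question, so for rectangles that do not overlap it can return a negative or a spurious positive ratio instead of 0. The sketch below shows one possible way to handle that by clamping the intersection width and height at zero. The name `pairwise_overlaps_clipped` and the small `rects` array are made up for illustration and are not part of the original answer; it assumes the same [left, top, right, bottom] layout and rectangles with non-zero area.

```python
import numpy as np

def pairwise_overlaps_clipped(a):
    # Hypothetical variant of Approach #1 that returns 0 for disjoint pairs.
    a = np.asarray(a, dtype=float)
    r, c = np.triu_indices(a.shape[0], 1)      # all pairs with r < c
    lt = np.maximum(a[r, :2], a[c, :2])        # left/top of the intersection
    rb = np.minimum(a[r, 2:], a[c, 2:])        # right/bottom of the intersection
    wh = np.clip(rb - lt, 0, None)             # negative extent means no overlap
    inter = wh[:, 0] * wh[:, 1]
    area = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    return inter / area[r]                     # normalize by the first rectangle's area

# Pairs (0,1) and (1,2) are disjoint; pair (0,2) shares a 5x5 square.
rects = np.array([[0, 0, 10, 10], [20, 20, 30, 30], [5, 5, 15, 15]])
ratios = pairwise_overlaps_clipped(rects)
```

Without the clamp, the pair (1, 2) in this example would multiply two negative extents and report a positive ratio even though the rectangles are far apart.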
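Approach #2: On closer inspection, the indexing with r and c in the previous approach is the performance killer, because fancy indexing makes copies. We can improve on this by computing, column by column, each element against every other element in the same column using broadcasting, as in the implementation below: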
def pairwise_overlaps_v2(a):
    # Broadcasting a[:,k] against a[:,None,k] yields an (n, n) grid over all pairs
    rl = np.minimum(a[:,2], a[:,None,2]) - np.maximum(a[:,0], a[:,None,0])
    bt = np.minimum(a[:,3], a[:,None,3]) - np.maximum(a[:,1], a[:,None,1])
    si_vectorized2D = rl*bt
    slicedA_comps = ((a[:,2]- a[:,0])*(a[:,3]-a[:,1]) + 0.0)
    overlaps2D = si_vectorized2D/slicedA_comps[:,None]
    r = np.arange(a.shape[0])
    tril_mask = r[:,None] < r  # True where row < column (upper triangle, despite the name)
    return overlaps2D[tril_mask]
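To make the broadcasting step more concrete, here is a tiny illustration on a made-up 1-D array `x` (not from the original answer). Pairing `x` with `x[:, None]` produces a square grid whose entry `[i, j]` combines `x[i]` and `x[j]`, and the `row < column` mask then picks out each unordered pair once, in the same order that `np.triu_indices` would produce.

```python
import numpy as np

x = np.array([3, 7, 5])            # made-up values for illustration
col = x[:, None]                   # shape (3, 1)
grid = np.minimum(x, col)          # shape (3, 3); grid[i, j] == min(x[i], x[j])

idx = np.arange(x.shape[0])
upper = idx[:, None] < idx         # True strictly above the diagonal
pairs = grid[upper]                # values for pairs (0,1), (0,2), (1,2)
```

No index arrays of length n*(n-1)/2 are built for the inputs here, which is the copy that Approach #2 avoids.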
In [238]: n = 1000
In [239]: a = np.hstack((np.random.randint(0,100,(n,2)), \
np.random.randint(300,500,(n,2))))
In [240]: np.allclose(pairwise_overlaps(a), pairwise_overlaps_v2(a))
Out[240]: True
In [241]: %timeit pairwise_overlaps(a)
10 loops, best of 3: 35.2 ms per loop
In [242]: %timeit pairwise_overlaps_v2(a)
100 loops, best of 3: 16 ms per loop
Let's time the original approach as a list comprehension:
In [244]: r,c = np.triu_indices(a.shape[0],1)
In [245]: %timeit [get_overlap_ratio(a[r[i]], a[c[i]]) for i in range(len(r))]
1 loops, best of 3: 2.85 s per loop
That is roughly a 180x speedup for the second approach over the original loop (2.85 s vs. 16 ms)!
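Both functions above return only the pairs with row index smaller than column index. Since the ratio is normalized by the first rectangle's area, the ratio of A against B generally differs from B against A, so a full grid may be what the question is after. A minimal sketch of that, under the same [left, top, right, bottom] layout and assuming non-zero areas, could look like the following; `overlap_grid` is a hypothetical name, and the zero clamp is an addition that is not in the timed code above.

```python
import numpy as np

def overlap_grid(a):
    # Hypothetical helper: grid[i, j] = area(rect i AND rect j) / area(rect i).
    a = np.asarray(a, dtype=float)
    w = np.minimum(a[:, None, 2], a[None, :, 2]) - np.maximum(a[:, None, 0], a[None, :, 0])
    h = np.minimum(a[:, None, 3], a[None, :, 3]) - np.maximum(a[:, None, 1], a[None, :, 1])
    inter = np.clip(w, 0, None) * np.clip(h, 0, None)   # 0 for disjoint pairs
    area = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    return inter / area[:, None]

a = np.array([[0, 0, 300, 400], [1, 1, 15, 59], [5, 5, 300, 400]])
grid = overlap_grid(a)
```

For n rectangles this allocates a handful of n-by-n float arrays, which for n around 2000 is on the order of tens of megabytes per temporary.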