Overlap ratio of rectangles

Time: 2017-03-05 17:15:47

Tags: python arrays performance numpy vectorization

I have an array of coordinate arrays, like this:

a = [[0,0,300,400],[1,1,15,59],[5,5,300,400]]

Now I want to get the overlap ratio of each rectangle with the other rectangles:

def bool_rect_intersect(A, B):
    # Rectangles are [left, top, right, bottom]; touching edges count as intersecting
    return not (B[0]>A[2] or B[2]<A[0] or B[3]<A[1] or B[1]>A[3])


def get_overlap_ratio(A, B):
    # Fraction of A's area that is covered by B
    if not bool_rect_intersect(A, B):
        return 0
    left = max(A[0], B[0])
    top = max(A[1], B[1])
    right = min(A[2], B[2])
    bottom = min(A[3], B[3])
    surface_intersection = (right - left) * (bottom - top)
    surface_A = (A[2] - A[0]) * (A[3] - A[1]) + 0.0  # + 0.0 forces float division
    return surface_intersection / surface_A

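Now I am looking for the fastest way to compute this overlap grid for 2000+ rectangles. If I loop over it, it takes more than a minute. I tried np.vectorize, but I don't think it applies to multidimensional arrays.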

1 answer:

Answer 0 (score: 1)

Approach #1: Here is a vectorized approach -

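import numpy as np

def pairwise_overlaps(a):
    # Row/column indices of every pair (r, c) with r < c
    r,c = np.triu_indices(a.shape[0],1)

    # Left/top and right/bottom corners of each pairwise intersection
    lt = np.maximum(a[r,:2], a[c,:2])
    tb = np.minimum(a[r,2:], a[c,2:])

    # Intersection area; note there is no intersection test here,
    # so this assumes every pair actually overlaps
    si_vectorized = (tb[:,0] - lt[:,0]) * (tb[:,1] - lt[:,1])
    # Area of every rectangle; + 0.0 converts to float
    slicedA_comps = ((a[:,2]- a[:,0])*(a[:,3]-a[:,1]) + 0.0)
    sA_vectorized = np.take(slicedA_comps, r)
    return si_vectorized/sA_vectorized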

Sample run -

In [48]: a
Out[48]: 
array([[  0,   0, 300, 400],
       [  1,   1,  15,  59],
       [  5,   5, 300, 400]])

In [49]: print get_overlap_ratio(a[0], a[1]) # Looping thru pairs
    ...: print get_overlap_ratio(a[0], a[2])
    ...: print get_overlap_ratio(a[1], a[2])
    ...: 
0.00676666666667
0.971041666667
0.665024630542

In [50]: pairwise_overlaps(a) # Proposed approach to get all of those in one go
Out[50]: array([ 0.00676667,  0.97104167,  0.66502463])
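
One caveat worth illustrating: unlike `get_overlap_ratio`, `pairwise_overlaps` has no intersection test, so for disjoint rectangles the computed width or height is negative, and two negative sides multiply into a positive "area". The following is a minimal sketch of one way to handle that, assuming the desired result for disjoint pairs is 0 as in the question's loop version. The name `pairwise_overlaps_clipped` is hypothetical and not part of the original answer.

```python
import numpy as np

def pairwise_overlaps_clipped(a):
    """Upper-triangle overlap ratios, with 0 for pairs that do not intersect."""
    a = np.asarray(a)
    r, c = np.triu_indices(a.shape[0], 1)   # all pairs (i, j) with i < j

    lt = np.maximum(a[r, :2], a[c, :2])     # left/top of each intersection
    rb = np.minimum(a[r, 2:], a[c, 2:])     # right/bottom of each intersection

    # A negative width or height means the pair is disjoint; clip it to 0
    wh = np.clip(rb - lt, 0, None)
    areas = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    return (wh[:, 0] * wh[:, 1]) / areas[r].astype(float)

# Rectangle 2 is disjoint from rectangles 0 and 1
rects = np.array([[0, 0, 10, 10], [5, 5, 15, 15], [20, 20, 30, 30]])
print(pairwise_overlaps_clipped(rects))  # 0.25 for pair (0, 1), 0 for the other two pairs
```

Without the clipping step, pair (0, 2) in this example would come out as (-10 * -10) / 100 = 1.0 rather than 0. For inputs where every pair overlaps, such as the sample array above, the clipped and unclipped versions should agree.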

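Approach #2: On closer inspection, we can see that in the previous approach the indexing with r and c is the performance killer, because those index arrays force copies. We can improve on this by computing each element in a column against every other element in the same column via broadcasting, as listed in the implementation below -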

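def pairwise_overlaps_v2(a):
    # Broadcasting (n,) against (n,1) gives (n,n) widths and heights of all intersections
    rl = np.minimum(a[:,2], a[:,None,2]) - np.maximum(a[:,0], a[:,None,0])
    bt = np.minimum(a[:,3], a[:,None,3]) - np.maximum(a[:,1], a[:,None,1])
    si_vectorized2D = rl*bt
    slicedA_comps = ((a[:,2]- a[:,0])*(a[:,3]-a[:,1]) + 0.0)
    # Divide row i by the area of rectangle i
    overlaps2D = si_vectorized2D/slicedA_comps[:,None]

    # Keep only the pairs with row < column (the strict upper triangle),
    # in the same order as np.triu_indices(n, 1)
    r = np.arange(a.shape[0])
    triu_mask = r[:,None] < r
    return overlaps2D[triu_mask]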
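
Since the ratio is divided by the first rectangle's area, it is not symmetric, and the question asks for each rectangle against all others. The same broadcasting idea can therefore be sketched as a full N x N grid. This is only an illustration under stated assumptions: the name `overlap_grid` is hypothetical, and clipping disjoint pairs to 0 is an addition that mirrors `get_overlap_ratio` rather than something the answer's code does.

```python
import numpy as np

def overlap_grid(a):
    """grid[i, j] = area(rect i intersect rect j) / area(rect i); 0 where disjoint."""
    a = np.asarray(a)
    x1, y1, x2, y2 = a[:, 0], a[:, 1], a[:, 2], a[:, 3]

    # Broadcasting an (n, 1) column against an (n,) row yields an (n, n) result
    w = np.minimum(x2[:, None], x2) - np.maximum(x1[:, None], x1)
    h = np.minimum(y2[:, None], y2) - np.maximum(y1[:, None], y1)
    inter = np.clip(w, 0, None) * np.clip(h, 0, None)

    area = ((x2 - x1) * (y2 - y1)).astype(float)
    return inter / area[:, None]            # divide row i by the area of rect i

a = np.array([[0, 0, 300, 400], [1, 1, 15, 59], [5, 5, 300, 400]])
grid = overlap_grid(a)
print(grid[0, 1], grid[1, 0])   # ratio relative to rect 0, then relative to rect 1
```

Here `grid[1, 0]` is 1.0 because rectangle 1 lies entirely inside rectangle 0, while `grid[0, 1]` is the small ratio shown in the sample run. The strict upper triangle of `grid` corresponds to what `pairwise_overlaps_v2` returns whenever all pairs overlap.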

Runtime test

In [238]: n = 1000

In [239]: a = np.hstack((np.random.randint(0,100,(n,2)),
                         np.random.randint(300,500,(n,2))))

In [240]: np.allclose(pairwise_overlaps(a), pairwise_overlaps_v2(a))
Out[240]: True

In [241]: %timeit pairwise_overlaps(a)
10 loops, best of 3: 35.2 ms per loop

In [242]: %timeit pairwise_overlaps_v2(a)
100 loops, best of 3: 16 ms per loop

Let's add the original approach as a list comprehension over the same pairs -

In [244]: r,c = np.triu_indices(a.shape[0],1)

In [245]: %timeit [get_overlap_ratio(a[r[i]], a[c[i]]) for i in range(len(r))]
1 loops, best of 3: 2.85 s per loop

That is a speedup of around 180x for the second approach over the original loop!