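According to the Pascal VOC challenge, the evaluation criterion is:


A predicted bounding box is considered correct if it overlaps more than 50% with a ground-truth bounding box, otherwise the bounding box is considered a false positive detection. Multiple detections are penalized. If a system predicts several bounding boxes that overlap with a single ground-truth bounding box, only one prediction is considered correct, the others are considered false positives.


I have searched but have not found a standard algorithm for this, which is surprising because I would have thought this is something very common in computer vision. (I'm new to it.) Have I missed it? Does anyone know what the standard algorithm is for this type of problem?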

def get_iou(bb1, bb2):
    """
    Calculate the Intersection over Union (IoU) of two bounding boxes.

    Parameters
    ----------
    bb1 : dict
        Keys: {'x1', 'x2', 'y1', 'y2'}
        The (x1, y1) position is at the top left corner,
        the (x2, y2) position is at the bottom right corner
    bb2 : dict
        Keys: {'x1', 'x2', 'y1', 'y2'}
        The (x1, y1) position is at the top left corner,
        the (x2, y2) position is at the bottom right corner

    Returns
    -------
    float
        in [0, 1]
    """
    assert bb1['x1'] < bb1['x2']
    assert bb1['y1'] < bb1['y2']
    assert bb2['x1'] < bb2['x2']
    assert bb2['y1'] < bb2['y2']

    # determine the coordinates of the intersection rectangle
    x_left = max(bb1['x1'], bb2['x1'])
    y_top = max(bb1['y1'], bb2['y1'])
    x_right = min(bb1['x2'], bb2['x2'])
    y_bottom = min(bb1['y2'], bb2['y2'])

    if x_right < x_left or y_bottom < y_top:
        return 0.0

    # The intersection of two axis-aligned bounding boxes is always an
    # axis-aligned bounding box
    intersection_area = (x_right - x_left) * (y_bottom - y_top)

    # compute the area of both AABBs
    bb1_area = (bb1['x2'] - bb1['x1']) * (bb1['y2'] - bb1['y1'])
    bb2_area = (bb2['x2'] - bb2['x1']) * (bb2['y2'] - bb2['y1'])

    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the intersection area
    iou = intersection_area / float(bb1_area + bb2_area - intersection_area)
    assert iou >= 0.0
    assert iou <= 1.0
    return iou


Image from this answer
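The Pascal VOC rule quoted in the question (more than 50% overlap, and only one correct prediction per ground-truth box) can be sketched as a greedy matching loop. This is a minimal illustration, not the official evaluation code: the names `iou_xyxy` and `match_detections`, the `(score, box)` tuple layout, and the choice to visit predictions in descending score order are assumptions made for this example.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) in continuous coordinates."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def match_detections(predictions, ground_truths, threshold=0.5):
    """Label each prediction True (correct) or False (false positive).

    predictions: list of (score, box); ground_truths: list of boxes.
    Hypothetical helper: each ground-truth box can be claimed only once,
    so duplicate detections of the same object become false positives.
    """
    claimed = [False] * len(ground_truths)
    labels = [False] * len(predictions)
    # Assumption: higher-scoring predictions get the first chance to match.
    order = sorted(range(len(predictions)),
                   key=lambda i: predictions[i][0], reverse=True)
    for i in order:
        box = predictions[i][1]
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths):
            overlap = iou_xyxy(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_iou > threshold and not claimed[best_j]:
            claimed[best_j] = True
            labels[i] = True
    return labels


gts = [(0, 0, 10, 10)]
preds = [(0.9, (0, 0, 10, 9)), (0.8, (1, 0, 10, 10)), (0.7, (20, 20, 30, 30))]
print(match_detections(preds, gts))  # [True, False, False]
```

The second prediction overlaps the ground truth just as well as the first (IoU 0.9), but the box is already claimed, so it counts as a false positive; the third does not overlap anything.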

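If you are working with screen (pixel) coordinates, the top-voted answer has a mathematical error! A few weeks ago I submitted an edit with a detailed explanation so that all readers would understand the math. The reviewers did not understand that edit and removed it, so I submitted the same edit again, this time with a shorter summary. (Update: Rejected 2vs1 because it was deemed a "substantial change", heh.)


So yes, in general the top-voted answer is correct and is a good way to calculate the IoU. But (as other people have pointed out too) its math is completely incorrect for computer screens. You cannot just do (x2 - x1) * (y2 - y1), since that will not produce the correct area. Screen indexing starts at pixel 0,0 and ends at width-1,height-1. The range of screen coordinates is inclusive:inclusive (inclusive on both ends), so a range from 0 to 10 in pixel coordinates is actually 11 pixels wide, because it includes 0 1 2 3 4 5 6 7 8 9 10 (11 items). So, to calculate the area in screen coordinates, you must add +1 to each dimension, as follows: (x2 - x1 + 1) * (y2 - y1 + 1)

If you are working in some other coordinate system where the range is not inclusive (such as an inclusive:exclusive system where 0 to 10 means "elements 0-9 but not 10"), then this extra math is not necessary. But most likely you are processing pixel-based bounding boxes, and screen coordinates start at 0,0 and go up from there.

A 1920x1080 screen is indexed from 0 (first pixel) to 1919 (last pixel horizontally) and from 0 (first pixel) to 1079 (last pixel vertically).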




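But with the wrong math area = (x_right - x_left) * (y_bottom - y_top), a full-screen rectangle from (0, 0) to (1919, 1079) gives: (1919 - 0) * (1079 - 0) = 1919 * 1079 = 2070601 pixels! That is wrong!

That is why we must add +1 to each calculation, which gives the corrected math: area = (x_right - x_left + 1) * (y_bottom - y_top + 1), giving: (1919 - 0 + 1) * (1079 - 0 + 1) = 1920 * 1080 = 2073600 pixels! And that is indeed the correct answer!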
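The full-screen arithmetic above is easy to check in a few lines. This is only a sketch; `pixel_box_area` is a hypothetical helper name, and it assumes both corners are inclusive pixel indices.

```python
def pixel_box_area(x1, y1, x2, y2):
    """Area in pixels of a box whose corners are inclusive pixel indices."""
    return (x2 - x1 + 1) * (y2 - y1 + 1)


# Full-screen rectangle on a 1920x1080 display.
print(pixel_box_area(0, 0, 1919, 1079))  # 2073600
# Without the +1 terms the same rectangle comes out too small.
print((1919 - 0) * (1079 - 0))           # 2070601
```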


You can use torchvision to compute this as follows. The bboxes are given in the format [x1, y1, x2, y2].

import torch
import torchvision.ops.boxes as bops

box1 = torch.tensor([[511, 41, 577, 76]], dtype=torch.float)
box2 = torch.tensor([[544, 59, 610, 94]], dtype=torch.float)
iou = bops.box_iou(box1, box2)
# tensor([[0.1382]])
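As a cross-check, the 0.1382 value can be reproduced by hand in plain Python. The sketch below assumes continuous coordinates, that is, widths and heights computed as x2 - x1 and y2 - y1 with no +1 term, which is the convention that matches the torchvision result shown here.

```python
box1 = (511, 41, 577, 76)
box2 = (544, 59, 610, 94)

# Overlap rectangle: 577 - 544 = 33 wide, 76 - 59 = 17 high.
inter_w = min(box1[2], box2[2]) - max(box1[0], box2[0])
inter_h = min(box1[3], box2[3]) - max(box1[1], box2[1])
intersection = inter_w * inter_h                   # 33 * 17 = 561

area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])  # 66 * 35 = 2310
area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])  # 66 * 35 = 2310
union = area1 + area2 - intersection               # 4059

iou = intersection / union
print(round(iou, 4))  # 0.1382
```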

intersection_area = (x_right - x_left + 1) * (y_bottom - y_top + 1)   

- 2 points: x1 = 1 and x2 = 3, the distance between them is indeed x2 - x1 = 2
- 2 pixel indices: i1 = 1 and i2 = 3, the segment from pixel i1 to pixel i2 contains 3 pixels, i.e. l = i2 - i1 + 1
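Putting the +1 convention into a complete IoU function might look like the sketch below. `pixel_iou` is a hypothetical name, and the function assumes boxes are (x1, y1, x2, y2) tuples whose corners are inclusive pixel indices; for continuous coordinates the +1 terms should be dropped.

```python
def pixel_iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2) with inclusive pixel-index corners."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    inter_h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes share no pixel
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return intersection / (area_a + area_b - intersection)


# Two 10x10 pixel boxes sharing a 5x5 block: 25 / (100 + 100 - 25)
print(pixel_iou((0, 0, 9, 9), (5, 5, 14, 14)))
# Side-by-side boxes (columns 0-9 and 10-19) share no pixel.
print(pixel_iou((0, 0, 9, 9), (10, 0, 19, 9)))  # 0.0
```

Note that two boxes touching at a single shared pixel, such as (0, 0, 9, 9) and (9, 9, 18, 18), have an intersection of exactly one pixel under this convention rather than zero.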

import numpy as np
from matplotlib import path, transforms

def clip_boxes(box0, box1):
    path_coords = np.array([[box0[0, 0], box0[0, 1]],
                            [box0[1, 0], box0[0, 1]],
                            [box0[1, 0], box0[1, 1]],
                            [box0[0, 0], box0[1, 1]]])

    poly = path.Path(np.vstack((path_coords[:, 0],
                                path_coords[:, 1])).T, closed=True)
    clip_rect = transforms.Bbox(box1)

    poly_clipped = poly.clip_to_bbox(clip_rect).to_polygons()[0]

    return np.array([np.min(poly_clipped, axis=0),
                     np.max(poly_clipped, axis=0)])

box0 = np.array([[0, 0], [1, 1]])
box1 = np.array([[0, 0], [0.5, 0.5]])

print(clip_boxes(box0, box1))

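import numpy as np

surface = np.zeros([1024, 1024])
surface[1:1+10, 1:1+10] += 1
surface[100:100+500, 100:100+100] += 1
# Cells covered by both rectangles hold the value 2, so this counts the
# intersection (overlap) area, not the union.
intersection_area = (surface == 2).sum()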
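The same rasterization idea can be extended to a full IoU by painting each rectangle into its own boolean mask. This is a sketch under assumptions: `raster_iou` is a hypothetical name, rectangles are given as (row, col, height, width) on an integer grid, and the grid must be large enough to hold both. It is slow compared with the closed-form formula, but it is handy as a sanity check.

```python
import numpy as np


def raster_iou(rect_a, rect_b, shape=(1024, 1024)):
    """IoU of two grid rectangles given as (row, col, height, width)."""
    mask_a = np.zeros(shape, dtype=bool)
    mask_b = np.zeros(shape, dtype=bool)
    r, c, h, w = rect_a
    mask_a[r:r + h, c:c + w] = True
    r, c, h, w = rect_b
    mask_b[r:r + h, c:c + w] = True
    # Count cells covered by both masks and by at least one mask.
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(intersection / union) if union else 0.0


# Two 10x10 squares overlapping in a 5x5 block: 25 / 175
print(raster_iou((0, 0, 10, 10), (5, 5, 10, 10)))
```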

IoU (image not drawn to scale)

from shapely.geometry import Polygon

def calculate_iou(box_1, box_2):
    poly_1 = Polygon(box_1)
    poly_2 = Polygon(box_2)
    iou = poly_1.intersection(poly_2).area / poly_1.union(poly_2).area
    return iou

box_1 = [[511, 41], [577, 41], [577, 76], [511, 76]]
box_2 = [[544, 59], [610, 59], [610, 94], [544, 94]]

print(calculate_iou(box_1, box_2))


import numpy as np

def box_area(arr):
    # arr: np.array([[x1, y1, x2, y2]])
    width = arr[:, 2] - arr[:, 0]
    height = arr[:, 3] - arr[:, 1]
    return width * height

def _box_inter_union(arr1, arr2):
    # arr1 of [N, 4]
    # arr2 of [N, 4]
    area1 = box_area(arr1)
    area2 = box_area(arr2)

    # Intersection
    top_left = np.maximum(arr1[:, :2], arr2[:, :2]) # [[x, y]]
    bottom_right = np.minimum(arr1[:, 2:], arr2[:, 2:]) # [[x, y]]
    wh = bottom_right - top_left
    # clip: if boxes not overlap then make it zero
    intersection = wh[:, 0].clip(0) * wh[:, 1].clip(0)

    union = area1 + area2 - intersection
    return intersection, union

def box_iou(arr1, arr2):
    # arr1[N, 4]
    # arr2[N, 4]
    # N = number of bounding boxes
    # every box must have x2 > x1 and y2 > y1
    assert (arr1[:, 2:] > arr1[:, :2]).all()
    assert (arr2[:, 2:] > arr2[:, :2]).all()
    inter, union = _box_inter_union(arr1, arr2)
    iou = inter / union
    return iou

box1 = np.array([[10, 10, 80, 80]])
box2 = np.array([[20, 20, 100, 100]])
print(box_iou(box1, box2))
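The NumPy version above compares boxes row by row (the i-th box of one array against the i-th box of the other). When every box in one set has to be compared against every box in another, broadcasting gives an N x M matrix instead. The sketch below is one way to do that; `pairwise_iou` is a hypothetical name, and it assumes continuous [x1, y1, x2, y2] coordinates with positive width and height.

```python
import numpy as np


def pairwise_iou(boxes_a, boxes_b):
    """IoU matrix of shape [N, M] for boxes_a [N, 4] and boxes_b [M, 4]."""
    boxes_a = np.asarray(boxes_a, dtype=float)
    boxes_b = np.asarray(boxes_b, dtype=float)
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])

    # Broadcast [N, 1, 2] against [1, M, 2] to get every pair of corners.
    top_left = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    bottom_right = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    wh = (bottom_right - top_left).clip(min=0)  # zero where boxes do not overlap
    inter = wh[..., 0] * wh[..., 1]

    union = area_a[:, None] + area_b[None, :] - inter
    return inter / union


a = np.array([[10, 10, 80, 80]])
b = np.array([[20, 20, 100, 100], [200, 200, 300, 300]])
print(pairwise_iou(a, b))  # shape (1, 2); first entry is 3600 / 7700, second is 0
```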
