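I have a computer vision algorithm that places bounding boxes around detected objects. The list of bounding boxes looks like this:
bounding_boxes = [[x, y, w, h], [x2, y2, w2, h2], ...]
where x and y are the coordinates of the top-left corner, and h and w are the height and width of the box. However, I am not interested in boxes that are fully contained in any other, larger box. What is an efficient way to detect these?
Answer 0: (score: 1)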
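As you confirmed in the comments on the question, you need to identify and remove boxes that are contained in a single other box. If a box is contained in the union of other boxes, but no single other box contains it, it should not be removed (for example, in the case of boxes = [[0, 0, 2, 4], [1, 1, 3, 3], [2, 0, 4, 4]], the second box is contained in the union of the first and the third, but should not be removed).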
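The naive (brute-force) algorithm for this task is very simple. Here is the pseudocode:
for i in [0, 1, ..., n-1]:
    for j in [i+1, i+2, ..., n-1]:
        check whether box[i] contains box[j], or vice versa.
The complexity of this algorithm is obviously O(n^2). It is easy to implement and is the recommended choice when the number of boxes is small (around 100-500, or even 1000 if you do not need real-time performance for video processing).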
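The brute-force check can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: it assumes the [x, y, w, h] layout from the question, and the helper names to_corners, contains and find_inner_boxes are made up for illustration. It also assumes that when two boxes are identical, only the later one should be reported.

```python
def to_corners(box):
    """Convert [x, y, w, h] into (x1, y1, x2, y2) corner form."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def contains(outer, inner):
    """True if the corner-form box `outer` fully covers `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def find_inner_boxes(bounding_boxes):
    """Return indices of boxes contained in a single other box, O(n^2)."""
    corners = [to_corners(b) for b in bounding_boxes]
    inner = []
    for i, a in enumerate(corners):
        for j, b in enumerate(corners):
            # Assumption: of two identical boxes, only the later one is flagged.
            if i != j and contains(b, a) and (a != b or j < i):
                inner.append(i)
                break
    return inner


if __name__ == "__main__":
    print(find_inner_boxes([[0, 0, 10, 10], [2, 2, 3, 3], [8, 8, 5, 5]]))
```

Unlike the pseudocode, the inner loop here runs over every j rather than only j > i, which keeps the containment test one-directional at the cost of a constant factor.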
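The complexity of the fast algorithm is O(n log n), which I believe is also the minimum theoretical complexity for this problem. Formally, the required algorithm takes the following input and returns the following output:
Input: boxes[] - Array of n Rectangles, Tuples of (x1, y1, x2, y2), where
       (x1, y1) are the coordinates of the bottom-left corner and (x2, y2)
       are the coordinates of the top-right corner.
Output: inner_boxes[] - Array of Rectangles that should be removed.
Pseudocode of the fast algorithm: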
1) Allocate an Array events[] with the length 2*n, the elements of which are
   Tuples (y, corresponding_box_index, event).
2) For i in [0, 1, ..., n-1]:
       events[2 * i    ] = Tuple(boxes[i].y1, i, 'push')
       events[2 * i + 1] = Tuple(boxes[i].y2, i, 'pop')
3) Sort events[] in ascending order of the y coordinate (from smaller to larger).
   If there are equal y coordinates, then:
   - Tuples with a 'pop' event are smaller than Tuples with a 'push' event.
   - If two Tuples have the same event, they are sorted in descending order of
     the width of their corresponding boxes (wider boxes first, so that a
     potential container is already in the map when a narrower box is pushed).
4) Create a Map cross_section_map[] that maps a Key (Value) x to a Tuple
   (corresponding_box_index, type), where type can be either 'left' or 'right'.
   Make sure that the 'insert' and 'erase' operations of this data structure
   have the complexity O(log n), that it is iterable, that the elements are
   iterated in key-ascending order, and that you can search for a key in
   O(log n) time.
5) For step in [0, 1, ..., 2*n-1]:
       If events[step].event is 'push':
           - Let i = events[step].corresponding_box_index
           - Insert a mapping boxes[i].x1 -> (i, 'left') into cross_section_map[]
           - Insert a mapping boxes[i].x2 -> (i, 'right') into cross_section_map[]
           - Search for a 'right'-typed key with x value no less than boxes[i].x2
           - Iterate from that key until you find a key which corresponds to
             a box that contains boxes[i], or whose x1 coordinate is larger
             than the x1 coordinate of the newly added box. In the first case,
             add boxes[i] to inner_boxes[].
       If events[step].event is 'pop':
           - Let i = events[step].corresponding_box_index
           - Erase the elements with the keys boxes[i].x1 and boxes[i].x2
Now, the tricky part is step (4) of this algorithm. Implementing such a data structure may seem hard. However, the C++ standard library has an excellent implementation called std::map. The search operations that run in O(log n) are std::map::lower_bound and std::map::upper_bound.
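To make the sweep idea concrete, here is a simplified Python sketch. It is not a line-for-line translation of the pseudocode above, and it rests on several assumptions of mine: the name find_inner_boxes_sweep is hypothetical; a plain sorted list with the standard bisect module stands in for std::map, so insertion and removal cost O(n) rather than O(log n); only right edges are stored; every active candidate is scanned with no early exit; and at equal y, pushes are processed before pops so that zero-height boxes do not break the bookkeeping.

```python
from bisect import bisect_left, insort


def find_inner_boxes_sweep(boxes):
    """boxes: list of (x1, y1, x2, y2). Returns sorted indices of boxes
    that are contained in a single other box."""
    PUSH, POP = 0, 1  # assumption: pushes sort before pops at equal y
    events = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        # At equal y1, wider (then taller, then earlier) boxes are pushed
        # first, so a potential container is already active.
        events.append((y1, PUSH, -(x2 - x1), -(y2 - y1), i))
        events.append((y2, POP, 0, 0, i))
    events.sort()

    active = []  # sorted (x2, index) pairs of boxes crossing the sweep line
    inner = []
    for _y, kind, _w, _h, i in events:
        x1, _y1, x2, y2 = boxes[i]
        if kind == POP:
            active.pop(bisect_left(active, (x2, i)))
            continue
        # Candidates: active boxes whose right edge is at or beyond x2.
        # They were pushed earlier, so their y1 is already <= this y1.
        start = bisect_left(active, (x2, -1))
        for _ox2, j in active[start:]:
            ox1, _oy1, _, oy2 = boxes[j]
            if ox1 <= x1 and oy2 >= y2:
                inner.append(i)
                break
        insort(active, (x2, i))
    return sorted(inner)


if __name__ == "__main__":
    print(find_inner_boxes_sweep([(0, 0, 2, 4), (1, 1, 3, 3), (2, 0, 4, 4)]))
```

Because of the list-based cross-section and the full candidate scan, this sketch is O(n^2) in the worst case. A balanced-tree container would be needed to get the O(log n) insert and erase that step (4) asks for.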
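The average complexity of this algorithm is O(n log n), and the worst-case complexity is O(n^2). If the number of boxes and their sizes are relatively small compared to the image size, the running time stays close to the average case.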