Question

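I have a list of rectangles, each given by the coordinates of its top-left corner (a, b) and its bottom-right corner (c, d). I want to detect and remove the rectangles that lie inside other rectangles. Rectangles that merely overlap can stay.

I have a dataset of 10,000 rectangles and I would like an efficient way to solve this.

Currently I am doing it like this: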

import pandas as pd

data = pd.read_csv('dataset.csv')

l = list(range(len(data)))

for i in range(len(data)):
    length = len(l)
    if i >= length:
        break
    for j in range(i+1, length):
        if j >= len(l):
            break
        if (data.iloc[l[i]]['a'] >= data.iloc[l[j]]['a']) and (data.iloc[l[i]]['b'] <= data.iloc[l[j]]['b']) and (data.iloc[l[i]]['c'] <= data.iloc[l[j]]['c']) and (data.iloc[l[i]]['d'] >= data.iloc[l[j]]['d']):
            l.pop(j)

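I run this algorithm after sorting the dataset in descending order of rectangle area, since a rectangle with a larger area cannot fit inside one with a smaller area. Whenever I detect that a rectangle lies inside another one, I pop its index from the list l. Each time an element is popped, the number of remaining iterations goes down.

This takes several hours to run, and I need an efficient approach that works even for hundreds of thousands of samples.

Please help!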

Answer 1

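Here is a divide-and-conquer algorithm that you could try.

I think that as long as you can quickly enumerate every pair of colliding rectangles, you can then also check in constant time whether one of them is fully contained in the other.

So we only have to find the colliding rectangles.

First, generalize the problem as follows: assume you have two sets of rectangles A and B, and you want to find only the pairs (a, b) such that rectangle a comes from A, b comes from B, and a and b intersect.

First, the idea. Consider the following example with two groups A and B of rectangles, partially separated by a horizontal line L: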

      +----+                    +-----+
      | A1 |                    |  B1 |
      |    |     +-----+        +-----+
      +----+     |  A2 |
                 +-----+     +-----+
                             |  A3 |
_____________________________|_____|________ L
                             |     |
         +-------------------+##+  |
         |                   |##|  |
         |     B2            +##|--+    
         |                      |
         +----------------------+

The line L subdivides each of the sets A and B into three subsets:

A above L: {A1, A2}         (short: A>L)
A intersects L: {A3}        (short: A=L)
A below L: {}               (short: A<L)


B above L: {B1}             (B>L)
B intersects L: {}          (B=L)
B below L: {B2}             (B<L)

Observe that only rectangles from the following groups can collide:

         A<L    A=L    A>L
B<L       y      y      N
B=L       y      y      y
B>L       N      y      y

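That is, if we want to find all collisions between A and B, then once we have found a suitable line L, we can ignore collisions of A<L with B>L and of A>L with B<L. Therefore, we obtain the following divide-and-conquer algorithm: while A and B are not empty, find a suitable line that (roughly) maximizes the number of eliminated collision checks, subdivide A and B into three groups each, proceed recursively with the seven subgroup collisions, and ignore the two remaining subgroup combinations.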
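The subdivision step and the table of combinations can be sketched as follows. This is only a minimal illustration, not the answer's own code: rectangles are assumed to be plain (left, right, bottom, top) tuples, the line is assumed to be horizontal, and the names split_by_horizontal_line and pairs_to_recurse are made up for this sketch.

```python
from itertools import product

# Assumption: a rectangle is a (left, right, bottom, top) tuple.
def split_by_horizontal_line(rects, y):
    """Split rects into (below, crossing, above) relative to the line at height y."""
    below = [r for r in rects if r[3] < y]            # top edge is under the line
    crossing = [r for r in rects if r[2] <= y <= r[3]]
    above = [r for r in rects if r[2] > y]            # bottom edge is over the line
    return below, crossing, above

def pairs_to_recurse(groups_a, groups_b):
    """Yield the seven (subgroup of A, subgroup of B) combinations that can collide."""
    for (i, ga), (j, gb) in product(enumerate(groups_a), enumerate(groups_b)):
        # Index 0 = below, 1 = crossing, 2 = above.
        # Only below-vs-above (index distance 2) is ruled out by the line.
        if abs(i - j) < 2:
            yield ga, gb
```

Each yielded pair would then be handed to the recursive call, while the two below-versus-above combinations are never examined.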

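Assuming that the rectangles are "small" and that the groups A=L and B=L are mostly empty, this will (roughly) cut the size of the sets in half at every step, and we obtain an algorithm that on average runs in something like O(n*log(n)) instead of O(n*n).

Once you have the general case for arbitrary A and B, take the entire set of rectangles R and run the algorithm with A = R; B = R.

Here is a rough sketch in Python: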
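Under those assumptions the estimate can be written as a back-of-the-envelope recurrence. The symbols are introduced here only for illustration: T(n) is the running time for n rectangles in total, and c is an assumed constant for the linear work of trying candidate lines and splitting the sets. The cost of reporting the collisions themselves is ignored.

```latex
T(n) \approx 2\,T\!\left(\tfrac{n}{2}\right) + c\,n
\quad\Longrightarrow\quad
T(n) = O(n \log n)
```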

def isSubinterval(aStart, aEnd, bStart, bEnd):
  return aStart >= bStart and aEnd <= bEnd

def intersects(aStart, aEnd, bStart, bEnd):
  return not (aEnd < bStart or aStart > bEnd)

class Rectangle:
  def __init__(self, l, r, b, t):
    self.left = l
    self.right = r
    self.bottom = b
    self.top = t

  def isSubrectangle(self, other):
    return (
      isSubinterval(self.left, self.right, other.left, other.right) and
      isSubinterval(self.bottom, self.top, other.bottom, other.top)
    )

  def intersects(self, other):
    return (
      intersects(self.left, self.right, other.left, other.right) and
      intersects(self.bottom, self.top, other.bottom, other.top)
    )

  def __repr__(self):
    return ("[%f,%f]x[%f,%f]" % (self.left, self.right, self.bottom, self.top))

def boundingBox(rects):
  infty = float('inf')
  b = infty
  t = - infty
  l = infty
  r = - infty
  for rect in rects:
    b = min(b, rect.bottom)
    l = min(l, rect.left)
    r = max(r, rect.right)
    t = max(t, rect.top)
  return Rectangle(l, r, b, t)

class DividingLine:
  def __init__(self, isHorizontal, position):
    self.isHorizontal = isHorizontal
    self.position = position

  def isAbove(self, rectangle):
    if self.isHorizontal:
      return rectangle.bottom > self.position
    else:
      return rectangle.left > self.position

  def isBelow(self, rectangle):
    if self.isHorizontal:
      return rectangle.top < self.position
    else:
      return rectangle.right < self.position

def enumeratePossibleLines(boundingBox):
  NUM_TRIED_LINES = 5
  for i in range(1, NUM_TRIED_LINES + 1):
    w = boundingBox.right - boundingBox.left
    yield DividingLine(False, boundingBox.left + w / float(NUM_TRIED_LINES + 1) * i)
    h = boundingBox.top - boundingBox.bottom
    yield DividingLine(True, boundingBox.bottom + h / float(NUM_TRIED_LINES + 1) * i)

def findGoodDividingLine(rects_1, rects_2):
  bb = boundingBox(rects_1 + rects_2)
  bestLine = None
  bestGain = 0
  for line in enumeratePossibleLines(bb):
    above_1 = len([r for r in rects_1 if line.isAbove(r)])
    below_1 = len([r for r in rects_1 if line.isBelow(r)])
    above_2 = len([r for r in rects_2 if line.isAbove(r)])
    below_2 = len([r for r in rects_2 if line.isBelow(r)])

    # These groups are separated by the line, no need to 
    # perform all-vs-all collision checks on those groups!
    gain = above_1 * below_2 + above_2 * below_1
    if gain > bestGain:
      bestGain = gain
      bestLine = line
  return bestLine

# Collides all rectangles from list `rects_1` with 
# all rectangles from list `rects_2`, and invokes
# `onCollision(a, b)` on every colliding `a` and `b`.
def collideAllVsAll(rects_1, rects_2, onCollision):
  if rects_1 and rects_2: # if one list empty, no collisions
    line = findGoodDividingLine(rects_1, rects_2)
    if line:
      above_1 = [r for r in rects_1 if line.isAbove(r)]
      below_1 = [r for r in rects_1 if line.isBelow(r)]
      above_2 = [r for r in rects_2 if line.isAbove(r)]
      below_2 = [r for r in rects_2 if line.isBelow(r)]
      intersect_1 = [r for r in rects_1 if not (line.isAbove(r) or line.isBelow(r))]
      intersect_2 = [r for r in rects_2 if not (line.isAbove(r) or line.isBelow(r))]
      collideAllVsAll(above_1, above_2, onCollision)
      collideAllVsAll(above_1, intersect_2, onCollision)
      collideAllVsAll(intersect_1, above_2, onCollision)
      collideAllVsAll(intersect_1, intersect_2, onCollision)
      collideAllVsAll(intersect_1, below_2, onCollision)
      collideAllVsAll(below_1, intersect_2, onCollision)
      collideAllVsAll(below_1, below_2, onCollision)
    else:
      for r1 in rects_1:
        for r2 in rects_2:
          if r1.intersects(r2):
            onCollision(r1, r2)

Here is a small demo:

rects = [
  Rectangle(1,6,9,10),
  Rectangle(4,7,6,10),
  Rectangle(1,5,6,7),
  Rectangle(8,9,8,10),
  Rectangle(6,9,5,7),
  Rectangle(8,9,1,6),
  Rectangle(7,9,2,4),
  Rectangle(2,8,2,3),
  Rectangle(1,3,1,4)
]

def showInterestingCollision(a, b):
  if a is not b:
    if a.left < b.left:
      print("%r <-> %r collision" % (a, b))

collideAllVsAll(rects, rects, showInterestingCollision)

At least in this case, it does indeed detect all the interesting collisions:

[1.000000,6.000000]x[9.000000,10.000000] <-> [4.000000,7.000000]x[6.000000,10.000000] collision
[1.000000,5.000000]x[6.000000,7.000000] <-> [4.000000,7.000000]x[6.000000,10.000000] collision
[4.000000,7.000000]x[6.000000,10.000000] <-> [6.000000,9.000000]x[5.000000,7.000000] collision
[6.000000,9.000000]x[5.000000,7.000000] <-> [8.000000,9.000000]x[1.000000,6.000000] collision
[7.000000,9.000000]x[2.000000,4.000000] <-> [8.000000,9.000000]x[1.000000,6.000000] collision
[2.000000,8.000000]x[2.000000,3.000000] <-> [8.000000,9.000000]x[1.000000,6.000000] collision
[2.000000,8.000000]x[2.000000,3.000000] <-> [7.000000,9.000000]x[2.000000,4.000000] collision
[1.000000,3.000000]x[1.000000,4.000000] <-> [2.000000,8.000000]x[2.000000,3.000000] collision

Here is a more realistic demo:

from random import random
from matplotlib import pyplot as plt

def randomRect():
  w = random() * 0.1
  h = random() * 0.1
  centerX = random() * (1 - w)
  centerY = random() * (1 - h)
  return Rectangle(
    centerX - w/2, centerX + w/2,
    centerY - h/2, centerY + h/2
  )

randomRects = [randomRect() for _ in range(0, 500)]

for r in randomRects:
  plt.fill(
    [r.left, r.right, r.right, r.left], 
    [r.bottom, r.bottom, r.top, r.top],
    'b-',
    color = 'k',
    fill = False
  )

def markSubrectanglesRed(a, b):
  if a is not b:
    if a.isSubrectangle(b):
      plt.fill(
        [a.left, a.right, a.right, a.left], 
        [a.bottom, a.bottom, a.top, a.top],
        'b-',
        color = 'r',
        alpha = 0.4
      )
      plt.fill(
        [b.left, b.right, b.right, b.left], 
        [b.bottom, b.bottom, b.top, b.top],
        'b-',
        color = 'b',
        fill = False
      )

collideAllVsAll(randomRects, randomRects, markSubrectanglesRed)

plt.show()

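The plot shows all the eliminated rectangles in red, and the enclosing rectangles in blue.

There is also a visualization of the bounding boxes (yellow) and the chosen dividing lines (cyan) of the quasi-binary space partitioning, for a small example with a single collision.

For 10000 "reasonably sized" random rectangles (with roughly the same rate of intersections as in the image), it computes all collisions in 18 seconds, even though the code is very far from optimized.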
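To get from enumerated collisions back to the original goal of removing contained rectangles, the callback can simply record which rectangles turn out to be subrectangles. The following is a hedged sketch of that pattern, not part of the answer's code: Rect is a hypothetical stand-in for the Rectangle class, and brute_force_collide is a placeholder with the same callback shape as collideAllVsAll, so that the example runs on its own.

```python
from collections import namedtuple

# Hypothetical stand-in for the Rectangle class used in this answer.
Rect = namedtuple("Rect", "left right bottom top")

def is_subrectangle(a, b):
    """True if a lies entirely within b (shared edges count as inside)."""
    return (a.left >= b.left and a.right <= b.right and
            a.bottom >= b.bottom and a.top <= b.top)

def brute_force_collide(rects_1, rects_2, on_collision):
    """Placeholder enumerator; any function with this signature can be swapped in."""
    for a in rects_1:
        for b in rects_2:
            if not (a.right < b.left or a.left > b.right or
                    a.top < b.bottom or a.bottom > b.top):
                on_collision(a, b)

def remove_contained(rects, collide=brute_force_collide):
    """Return rects without those that lie inside another rectangle."""
    position = {id(r): i for i, r in enumerate(rects)}
    contained = set()

    def on_collision(a, b):
        if a is b or not is_subrectangle(a, b):
            return
        # Identical rectangles contain each other: drop only the later one.
        if is_subrectangle(b, a) and position[id(a)] < position[id(b)]:
            return
        contained.add(id(a))

    collide(rects, rects, on_collision)
    return [r for r in rects if id(r) not in contained]
```

Passing the divide-and-conquer enumerator as the collide argument should give the same result as the placeholder, only with fewer pairwise checks.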

Answer 2

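Your problem is one of spatial proximity, so I would suggest that you consider indexing your data spatially. That means storing or indexing your rectangles in such a way that querying spatial relationships is cheap. See Wikipedia for the most common data structures.

I implemented a demo using an R-tree. The entire "algorithm" consists of the following function. It is not particularly elegant, because every unique collision gets examined twice. This is mainly due to the limited access and query interface offered by the rtree library used.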
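To illustrate what "cheap spatial queries" buys you, here is a minimal sketch that uses a uniform grid as the index. Note that this swaps in a simpler structure than the R-tree used in this answer; the tuple layout (left, right, bottom, top), the function names, and the cell parameter are all assumptions made for the example.

```python
from collections import defaultdict

def grid_candidates(rects, cell=1.0):
    """Yield index pairs (i, j) with i < j whose rectangles share a grid cell.

    `cell` is an assumed cell size; it should be on the order of a typical
    rectangle so that each rectangle touches only a few cells.
    """
    buckets = defaultdict(list)
    for i, (l, r, b, t) in enumerate(rects):
        for gx in range(int(l // cell), int(r // cell) + 1):
            for gy in range(int(b // cell), int(t // cell) + 1):
                buckets[(gx, gy)].append(i)

    seen = set()
    for members in buckets.values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                if (i, j) not in seen:
                    seen.add((i, j))
                    yield i, j

def contained_indices(rects, cell=1.0):
    """Return the indices of rectangles that lie inside some other rectangle."""
    def inside(a, b):
        return a[0] >= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] <= b[3]

    result = set()
    for i, j in grid_candidates(rects, cell):
        if inside(rects[j], rects[i]):
            result.add(j)      # for identical rectangles only the later one goes
        elif inside(rects[i], rects[j]):
            result.add(i)
    return result
```

A contained rectangle always shares at least one cell with its container, so only rectangles that are near each other are ever compared.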

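import rtree

def findCollisions(rects, onCollision):
    # interleaved=False: coordinates are given as (xmin, xmax, ymin, ymax)
    idx = rtree.index.Index(interleaved=False)
    byId = {rect.id: rect for rect in rects}  # look up hits without relying on a global list
    for rect in rects:
        idx.insert(rect.id, rect.coords)

    for rect in rects:
        # ids of all indexed rectangles whose bounds intersect this one (including itself)
        for j in idx.intersection(rect.coords):
            onCollision(rect, byId[j])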

I shamelessly copied the surrounding infrastructure from @AndreyTyukin, with only a few modifications:

from random import random

def isSubinterval(aStart, aEnd, bStart, bEnd):
  return aStart >= bStart and aEnd <= bEnd

def intersects(aStart, aEnd, bStart, bEnd):
  return not (aEnd < bStart or aStart > bEnd)

class Rectangle:
  id = 0
  def __init__(self, l, r, b, t):
    self.left = l
    self.right = r
    self.bottom = b
    self.top = t
    self.id = Rectangle.id
    Rectangle.id += 1

  @property  
  def coords(self):
      return (self.left, self.right, self.bottom, self.top)

  def isSubrectangle(self, other):
    return (
      isSubinterval(self.left, self.right, other.left, other.right) and
      isSubinterval(self.bottom, self.top, other.bottom, other.top)
    )

  def intersects(self, other):
    return (
      intersects(self.left, self.right, other.left, other.right) and
      intersects(self.bottom, self.top, other.bottom, other.top)
    )

  def __repr__(self):
    return ("[%f,%f]x[%f,%f]" % (self.left, self.right, self.bottom, self.top))


def randomRect(ratio=0.1, scale=100):
  w = random() * ratio
  h = random() * ratio
  centerX = random() * (1 - w)
  centerY = random() * (1 - h)
  return Rectangle(
    scale*(centerX - w/2), scale*(centerX + w/2),
    scale*(centerY - h/2), scale*(centerY + h/2),
  )

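A comparison with @Andrey's solution showed an improvement of roughly one order of magnitude. This is probably mainly because the Python rtree library uses an underlying C implementation.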

Answer 3

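If the rectangles are distributed fairly evenly, you may be able to save time by treating this as a one-dimensional problem, concentrating first on the X (or Y) axis.

Each rectangle has a minimum and a maximum X coordinate, namely the X coordinates of its top-left and bottom-right corners. Create two records for each rectangle, each giving its minimum or maximum X coordinate and a pointer to the rectangle. Sort these records in order of increasing X coordinate and work through them in that order.

Maintain an array of rectangles ordered by minimum X coordinate, inserting a record into it when you see its minimum X coordinate and removing a record from it when you see its maximum X coordinate. Just before you remove a record, you can do a binary search in the array to find all records whose minimum X coordinate is no greater than the minimum X coordinate of the record you are about to remove and whose maximum X coordinate is at least that of the record you are about to remove. Check these to see whether their Y coordinates also contain the record you are about to remove. If so, you have found a record that completely contains the record you are about to remove. This should find all the X-containments, because the array holds a record for every rectangle that overlaps the current X point in the X dimension: they have been inserted but not yet removed.

(In fact, you will need to be more careful than this if there are ties in the X coordinates.)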
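The sweep described above could look roughly like this. It is a sketch under stated assumptions rather than a tuned implementation: rectangles are assumed to be (x_min, x_max, y_min, y_max) tuples, the function name find_contained is made up, and ties are handled by one possible ordering of the events (insertions first, then narrower and shorter rectangles removed before the ones that might contain them).

```python
from bisect import bisect_left, bisect_right, insort

def find_contained(rects):
    """Return the set of indices of rectangles contained in another rectangle."""
    events = []
    for i, (x0, x1, y0, y1) in enumerate(rects):
        events.append((x0, 0, 0, 0, i))            # insertion at the minimum X
        # Removal at the maximum X. On ties, larger x_min and smaller height
        # sort first, so a potential container is still active when needed.
        events.append((x1, 1, -x0, y1 - y0, i))
    events.sort()

    active = []        # (x_min, index) records, kept sorted by x_min
    contained = set()
    for _, kind, _, _, i in events:
        x0, _, y0, y1 = rects[i]
        if kind == 0:
            insort(active, (x0, i))
            continue
        # Every active record already has x_max >= the current X, so the
        # candidates are exactly the prefix with x_min <= x0.
        end = bisect_right(active, (x0, float("inf")))
        for _, j in active[:end]:
            if j != i:
                _, _, cy0, cy1 = rects[j]
                if cy0 <= y0 and y1 <= cy1:
                    contained.add(i)
                    break
        del active[bisect_left(active, (x0, i))]
    return contained
```

With this ordering, exact duplicates are reduced to a single survivor (the one with the highest index). The scan over the candidate prefix is still linear in the worst case, so the speed-up depends on few rectangles being active at once.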

How do I detect rectangles inside rectangles?
