Question

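Context: I am performing object localization and would like to implement an inhibition-of-return mechanism (i.e., drawing a black cross on the image where the red bounding box is located once the trigger action is taken).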

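Problem: I don't know how to accurately scale the bounding box (red) relative to the original input (init_input). If this scaling is done correctly, the black cross should land exactly in the middle of the red bounding box.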

The current code for this function is as follows:

def IoR(b, init_input, prev_coord):
    """
    Inhibition-of-Return mechanism.

    Marks the region of the image covered by
    the bounding box with a black cross.

    :param b:
        The current bounding box represented as [x1, y1, x2, y2].

    :param init_input:
        The initial input volume of the current episode.

    :param prev_coord:
        The previous state's bounding box coordinates (x1, y1, x2, y2)
    """
    x1, y1, x2, y2 = prev_coord
    width = 12
    x_mid = (b[2] + b[0]) // 2
    y_mid = (b[3] + b[1]) // 2

    # Define vertical rectangle coordinates
    ver_x1 = int(((x_mid) * IMG_SIZE / (x2 - x1)) - width)
    ver_x2 = int(((x_mid) * IMG_SIZE / (x2 - x1)) + width)
    ver_y1 = int((b[1]) * IMG_SIZE / (y2 - y1))
    ver_y2 = int((b[3]) * IMG_SIZE / (y2 - y1))

    # Define horizontal rectangle coordinates
    hor_x1 = int((b[0]) * IMG_SIZE / (x2 - x1))
    hor_x2 = int((b[2]) * IMG_SIZE / (x2 - x1))
    hor_y1 = int(((y_mid) * IMG_SIZE / (y2 - y1)) - width)
    hor_y2 = int(((y_mid) * IMG_SIZE / (y2 - y1)) + width)

    # Draw vertical rectangle
    cv2.rectangle(init_input, (ver_x1, ver_y1), (ver_x2, ver_y2), (0, 0, 0), -1)

    # Draw horizontal rectangle
    cv2.rectangle(init_input, (hor_x1, hor_y1), (hor_x2, hor_y2), (0, 0, 0), -1)

The desired effect is shown below:

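Note: I believe the difficulty comes from the image being resized (to 224, 224, 3) every time I take an action and move to the next state. The "anchor" that determines the scaling therefore has to be taken from the previous state's scaling, as shown in the following code: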

def next_state(init_input, b_prime, g):
    """
    Returns the observable region of the next state.

    Formats the next state's observable region, defined
    by b_prime, to be of dimension (224, 224, 3). Adding 16
    additional pixels of context around the original bounding box.
    The ground truth box must be reformatted according to the
    new observable region.

    IMG_SIZE = 224

    :param init_input:
        The initial input volume of the current episode.

    :param b_prime:
        The subsequent state's bounding box.

    :param g: (init_g)
        The initial ground truth box of the target object.
    """

    # Determine the pixel coordinates of the observable region for the following state
    context_pixels = 16
    x1 = max(b_prime[0] - context_pixels, 0)
    y1 = max(b_prime[1] - context_pixels, 0)
    x2 = min(b_prime[2] + context_pixels, IMG_SIZE)
    y2 = min(b_prime[3] + context_pixels, IMG_SIZE)

    # Determine observable region
    observable_region = cv2.resize(init_input[y1:y2, x1:x2], (224, 224), interpolation=cv2.INTER_AREA)

    # Resize ground truth box
    g[0] = int((g[0] - x1) * IMG_SIZE / (x2 - x1))  # x1
    g[1] = int((g[1] - y1) * IMG_SIZE / (y2 - y1))  # y1
    g[2] = int((g[2] - x1) * IMG_SIZE / (x2 - x1))  # x2
    g[3] = int((g[3] - y1) * IMG_SIZE / (y2 - y1))  # y2

    return observable_region, g, (b_prime[0], b_prime[1], b_prime[2], b_prime[3])

Explanation:

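In state t, the agent is predicting the location of the target object. The target object has a ground truth box (yellow and dotted in the image), and the agent's current "localization box" is the red bounding box. Say that at state t the agent decides it is best to move right. The bounding box is moved to the right, and the next state t' is then determined by adding 16 extra pixels of context around the red bounding box, cropping the original image to that boundary, and scaling the cropped image back up to 224 x 224.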

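Say the agent is now confident that its prediction is accurate, so it chooses the trigger action. This basically means ending the localization episode for the current target object and placing a black cross where the agent predicts the object to be (i.e., in the middle of the red bounding box). Since the current state was upscaled after being cropped by the previous action, the bounding box must be rescaled relative to the normal/original/initial image before the black cross can be drawn accurately onto it.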
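The rescaling described above is the inverse of the crop-and-resize step, and it can be sketched as follows. This is only a minimal sketch under stated assumptions: `state_to_image` is a hypothetical helper, the box is assumed to be expressed in the resized state's pixel coordinates, and `crop` is assumed to be the (x1, y1, x2, y2) region that was actually cut out of the initial image, including the context pixels. If the box is already in the initial image's coordinates, as the slicing in next_state seems to suggest, no rescaling would be needed.

```python
IMG_SIZE = 224  # side length of every resized state, as in the post


def state_to_image(box, crop, size=IMG_SIZE):
    """Map a box from resized-state coordinates back to initial-image coordinates.

    Hypothetical helper (not from the original post).
    box:  (x1, y1, x2, y2) in the resized (size x size) state.
    crop: (x1, y1, x2, y2) region of the initial image that was resized
          to produce that state (assumed to include the context pixels).
    """
    cx1, cy1, cx2, cy2 = crop
    # The forward step was: x_state = (x_img - cx1) * size / (cx2 - cx1).
    # Inverting it gives:   x_img = cx1 + x_state * (cx2 - cx1) / size.
    sx = (cx2 - cx1) / size
    sy = (cy2 - cy1) / size
    x1, y1, x2, y2 = box
    return (
        int(round(cx1 + x1 * sx)),
        int(round(cy1 + y1 * sy)),
        int(round(cx1 + x2 * sx)),
        int(round(cy1 + y2 * sy)),
    )


# Example: a 112 x 112 crop was upscaled to 224 x 224, so one state pixel
# corresponds to half a pixel of the initial image.
crop = (50, 20, 162, 132)
box_in_image = state_to_image((100, 60, 180, 120), crop)
```

Note that the offset added back is the crop origin, not the box origin, and the divisor is the crop size, not the box size.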

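In my case, the first rescaling between states works fine (the second code snippet in this post). However, scaling back to the original image and drawing the black cross is the part I can't seem to get right.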
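Once a box is available in the initial image's coordinates, the two rectangles that make up the cross can be computed directly from it, with no further scaling. The sketch below is an illustration rather than a definitive fix: `cross_rectangles` is a hypothetical helper, the half-width of 12 mirrors the `width` value in the IoR code above, and the initial image is assumed to be 224 x 224.

```python
def cross_rectangles(box, half_width=12, img_size=224):
    """Return the two rectangles forming a cross centred on the box.

    Hypothetical helper (not from the original post).
    box: (x1, y1, x2, y2), assumed to be in initial-image coordinates.
    Each rectangle is returned as ((x1, y1), (x2, y2)), clamped to the image.
    """
    x1, y1, x2, y2 = box
    x_mid = (x1 + x2) // 2
    y_mid = (y1 + y2) // 2

    def clamp(v):
        # Keep every coordinate inside the image bounds.
        return max(0, min(int(v), img_size - 1))

    # Vertical bar: narrow in x, spans the full height of the box.
    vertical = (
        (clamp(x_mid - half_width), clamp(y1)),
        (clamp(x_mid + half_width), clamp(y2)),
    )
    # Horizontal bar: narrow in y, spans the full width of the box.
    horizontal = (
        (clamp(x1), clamp(y_mid - half_width)),
        (clamp(x2), clamp(y_mid + half_width)),
    )
    return vertical, horizontal


vertical, horizontal = cross_rectangles((100, 50, 140, 80))
```

Each pair of corner points could then be passed to cv2.rectangle with colour (0, 0, 0) and thickness -1, as the IoR function above already does.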

Below is an image that should help illustrate the problem: