Question

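I was going through the Mask R-CNN slides given here, but I could not work out how the feature map is computed after applying RoI Align, as shown in the image below. The paper and the slides mention using bilinear interpolation, but I don't see how to apply it in the given image. Thanks.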

RoIAlign (Mask R-CNN)

Answer 1

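After the 4 sample points are placed inside each pooling cell, the value of each point is determined by bilinear interpolation from the 4 pixels closest to it. Once every point has a value, you take either the average or the max of the 4 points in each pooling cell and put that value at the corresponding position in the output tensor. That gives you the forward pass. The backward pass should not be a problem either, since both the interpolation and the average/max are differentiable with respect to the pixel values.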
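The steps above can be sketched in NumPy for a single RoI on a single-channel feature map. This is only an illustrative sketch, not the paper's or any library's implementation: the function names `bilinear` and `roi_align`, the `(y1, x1, y2, x2)` box layout, and the convention that pixel centers sit at integer coordinates are all assumptions made for the example.

```python
import numpy as np

def bilinear(feat, y, x):
    """Interpolate feat (H, W) at a continuous point (y, x).

    Assumption: pixel centers lie at integer coordinates; points outside
    the map are clamped to its border.
    """
    h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Each neighbour is weighted by how close the point is to it.
    return ((1 - dy) * (1 - dx) * feat[y0, x0]
            + (1 - dy) * dx * feat[y0, x1]
            + dy * (1 - dx) * feat[y1, x0]
            + dy * dx * feat[y1, x1])

def roi_align(feat, roi, out_size=2, samples=2, reduce=np.mean):
    """Pool one RoI into an (out_size, out_size) grid.

    roi is (y1, x1, y2, x2) in feature-map coordinates and is NOT rounded.
    Each pooling cell gets samples x samples regularly spaced points
    (2 x 2 = 4 points by default), reduced with `reduce` (mean or max).
    """
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.zeros((out_size, out_size), dtype=float)
    for i in range(out_size):
        for j in range(out_size):
            vals = []
            for sy in range(samples):
                for sx in range(samples):
                    # Regular sample locations inside cell (i, j).
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    vals.append(bilinear(feat, y, x))
            out[i, j] = reduce(vals)
    return out
```

Calling `roi_align(feat, roi)` averages the 4 points per cell, and passing `reduce=np.max` switches to max pooling. Real implementations differ in details such as the half-pixel offset and how channels and batches are handled.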

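For example, in the image the first red point is surrounded by pixels with values 0.85, 0.34, 0.32 and 0.74. The resulting value is a function of:

these values
the distances from the red point to these pixels (their centers)

The closer the point is to a pixel, the closer its value is to that pixel's value.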
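Written out, the dependence on values and distances is the standard bilinear interpolation formula. The symbols below are introduced only for illustration: the four surrounding pixel centers are assumed to form a unit square with corners at (x_0, y_0), (x_0+1, y_0), (x_0, y_0+1) and (x_0+1, y_0+1), with values v_{00}, v_{10}, v_{01}, v_{11}. Which of 0.85, 0.34, 0.32 and 0.74 sits at which corner depends on the image and is not assumed here.

```latex
\begin{aligned}
d_x &= x - x_0, \qquad d_y = y - y_0, \qquad 0 \le d_x, d_y \le 1, \\
v(x, y) &= (1 - d_x)(1 - d_y)\, v_{00} + d_x (1 - d_y)\, v_{10}
         + (1 - d_x)\, d_y\, v_{01} + d_x\, d_y\, v_{11}.
\end{aligned}
```

The four weights are non-negative and sum to 1, so the result is a weighted average of the four pixel values, and a point lying exactly on a pixel center takes that pixel's value.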

Answer 2

Also check this implementation:

# From Mask R-CNN paper: "We sample four regular locations, so
# that we can evaluate either max or average pooling. In fact,
# interpolating only a single value at each bin center (without
# pooling) is nearly as effective."
#
# Here we use the simplified approach of a single value per bin,
# which is how it's done in tf.crop_and_resize()
# Result: [batch * num_boxes, pool_height, pool_width, channels]

How is the feature map computed after applying RoI Align, as described in the Mask R-CNN paper?
