Question

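I am trying to fill the inside of the bounding boxes in an image with zeros in TensorFlow. Specifically, I am trying to implement the create_mask_from_bounding_boxes(image, boxes) function in the following code.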

# Tensor <?, 4>, where each element contains [ymin, xmin, ymax, xmax]
boxes

# Tensor <H, W, C>
image

# Tensor <H, W, C>
mask = create_mask_from_bounding_boxes(image, boxes)

# mask out bounding boxes in the image
bounding_box_masked_image = mask * image

If the number of boxes were known, I would do something like this:

def create_mask_from_bounding_boxes(image, boxes):
    mask = tf.zeros_like(image)
    for box in tf.unstack(boxes):
        ymin, xmin, ymax, xmax = tf.unstack(box)
        mask[ymin:ymax, xmin:xmax] = 1
    return mask

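However, since the number of boxes is not known, I cannot use tf.unstack(boxes). Is there another way to create an image mask from an unknown number of bounding boxes in TensorFlow?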

Answer 1

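The general answer to "what do I do when a shape is unknown?" questions is "use a TensorArray". A TensorArray provides a way to handle a statically unknown number of tensors.

Here are a couple of ways to solve your specific problem: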

import tensorflow as tf
import numpy as np

USE_FOLD = True

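def box_mask(box):
  """Create a 4x4 tensor of zeros except for a rectangle of ones defined by `box`."""
  height, width = 4, 4  # image size, hard-coded for simplicity
  ymin, xmin, ymax, xmax = tf.unstack(box)
  h = ymax - ymin  # number of rows covered by the box
  # Rows above the box.
  z0 = tf.zeros([ymin, width])
  # Rows crossing the box: zeros, then ones, then zeros along the columns.
  z1 = tf.concat(
      [tf.zeros([h, xmin]),
       tf.ones([h, xmax - xmin]),
       tf.zeros([h, width - xmax])],
      axis=1)
  # Rows below the box.
  z2 = tf.zeros([height - ymax, width])
  return tf.concat([z0, z1, z2], axis=0)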

def reduce_mask(a, box):
  mask = box_mask(box)
  return tf.maximum(a, mask)

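def main():
  boxes_val = np.array([[0, 0, 2, 2], [2, 2, 4, 4]])
  # The number of boxes is left unknown (None) until run time.
  boxes = tf.placeholder(shape=(None, 4), dtype=tf.int32)

  with tf.Session() as sess:
    if USE_FOLD:
      # Fold the boxes one at a time into a single running mask.
      print(sess.run(tf.foldl(reduce_mask, boxes,
                              initializer=tf.zeros([4, 4])),
                     feed_dict={boxes: boxes_val}))
    else:
      # Build one mask per box, then combine them with an element-wise max.
      masks = tf.map_fn(box_mask, boxes, dtype=tf.float32)
      combined_mask = tf.reduce_max(masks, axis=0)
      print(sess.run(combined_mask, feed_dict={boxes: boxes_val}))

if __name__ == "__main__":
  main()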

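For simplicity I hard-coded the image size to 4x4. Both approaches use functional primitives: tf.map_fn and tf.foldl. These primitives are built on top of tf.while_loop and TensorArray. The approach with USE_FOLD=True may be slower, since each box is converted into a mask and applied to the current mask sequentially, but it needs less memory: the amount is independent of the number of boxes. The approach with USE_FOLD=False can perform the box-to-mask conversions in parallel and then does a single reduce step (ORing all the masks together). However, it requires memory proportional to image_size * num_boxes.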
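
To make the difference between the two strategies concrete, here is a minimal sketch in plain Python, with no TensorFlow involved. The helper names (union, mask_by_fold, mask_by_map_reduce) are made up for illustration, and nested lists stand in for tensors. It only mirrors the structure of tf.foldl versus tf.map_fn followed by tf.reduce_max; it says nothing about how TensorFlow schedules the work.

```python
from functools import reduce

H, W = 4, 4  # image size, hard-coded as in the answer above


def box_mask(box):
    """Return an H x W nested list: 1 inside [ymin, ymax) x [xmin, xmax), else 0."""
    ymin, xmin, ymax, xmax = box
    return [[1 if ymin <= r < ymax and xmin <= c < xmax else 0
             for c in range(W)]
            for r in range(H)]


def union(a, b):
    """Element-wise maximum of two masks (a logical OR for 0/1 values)."""
    return [[max(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def mask_by_fold(boxes):
    # Fold analogue: one accumulated mask plus one per-box mask at a time.
    zeros = [[0] * W for _ in range(H)]
    return reduce(lambda acc, box: union(acc, box_mask(box)), boxes, zeros)


def mask_by_map_reduce(boxes):
    # Map-then-reduce analogue: all per-box masks exist before they are combined.
    masks = [box_mask(b) for b in boxes]
    return [[max((m[r][c] for m in masks), default=0) for c in range(W)]
            for r in range(H)]


boxes = [[0, 0, 2, 2], [2, 2, 4, 4]]
for row in mask_by_fold(boxes):
    print(row)
```

Both functions return the same mask. The fold version keeps only the running mask alive, while the map-then-reduce version holds one mask per box before combining them. Note that these masks are 1 inside the boxes, so zeroing out the boxes as in the question would mean multiplying the image by (1 - mask).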

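In this example the memory versus speed discussion is probably moot, since converting a box to a mask is very fast. But it could matter if the "map part" of the computation is expensive.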
