Question

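To compute Intersection over Union (IoU), I want to find the coordinates of the minimum and maximum (boundary) pixels in a segmented image, image_pred, which is represented as a float32 3-D tensor. Specifically, my goal is to find the top-left and bottom-right corner coordinates of the object in the image. The image consists entirely of black pixels (value 0.0), except where the object is located, where I have colored pixels (0.0 < value < 1.0). Here is an example of such a bounding box (in my case the object is a traffic sign and the surroundings are blacked out):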

My approach so far has been to use tf.boolean_mask to set every pixel except the colored ones to False:

zeros = tf.zeros_like(image_pred)
mask = tf.greater(image_pred, zeros)
boolean_mask_pred = tf.boolean_mask(image_pred, mask)

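I then use tf.where to find the coordinates in the masked image. To determine the horizontal and vertical coordinates of the rectangle's top-left and bottom-right corners, I considered using tf.reduce_max and tf.reduce_min, but since they do not return a single value when I supply an axis, I am not sure whether they are the right functions. According to the documentation, if I do not specify axis, the function reduces over all dimensions, which is not what I want either. Which function is the right one? The final IoU should be a single float value.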

coordinates_pred = tf.where(boolean_mask_pred)
x21 = tf.reduce_min(coordinates_pred, axis=1)
y21 = tf.reduce_min(coordinates_pred, axis=0)
x22 = tf.reduce_max(coordinates_pred, axis=1)
y22 = tf.reduce_max(coordinates_pred, axis=0)

Answer 1

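All you need to do is not use tf.boolean_mask: it returns a flattened 1-D tensor containing only the selected values, so their positions are lost. Pass the boolean mask itself to tf.where instead. First, I made a similar image.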

import numpy as np
from matplotlib import pyplot as plt

image = np.zeros(shape=(256,256))
np.random.seed(0)
image[12:76,78:142] = np.random.random_sample(size=(64,64))
plt.imshow(image)
plt.show()

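Then obtain the coordinates of its minimum and maximum with TensorFlow (the code below uses the TensorFlow 1.x graph API).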

import tensorflow as tf

image_pred = tf.placeholder(shape=(256,256),dtype=tf.float32)
zeros = tf.zeros_like(image_pred)
mask = tf.greater(image_pred, zeros)

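# Each row of coordinates_pred is the (row, col) index of one non-zero pixel.
coordinates_pred = tf.where(mask)
# Reducing over axis=0 keeps one value per index column:
# [row_min, col_min] and [row_max, col_max].
xy_min = tf.reduce_min(coordinates_pred, axis=0)
xy_max = tf.reduce_max(coordinates_pred, axis=0)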

with tf.Session() as sess:
    print(sess.run(xy_min,feed_dict={image_pred:image}))
    print(sess.run(xy_max,feed_dict={image_pred:image}))

[12 78]
[ 75 141]
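
Note that the two values in each result are in (row, column) order, that is (y, x), not (x, y). To get from these corners to the IoU the question asks about, the following is a minimal NumPy sketch rather than a definitive implementation. The helper names bounding_box and box_iou are my own, the boxes are assumed to be inclusive pixel indices, and for a 3-D image of shape (height, width, channels) I assume a pixel belongs to the object if any channel is non-zero.

```python
import numpy as np


def bounding_box(image):
    """Return (row_min, col_min, row_max, col_max) of the non-zero pixels.

    Assumes the image contains at least one non-zero pixel.
    """
    if image.ndim == 3:
        # Assumption: a pixel counts as "object" if any channel is non-zero.
        mask = (image > 0).any(axis=-1)
    else:
        mask = image > 0
    # np.argwhere plays the role of tf.where(mask): shape (N, 2),
    # one (row, col) pair per non-zero pixel.
    coords = np.argwhere(mask)
    row_min, col_min = coords.min(axis=0)
    row_max, col_max = coords.max(axis=0)
    return int(row_min), int(col_min), int(row_max), int(col_max)


def box_iou(box_a, box_b):
    """IoU of two inclusive boxes given as (row_min, col_min, row_max, col_max)."""
    r0 = max(box_a[0], box_b[0])
    c0 = max(box_a[1], box_b[1])
    r1 = min(box_a[2], box_b[2])
    c1 = min(box_a[3], box_b[3])
    # +1 because both corners are inclusive pixel indices.
    inter = max(0, r1 - r0 + 1) * max(0, c1 - c0 + 1)
    area_a = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
    area_b = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)
    return inter / (area_a + area_b - inter)


# Synthetic image like the one above; values are kept strictly positive
# so that every pixel of the patch is detected.
image = np.zeros((256, 256), dtype=np.float32)
rng = np.random.default_rng(0)
image[12:76, 78:142] = rng.uniform(0.1, 1.0, size=(64, 64))

pred_box = bounding_box(image)
shifted_box = (12, 110, 75, 173)  # hypothetical ground-truth box, shifted 32 columns

print(pred_box)                        # (12, 78, 75, 141)
print(box_iou(pred_box, pred_box))     # 1.0
print(box_iou(pred_box, shifted_box))  # about 0.333 (2048 / 6144)
```

The same arithmetic should carry over to TensorFlow by swapping max/min for tf.maximum/tf.minimum on the xy_min and xy_max tensors, though I have not run that variant here.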

TensorFlow: How do I find the min/max coordinates of a segment in a tensor, excluding zeros?
