Question

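I am working on an ROI pooling layer for fast-rcnn, and I am used to working with tensorflow. I found that tf.image.crop_and_resize can act as the ROI pooling layer.

But I have tried many times and cannot get the result I expect. Or is the result I am getting actually the correct one?

Here is my code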

import cv2
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt 

img_path = r'F:\IMG_0016.JPG'
img = cv2.imread(img_path)
img = img.reshape([1,580,580,3])
img = img.astype(np.float32)
#img = np.concatenate([img,img],axis=0)

img_ = tf.Variable(img) # img shape is [1,580,580,3] after the reshape above
boxes = tf.Variable([[100,100,300,300],[0.5,0.1,0.9,0.5]])
box_ind = tf.Variable([0,0])
crop_size = tf.Variable([100,100])

#b = tf.image.crop_and_resize(img,[[0.5,0.1,0.9,0.5]],[0],[50,50])
c = tf.image.crop_and_resize(img_,boxes,box_ind,crop_size)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
a = c.eval(session=sess)

plt.imshow(a[0])
plt.imshow(a[1])

Then I attached my original img and the results: a0, a1
If I am wrong, can anyone teach me how to use this function? Thanks.

Answer 1

It seems that tf.image.crop_and_resize expects pixel values in the range [0,1].

Changing the code to

test = tf.image.crop_and_resize(image=image_np_expanded/255., ...)

solved the problem for me.

Answer 2

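Actually, there is no problem with Tensorflow here.

From the documentation of tf.image.crop_and_resize:

boxes: A Tensor of type float32. A 2-D tensor of shape [num_boxes, 4]. The i-th row of the tensor specifies the coordinates of a box in the box_ind[i] image and is specified in normalized coordinates [y1, x1, y2, x2]. A normalized coordinate value of y is mapped to the image coordinate at y * (image_height - 1), so as the [0, 1] interval of normalized image height is mapped to [0, image_height - 1] in image height coordinates. We do allow y1 > y2, in which case the sampled crop is an up-down flipped version of the original image. The width dimension is treated similarly. Normalized coordinates outside the [0, 1] range are allowed, in which case we use extrapolation_value to extrapolate the input image values.

The boxes argument needs normalized coordinates. That is why you get a black box with the first set of coordinates [100,100,300,300] (not normalized, and no extrapolation value provided), and not with the second set [0.5,0.1,0.9,0.5].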
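To make the normalization concrete, here is a minimal NumPy sketch that converts pixel boxes into the normalized form described in the quoted documentation. The helper name normalize_boxes is hypothetical (it is not part of TensorFlow), and it assumes the pixel boxes are already in [y1, x1, y2, x2] order.

```python
import numpy as np

def normalize_boxes(boxes_px, image_height, image_width):
    """Convert pixel boxes [y1, x1, y2, x2] to normalized coordinates.

    Hypothetical helper, not a TensorFlow function.
    """
    boxes_px = np.asarray(boxes_px, dtype=np.float32)
    # The docs map normalized y to y * (image_height - 1),
    # so the inverse divides by (size - 1), not by size.
    scale = np.array(
        [image_height - 1, image_width - 1, image_height - 1, image_width - 1],
        dtype=np.float32,
    )
    return boxes_px / scale

# The pixel box from the question, on the 580x580 image.
normalized = normalize_boxes([[100, 100, 300, 300]], 580, 580)
print(normalized)
```

The resulting array could then be passed as boxes in place of the raw pixel values.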

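Now, as to why matplotlib shows you gibberish on the second attempt: that is simply because you are using the wrong data type. Quoting the matplotlib documentation of plt.imshow:

All values should be in the range [0 .. 1] for floats or [0 .. 255] for integers. Out-of-range values will be clipped to these bounds.

Since you are passing floats outside the [0,1] range, matplotlib clips every value above 1 down to 1. That is why you get those colored pixels (pure red, pure green or pure blue, or a mixture of these). Cast your array to uint8 to get a meaningful image.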

plt.imshow( a[1].astype(np.uint8))
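The clipping can be reproduced without plotting anything. The sketch below uses a tiny made-up array as a stand-in for one crop returned by crop_and_resize (the values are illustrative, not taken from the question's image) and compares what the clipping does with the two usual fixes: casting to uint8, or dividing by 255 as another answer suggests.

```python
import numpy as np

# Stand-in for one float32 crop with values in [0, 255] (illustrative data).
crop = np.array([[[12.0, 200.0, 0.0], [255.0, 0.4, 90.0]]], dtype=np.float32)

# Emulates the documented clipping of float input to [0, 1]:
# almost every channel saturates to 1, which gives the garish colors.
clipped = np.clip(crop, 0.0, 1.0)

# Two ways to hand imshow data in a range it expects.
as_uint8 = crop.astype(np.uint8)  # integers in [0, 255]
as_unit_float = crop / 255.0      # floats in [0, 1]

print(clipped[0, 0])   # first pixel after clipping
print(as_uint8[0, 0])  # first pixel after the uint8 cast
```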

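Edit: As requested, I will go into a bit more detail about tf.image.crop_and_resize.

[When providing non-normalized coordinates and no extrapolation value], why do I just get a blank result?

Quoting the doc:

Normalized coordinates outside the [0, 1] range are allowed, in which case we use extrapolation_value to extrapolate the input image values.

So normalized coordinates outside [0,1] are allowed. But they still need to be normalized! With your example, [100,100,300,300], the box you specify spans roughly 100 to 300 image heights and widths, while your original picture only occupies the range [0,1] on each axis: a tiny dot far off to the upper left of that box. The default value of the argument extrapolation_value is 0, so the values outside the frame of the original image are inferred as [0,0,0], hence the black.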

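However, if your use case needs another value, you can provide it. The pixels will take an RGB value of extrapolation_value%256 on each channel. This option is useful if the region you need to crop is not fully contained in the original image (a possible use case would be sliding windows, for example).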
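The behavior described in this answer can be illustrated with a small NumPy sketch of the sampling rule from the quoted documentation. This is not TensorFlow's implementation: crop_and_resize_sketch is a hypothetical name, it handles a single H x W x C image and one box, uses bilinear interpolation, and assumes both crop dimensions are greater than 1.

```python
import numpy as np

def crop_and_resize_sketch(image, box, crop_size, extrapolation_value=0.0):
    """Bilinear crop of one HxWxC image; box is normalized [y1, x1, y2, x2].

    Hypothetical illustration of the documented behavior, not TensorFlow code.
    """
    height, width, channels = image.shape
    crop_h, crop_w = crop_size
    y1, x1, y2, x2 = box
    # Normalized coordinates map to pixels via y * (image_height - 1).
    ys = np.linspace(y1 * (height - 1), y2 * (height - 1), crop_h)
    xs = np.linspace(x1 * (width - 1), x2 * (width - 1), crop_w)
    # Start from extrapolation_value; only in-image samples are overwritten.
    out = np.full((crop_h, crop_w, channels), extrapolation_value, dtype=np.float32)
    for i, y in enumerate(ys):
        if y < 0 or y > height - 1:
            continue  # sample row lies outside the image
        top = int(np.floor(y))
        bottom = min(top + 1, height - 1)
        wy = y - top
        for j, x in enumerate(xs):
            if x < 0 or x > width - 1:
                continue  # sample column lies outside the image
            left = int(np.floor(x))
            right = min(left + 1, width - 1)
            wx = x - left
            upper = (1 - wx) * image[top, left] + wx * image[top, right]
            lower = (1 - wx) * image[bottom, left] + wx * image[bottom, right]
            out[i, j] = (1 - wy) * upper + wy * lower
    return out

image = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)

# The whole image at its own size: the samples land exactly on the pixels.
full = crop_and_resize_sketch(image, [0.0, 0.0, 1.0, 1.0], (8, 8))
# Pixel coordinates passed as if they were normalized: every sample is outside.
black = crop_and_resize_sketch(image, [100, 100, 300, 300], (4, 4))
# A box that is half outside the image, padded with a custom value.
padded = crop_and_resize_sketch(image, [-0.5, -0.5, 0.5, 0.5], (4, 4),
                                extrapolation_value=128.0)
```

Here black contains only zeros, which matches the all-black crop in the question, while padded mixes real image samples with the custom fill value.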

Answer 3

Another option is to use the tf.image.central_crop function.
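For comparison, tf.image.central_crop takes an image and a central_fraction and keeps only the central region; it does not resize and does not accept arbitrary boxes. A rough NumPy analogue is sketched below. The name central_crop_sketch is hypothetical, and TensorFlow's exact rounding of the crop offsets may differ from this sketch.

```python
import numpy as np

def central_crop_sketch(image, central_fraction):
    """Keep the central fraction of an HxWxC array along height and width.

    Hypothetical NumPy analogue; rounding may differ from TensorFlow.
    """
    if not 0.0 < central_fraction <= 1.0:
        raise ValueError("central_fraction must be in (0, 1]")
    height, width = image.shape[:2]
    # Remove the same margin from both sides of each spatial axis.
    top = int((height - height * central_fraction) / 2)
    left = int((width - width * central_fraction) / 2)
    return image[top:height - top, left:width - left]

image = np.zeros((580, 580, 3), dtype=np.float32)
center = central_crop_sketch(image, 0.5)
print(center.shape)
```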

Answer 4

Below is a concrete usage example of the tf.image.crop_and_resize API, with tf version 1.14.

import tensorflow as tf
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np

tf.enable_eager_execution()

def single_data_2(img_path):
    img = tf.read_file(img_path)
    img = tf.image.decode_bmp(img,channels=1)
    img_4d = tf.expand_dims(img, axis=0)
    processed_img = tf.image.crop_and_resize(
        img_4d,
        boxes=[[0.4529, 0.72, 0.4664, 0.7358]],
        crop_size=[64, 64],
        box_ind=[0])
    processed_img_2 = tf.squeeze(processed_img,0)
    raw_img_3 = tf.squeeze(img_4d,0)
    return raw_img_3, processed_img_2

def plot_two_image(raw,processed):
    fig=plt.figure(figsize=(35,35))
    raw_ = fig.add_subplot(1,2,1)
    raw_.set_title('Raw Image')
    raw_.imshow(raw,cmap='gray')
    processed_ = fig.add_subplot(1,2,2)
    processed_.set_title('Processed Image')
    processed_.imshow(processed,cmap='gray')

img_path = 'D:/samples/your_bmp_image.bmp'

raw_img, process_img  = single_data_2(img_path)
print(raw_img.dtype,process_img.dtype)
print(raw_img.shape,process_img.shape)
raw_img=tf.squeeze(raw_img,-1)
process_img=tf.squeeze(process_img,-1)
print(raw_img.dtype,process_img.dtype)
print(raw_img.shape,process_img.shape)
plot_two_image(raw_img,process_img)

On using tf.image.crop_and_resize
