Question

I found the following method, tf.extract_image_patches, in the TensorFlow API, but I am not clear about what it does.

Say batch_size = 1, the image is of size 225x225x3, and we want to extract patches of size 32x32.

How exactly does this function behave? Specifically, the documentation says the output tensor has dimensions [batch, out_rows, out_cols, ksize_rows * ksize_cols * depth], but it does not say what out_rows and out_cols are.

Ideally, given an input image tensor of size 1x225x225x3 (where 1 is the batch size), I want to get Kx32x32x3 as output, where K is the total number of patches and 32x32x3 is the dimension of each patch. Is there something in TensorFlow that already achieves this?

Answer 1

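Here is how the method works:

ksizes decides the dimensions of each patch, or in other words, how many pixels each patch should contain.
strides denotes the length of the gap between the start of one patch and the start of the next consecutive patch within the original image.
rates is a number that essentially means our patch should jump by rates pixels in the original image for each consecutive pixel that ends up in the patch. (The example below helps illustrate this.)
padding is either "VALID", which means every patch must be fully contained in the image, or "SAME", which means patches are allowed to be incomplete (the remaining pixels are filled in with zeros).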
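
To make the roles of ksizes, strides and rates concrete, here is a minimal pure-Python sketch that emulates the "VALID" case for a single-channel image stored as a nested list. The helper name extract_patches_valid is made up for illustration and is not part of TensorFlow. It only mimics the behaviour described above, using one value per parameter for both axes.

```python
# Hypothetical helper for illustration only; not a TensorFlow function.
def extract_patches_valid(image, ksize, stride, rate=1):
    """Emulate 'VALID' patch extraction on a 2-D list (one channel)."""
    rows, cols = len(image), len(image[0])
    # A patch sampled every `rate` pixels spans this many pixels per side.
    span = ksize + (ksize - 1) * (rate - 1)
    out = []
    for top in range(0, rows - span + 1, stride):
        out_row = []
        for left in range(0, cols - span + 1, stride):
            # Take every `rate`-th pixel starting at (top, left), row by row.
            patch = [image[top + i * rate][left + j * rate]
                     for i in range(ksize) for j in range(ksize)]
            out_row.append(patch)
        out.append(out_row)
    return out


n = 10
# 10 x 10 image holding the numbers 1 through 100 in order
image = [[x * n + y + 1 for y in range(n)] for x in range(n)]

plain = extract_patches_valid(image, ksize=3, stride=5, rate=1)
dilated = extract_patches_valid(image, ksize=3, stride=5, rate=2)
print(plain[0][0])    # [1, 2, 3, 11, 12, 13, 21, 22, 23]
print(dilated[0][0])  # [1, 3, 5, 21, 23, 25, 41, 43, 45]
```

With rate=2 the patch still holds 3x3 values, but they are drawn from a 5x5 region of the image.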

Here is some sample code with output to help demonstrate how it works:

import tensorflow as tf

n = 10
# images is a 1 x 10 x 10 x 1 array that contains the numbers 1 through 100 in order
images = [[[[x * n + y + 1] for y in range(n)] for x in range(n)]]

# We generate four outputs as follows:
# 1. 3x3 patches with stride length 5
# 2. Same as above, but the rate is increased to 2
# 3. 4x4 patches with stride length 7; only one patch should be generated
# 4. Same as above, but with padding set to 'SAME'
# TensorFlow 1.x graph mode: a session is needed to evaluate the ops
with tf.Session() as sess:
  print(tf.extract_image_patches(images=images, ksizes=[1, 3, 3, 1], strides=[1, 5, 5, 1], rates=[1, 1, 1, 1], padding='VALID').eval(), '\n\n')
  print(tf.extract_image_patches(images=images, ksizes=[1, 3, 3, 1], strides=[1, 5, 5, 1], rates=[1, 2, 2, 1], padding='VALID').eval(), '\n\n')
  print(tf.extract_image_patches(images=images, ksizes=[1, 4, 4, 1], strides=[1, 7, 7, 1], rates=[1, 1, 1, 1], padding='VALID').eval(), '\n\n')
  print(tf.extract_image_patches(images=images, ksizes=[1, 4, 4, 1], strides=[1, 7, 7, 1], rates=[1, 1, 1, 1], padding='SAME').eval())

Output:

[[[[ 1  2  3 11 12 13 21 22 23]
   [ 6  7  8 16 17 18 26 27 28]]

  [[51 52 53 61 62 63 71 72 73]
   [56 57 58 66 67 68 76 77 78]]]]


[[[[  1   3   5  21  23  25  41  43  45]
   [  6   8  10  26  28  30  46  48  50]]

  [[ 51  53  55  71  73  75  91  93  95]
   [ 56  58  60  76  78  80  96  98 100]]]]


[[[[ 1  2  3  4 11 12 13 14 21 22 23 24 31 32 33 34]]]]


[[[[  1   2   3   4  11  12  13  14  21  22  23  24  31  32  33  34]
   [  8   9  10   0  18  19  20   0  28  29  30   0  38  39  40   0]]

  [[ 71  72  73  74  81  82  83  84  91  92  93  94   0   0   0   0]
   [ 78  79  80   0  88  89  90   0  98  99 100   0   0   0   0   0]]]]

So, for example, our first result corresponds to the starred positions below:

 *  *  *  4  5  *  *  *  9 10 
 *  *  * 14 15  *  *  * 19 20 
 *  *  * 24 25  *  *  * 29 30 
31 32 33 34 35 36 37 38 39 40 
41 42 43 44 45 46 47 48 49 50 
 *  *  * 54 55  *  *  * 59 60 
 *  *  * 64 65  *  *  * 69 70 
 *  *  * 74 75  *  *  * 79 80 
81 82 83 84 85 86 87 88 89 90 
91 92 93 94 95 96 97 98 99 100

As you can see, we have 2 rows and 2 columns of patches, and those counts are what out_rows and out_cols are.
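
The patch counts can also be worked out by hand. The sketch below is pure Python and uses two hypothetical helpers, out_size and same_padding, that are not TensorFlow functions. It assumes the "VALID"/"SAME" size and padding rules documented for TensorFlow's convolution ops, which agree with the four outputs shown above. It is not taken from the op's source.

```python
import math


# Hypothetical helper: number of patches along one axis.
def out_size(in_size, ksize, stride, rate=1, padding="VALID"):
    span = ksize + (ksize - 1) * (rate - 1)  # extent of a dilated patch
    if padding == "VALID":
        return max((in_size - span) // stride + 1, 0)
    return math.ceil(in_size / stride)  # "SAME"


# Hypothetical helper: (before, after) zero padding along one axis,
# under the assumed "SAME" rule.
def same_padding(in_size, ksize, stride, rate=1):
    span = ksize + (ksize - 1) * (rate - 1)
    out = math.ceil(in_size / stride)
    total = max((out - 1) * stride + span - in_size, 0)
    return total // 2, total - total // 2


print(out_size(10, 3, 5))                  # 2
print(out_size(10, 3, 5, rate=2))          # 2
print(out_size(10, 4, 7))                  # 1
print(out_size(10, 4, 7, padding="SAME"))  # 2
print(same_padding(10, 4, 7))              # (0, 1)
print(out_size(225, 32, 32))               # 7
```

Under these assumptions, the 225x225 image from the question with 32x32 patches and a stride of 32 gives 7 x 7 = 49 patches, and the single column and row of zeros in the fourth output matches the (0, 1) padding.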

Answer 2

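To expand on Neal's detailed answer, there are a lot of subtleties with zero padding when using "SAME", because extract_image_patches tries to center the patches in the image if possible. Depending on the stride, there may or may not be padding on the top and left, and the first patch does not necessarily start at the upper left.

For example, extending the previous example: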

print(tf.extract_image_patches(images, [1, 3, 3, 1], [1, n, n, 1], [1, 1, 1, 1], 'SAME').eval()[0])

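With a stride of n = 1, the image is padded with zeros all around and the first patch starts with padding. Other strides pad the image only on the right and bottom, or not at all. With a stride of n = 10, the single patch starts at element 34 (in the middle of the image).

tf.extract_image_patches is implemented by the Eigen library, as described in this answer. You can study that code to see exactly how patch positions and padding are calculated.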

Answer 3

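Introduction

Here I would like to present a rather simple demo of using tf.image.extract_patches on an actual image. I found that examples of the method using real images with proper visualization are quite scarce, so here is one.

The image we will use is of size (256, 256, 3). The patches we will extract have shape (128, 128, 3). This means we will retrieve 4 tiles from the image.

Data used

I will be using the flowers dataset. Since this answer needs a bit of a data pipeline, I will link my kaggle kernel here, which discusses consuming the dataset with the tf.data.Dataset API.

Once that is done, we will walk through the following code snippets.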

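import matplotlib.pyplot as plt
import tensorflow as tf

# train_ds is the batched tf.data.Dataset built in the linked kernel
images, _ = next(iter(train_ds.take(1)))

image = images[0]
plt.imshow(image.numpy().astype("uint8"))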

Here we take one image from a batch of images and visualize it as is.

image = tf.expand_dims(image,0) # To create the batch information
patches = tf.image.extract_patches(images=image,
                                   sizes=[1, 128, 128, 1],
                                   strides=[1, 128, 128, 1],
                                   rates=[1, 1, 1, 1],
                                   padding='VALID')

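With this snippet we extract patches of size (128, 128) from an image of size (256, 256). Because the stride equals the patch size, the patches do not overlap, so the image is split into 4 tiles.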

Visualization

plt.figure(figsize=(10, 10))
for imgs in patches:
    count = 0
    for r in range(2):
        for c in range(2):
            ax = plt.subplot(2, 2, count+1)
            plt.imshow(tf.reshape(imgs[r,c],shape=(128,128,3)).numpy().astype("uint8"))
            count += 1
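
The reshape in the loop above works because each patch is stored as one flat vector of length rows * cols * depth. The sketch below assumes the values are laid out in row, column, channel order, which is consistent with that reshape and with the outputs in the first answer. It is pure Python, and unflatten_patch is a made-up helper for illustration, not a TensorFlow function.

```python
# Hypothetical helper: rebuild a height x width x depth nested list
# from a flat patch vector, assuming row, column, channel ordering.
def unflatten_patch(vec, height, width, depth):
    assert len(vec) == height * width * depth
    return [[[vec[(r * width + c) * depth + d] for d in range(depth)]
             for c in range(width)]
            for r in range(height)]


# First patch from the 3x3, single-channel example in the first answer.
patch = unflatten_patch([1, 2, 3, 11, 12, 13, 21, 22, 23], 3, 3, 1)
print(patch[1])  # [[11], [12], [13]]
```

Under the same assumption, reshaping the whole output with tf.reshape to [-1, 32, 32, 3] should give the Kx32x32x3 layout asked for in the question.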

