Question

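I am trying out FCNs (fully convolutional networks) and attempting to reproduce the results reported in the original paper (Long et al., CVPR'15).

In that paper, the authors report results on the PASCAL VOC dataset. After downloading and unpacking the 2012 train-val dataset (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar), I noticed that there are 2913 png files in the SegmentationClass subdirectory and the same number of files in the SegmentationObject subdirectory.

The pixel values in these png files seem to be multiples of 32 (e.g. 0, 128, 192, 224, ...), which do not fall in the range 0 to 20. I'm just wondering what the correspondence is between the pixel values and the ground-truth labels of the pixels. Or am I looking at the wrong files?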

Answer 1

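The values mentioned in the original question look like "color map" values, which can be obtained with the getpalette() function of the PIL Image module.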
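To illustrate, here is a minimal sketch that uses a synthetic palette image rather than a real VOC file. The helper name `voc_colormap` is hypothetical; it reproduces the bit-shuffling colour table commonly associated with the VOC devkit, and treating it as identical to the palette stored in the dataset's png files is an assumption.

```python
import numpy as np
from PIL import Image


def voc_colormap(n=256):
    """Build an (n, 3) uint8 colour table by spreading the bits of each index
    across the R, G and B channels (assumed to match the VOC palette)."""
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap[i] = (r, g, b)
    return cmap


# Synthetic 2x2 label map holding class indices 0, 2, 15 and 255.
labels = np.array([[0, 2], [15, 255]], dtype=np.uint8)
img = Image.fromarray(labels)                       # greyscale, mode "L"
img.putpalette(voc_colormap().flatten().tolist())   # now a palette image, mode "P"

# getpalette() returns a flat [r0, g0, b0, r1, g1, b1, ...] list.
palette = np.array(img.getpalette()).reshape(-1, 3)
for idx in (0, 2, 15, 255):
    print(idx, palette[idx].tolist())
```

Under this assumption, the colour components of every palette entry are sums of 128, 64, 32, ..., which would explain why an image viewer shows values such as 128, 192 and 224 instead of class indices.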

To check the annotated values of the VOC images, I used the following code snippet:

import numpy as np
from PIL import Image

files = [ 
        'SegmentationObject/2007_000129.png',
        'SegmentationClass/2007_000129.png',
        'SegmentationClassRaw/2007_000129.png', # processed by _remove_colormap()
                                                # in captainst's answer...
        ]

for f in files:
    img = Image.open(f)
    annotation = np.array(img)
    print('\nfile: {}\nanno: {}\nimg info: {}'.format(
        f, set(annotation.flatten()), img))

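The three images used in the code are shown below (from left to right, respectively):

The corresponding output of the code is as follows: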

file: SegmentationObject/2007_000129.png
anno: {0, 1, 2, 3, 4, 5, 6, 255}
img info: <PIL.PngImagePlugin.PngImageFile image mode=P size=334x500 at 0x7F59538B35F8>

file: SegmentationClass/2007_000129.png
anno: {0, 2, 15, 255}
img info: <PIL.PngImagePlugin.PngImageFile image mode=P size=334x500 at 0x7F5930DD5780>

file: SegmentationClassRaw/2007_000129.png
anno: {0, 2, 15, 255}
img info: <PIL.PngImagePlugin.PngImageFile image mode=L size=334x500 at 0x7F5930DD52E8>

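I learned two things from the output above.

First, the annotation values of the images in the SegmentationObject folder are assigned per object instance. In this case there are 3 people and 3 bicycles, and the annotated values run from 1 to 6. For the images in the SegmentationClass folder, however, the values are assigned by the class of the object: all the people belong to class 15, and all the bicycles belong to class 2.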
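As a rough sketch of how the two annotation types relate, the snippet below pairs each instance id with its class id. The two small arrays are synthetic stand-ins for what np.array(Image.open(...)) would return for the two files, and the name `instance_to_class` is made up for this illustration.

```python
import numpy as np

# Synthetic stand-ins for the SegmentationClass and SegmentationObject arrays.
# 255 marks void pixels in both masks.
class_mask = np.array([
    [0, 15, 15, 255],
    [0, 15, 15, 255],
    [2,  2,  0,   0],
], dtype=np.uint8)
object_mask = np.array([
    [0, 1, 2, 255],
    [0, 1, 2, 255],
    [3, 3, 0,   0],
], dtype=np.uint8)

instance_to_class = {}
for instance_id in np.unique(object_mask):
    if instance_id in (0, 255):  # skip background and void
        continue
    # Take the most frequent class label under this instance's pixels.
    classes, counts = np.unique(class_mask[object_mask == instance_id],
                                return_counts=True)
    instance_to_class[int(instance_id)] = int(classes[np.argmax(counts)])

print(instance_to_class)  # {1: 15, 2: 15, 3: 2}
```

Here instances 1 and 2 are two separate people (class 15) and instance 3 is a bicycle (class 2), mirroring the pattern in the output above.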

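Second, as mkisantal mentioned, the palette is removed after the np.array() operation (I "know" this from observing the result, but I still don't understand the mechanism under the hood...). We can confirm this by checking the image mode in the output:

SegmentationObject/2007_000129.png and SegmentationClass/2007_000129.png both have image mode=P, while
SegmentationClassRaw/2007_000129.png has image mode=L. (Reference: The modes of PIL Image)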
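One way to see the mechanism: a mode P image stores one palette index per pixel plus a separate colour table, and np.array() copies only the index buffer. The sketch below shows this on a synthetic image; the three palette entries are filled in by hand purely for illustration.

```python
import numpy as np
from PIL import Image

# Synthetic stand-in for a VOC annotation: indices 0, 2, 15 and 255.
indices = np.array([[0, 2], [15, 255]], dtype=np.uint8)
img = Image.fromarray(indices)   # mode "L"

palette = [0] * 768              # 256 RGB entries, all black to start with
palette[2 * 3:2 * 3 + 3] = [0, 128, 0]
palette[15 * 3:15 * 3 + 3] = [192, 128, 128]
palette[255 * 3:255 * 3 + 3] = [224, 224, 192]
img.putpalette(palette)          # attaching a palette switches the mode to "P"

as_indices = np.array(img)                 # reads the stored index buffer
as_colors = np.array(img.convert("RGB"))   # looks each index up in the palette

print(img.mode, as_indices.shape, as_colors.shape)
print(as_indices.tolist())
print(as_colors[1, 0].tolist())
```

The index array keeps the values 0, 2, 15 and 255, while the RGB conversion yields the colours a viewer would display.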

Answer 2

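I know this question was asked a while ago, but I ran into a similar question myself when trying PASCAL VOC 2012 with tensorflow deeplab.

If you look at the file download_and_convert_voc2012.sh, there are a few lines marked "# Remove the colormap in the ground truth annotations". This part processes the original SegmentationClass files and produces raw segmentation image files in which every pixel value is between 0 and 20. (If you are wondering why, see this post: Python: Use PIL to load png file gives strange results)

Pay attention to this magic function: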

def _remove_colormap(filename):
  """Removes the color map from the annotation.

  Args:
    filename: Ground truth annotation filename.

  Returns:
    Annotation without color map.
  """
  return np.array(Image.open(filename))

I have to admit that I do not fully understand what the operation np.array(Image.open(filename)) does.
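The following sketch replays the round trip on an in-memory png instead of a real SegmentationClass file. The function body is the same as the one quoted above, except that it is fed a file object; the palette colours are arbitrary and everything else is an assumption made to keep the example self-contained.

```python
import io

import numpy as np
from PIL import Image


def _remove_colormap(file_obj):
    """Return the palette indices of an annotation as a plain NumPy array."""
    return np.array(Image.open(file_obj))


# Build a small palette png in memory as a stand-in for an annotation file.
src = Image.fromarray(np.array([[0, 2], [15, 255]], dtype=np.uint8))
src.putpalette([v for i in range(256) for v in (i, 0, 255 - i)])  # arbitrary colours
colored_png = io.BytesIO()
src.save(colored_png, format="PNG")
colored_png.seek(0)

raw = _remove_colormap(colored_png)

# Writing the bare array back produces a greyscale ("L") png with the same values.
raw_png = io.BytesIO()
Image.fromarray(raw).save(raw_png, format="PNG")
raw_png.seek(0)
reloaded = Image.open(raw_png)

print(reloaded.mode, np.array_equal(np.array(reloaded), raw))
```

The stored numbers are the same before and after; only the colour table that a viewer uses for display is gone.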

Below I show a set of images for your reference (from top to bottom: original image, segmentation class, and segmentation class raw).

Answer 3

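I just downloaded Pascal VOC. The pixel values in the dataset are as follows:

0: background
[1 .. 20] interval: segmented objects, classes [aeroplane, ..., tvmonitor]
255: void category, used for border regions (5 pixels) and to mask difficult objects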
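As a small illustration of how these values might be used, the sketch below counts pixels per class while skipping the void label. The class list follows the usual VOC ordering, the label map is synthetic, and the names `VOC_CLASSES`, `VOID_LABEL` and `pixel_counts` are made up for this example.

```python
import numpy as np

# Assumed standard VOC class order; index 0 is background.
VOC_CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
VOID_LABEL = 255  # border regions and difficult objects

# Synthetic label map standing in for a loaded annotation array.
label_map = np.array([
    [0,   0, 15, 15],
    [0, 255, 15, 15],
    [2,   2, 255, 0],
], dtype=np.uint8)

valid = label_map != VOID_LABEL  # pixels that should count
ids, counts = np.unique(label_map[valid], return_counts=True)
pixel_counts = {VOC_CLASSES[i]: int(n) for i, n in zip(ids, counts)}

print(pixel_counts)  # {'background': 4, 'bicycle': 2, 'person': 4}
```

Masking out 255 in the same way is a common choice when computing a segmentation loss or accuracy, so that void pixels do not count for or against any class.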

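You can find more information on the dataset here.

captainst's earlier answer discusses png files saved with a color palette, which I think is unrelated to the original question. The linked tensorflow code simply loads a png that was saved with a color map (palette), converts it to a numpy array (the palette is lost at this step), and then saves the array as a png again. The numerical values are not changed in this process; only the palette is removed.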

Ground-truth pixel labels in PASCAL VOC for semantic segmentation
