Question

I'm having trouble finding some problem images in a dataset.

My model starts training, but then I get the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid PNG data, size 135347
         [[{{node case/cond/cond_jpeg/decode_image/cond_jpeg/cond_png/DecodePng}} = DecodePng[channels=3, dtype=DT_UINT8, _device="/device:CPU:0"](case/cond/cond_jpeg/decode_image/cond_jpeg/cond_png/cond_gif/DecodeGif/Switch:1, ^case/Assert/AssertGuard/Merge)]]
         [[node IteratorGetNext (defined at object_detection/model_main.py:105)  = IteratorGetNext[output_shapes=[[24], [24,300,300,3], [24,2], [24,3], [24,100], [24,100,4], [24,100,2], [24,100,2], [24,100], [24,100], [24,100], [24]], output_types=[DT_INT32, DT_FLOAT, DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_BOOL, DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]

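So I wrote a small script, which runs before I generate the TFRecords, to try to catch any problem images. It is basically the tutorial code, but with a batch size of 1. This was the simplest way I could think of to catch the error.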

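import tensorflow as tf  # TF 1.x API
from tqdm import tqdm

# `images` is a list of image file paths

def preprocess_image(image):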
    image = tf.image.decode_png(image, channels=3)
    image = tf.image.resize_images(image, [192, 192])
    image /= 255.0  # normalize to [0,1] range

    return image

def load_and_preprocess_image(path):
    image = tf.read_file(path)
    return preprocess_image(image)

mobile_net = tf.keras.applications.MobileNetV2(input_shape=(192, 192, 3), include_top=False)
mobile_net.trainable=False

path_ds = tf.data.Dataset.from_tensor_slices(images)

image_ds = path_ds.map(load_and_preprocess_image, num_parallel_calls=4)

def change_range(image):
    return (2*image-1)

keras_ds = image_ds.map(change_range)
keras_ds = keras_ds.batch(1)

for i, batch in tqdm(enumerate(iter(keras_ds))):
    try:
        feature_map_batch = mobile_net(batch)
    except KeyboardInterrupt:
        break
    except:
        print(images[i])

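This crashes, as expected, but the exception is not handled properly. It just throws the exception and crashes. So I have two questions:

Is there a way to force it to be handled correctly? It looks like there isn't; see "Tensorflow, try and except doesn't handle exception".
Is there a better way to look for corrupt inputs?

I've isolated one image that fails, but OpenCV, SciPy, Matplotlib and Skimage all open it. For example, I've tried this: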

import scipy.misc  # scipy.misc.imread was removed in SciPy 1.2
images = images[1258:]
print(scipy.misc.imread(images[0]))

import matplotlib.pyplot as plt
print(plt.imread(images[0]))

import cv2
print(cv2.imread(images[0]))

import skimage.io
print(skimage.io.imread(images[0]))

... try to run inference in Tensorflow

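I get four matrices printed out. I assume these libraries all use libpng or something similar.

Image 1258 then crashes Tensorflow. Looking at the DecodePng source, it looks like the failure actually comes from the TF png library.

I realise I could probably write my own data loader, but that seems like a hassle.

Edit:

This also works as a snippet: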

tf.enable_eager_execution()

for i, image in enumerate(images):
    try:
        with tf.gfile.GFile(image, 'rb') as fid:
            image_data = fid.read()

        image_tensor = tf.image.decode_png(
                        image_data,
                        channels=3,
                        name=None
                    )
    except Exception as err:
        # image_tensor may be unset or stale here, so report the path instead
        print("Failed: ", i, image, err)

Answer 1

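Open a new Python file, copy the code below, set the directory where your images are located, and run it. If a file is corrupt, you will see a "Corrupt JPEG data: premature end of data segment" message in the output.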

from os import listdir
import cv2

# yourDirectory must end with a path separator, for example
# 'C:/tensorflow/models/research/object_detection/images/train/'
for filename in listdir(yourDirectory):
    if filename.endswith(".jpg"):
        # Print the path first so any decoder warning can be matched to a file
        print(yourDirectory + filename)
        cv2.imread(yourDirectory + filename)
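
The loop above only looks at .jpg files, while the error in the question comes from DecodePng. As a complement, here is a minimal sketch of a structural PNG check that uses only the Python standard library: it verifies the 8-byte PNG signature and the CRC of every chunk up to IEND. The names png_problem and find_bad_pngs are hypothetical helpers, not part of any library. The check does not decompress the pixel data, so a file that passes may still be rejected by TensorFlow for other reasons.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_problem(data):
    """Return a description of the first structural problem, or None."""
    if data[:8] != PNG_SIGNATURE:
        return "missing PNG signature"
    pos = 8
    while pos < len(data):
        header = data[pos:pos + 8]
        if len(header) < 8:
            return "truncated chunk header"
        # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC
        length, chunk_type = struct.unpack(">I4s", header)
        body = data[pos + 8:pos + 8 + length]
        crc_bytes = data[pos + 8 + length:pos + 12 + length]
        if len(body) < length or len(crc_bytes) < 4:
            return "truncated %r chunk" % chunk_type
        (stored_crc,) = struct.unpack(">I", crc_bytes)
        # The CRC covers the chunk type and the chunk data, not the length
        if zlib.crc32(chunk_type + body) & 0xFFFFFFFF != stored_crc:
            return "CRC mismatch in %r chunk" % chunk_type
        if chunk_type == b"IEND":
            return None
        pos += 12 + length
    return "missing IEND chunk"


def find_bad_pngs(paths):
    """Return (path, problem) pairs for files that fail the structural check."""
    bad = []
    for path in paths:
        with open(path, "rb") as fid:
            problem = png_problem(fid.read())
        if problem is not None:
            bad.append((path, problem))
    return bad
```

Calling find_bad_pngs(images) with the list of paths from the question would return the files that fail, together with a short reason for each.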

Answer 2

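A rather late and unexpected answer to this question.

The problem was (most likely) faulty RAM. After some odd things happened in Linux, such as the filesystem going read-only and random tabs crashing in Firefox, I decided to run Memtest. I have 2x8GB DIMMs installed. It turned out that there was a bad block around the 4GB mark (on both sticks), which meant the errors only showed up when (a) the system was under heavy load and (b) memory usage was over 8GB. I also checked for a failing hard drive, but it is a fairly new SSD. Previously I'd had sporadic random reboots on Windows on the same system, but again I assumed that was just Microsoft forcing updates.

So I'm posting this here for posterity. If you see strange things like images being corrupted in a non-repeatable way, take a few minutes to run Memtest as a sanity check. Serious errors should show up within 30 seconds, and it is worth running it overnight (multiple passes) to double check.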
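
As a rough way to spot the "non-repeatable" symptom before rebooting into Memtest, the sketch below hashes each file several times and reports any file whose digest changes between reads. The function names file_digest and unstable_files are hypothetical. This is not a substitute for Memtest: a clean result does not show that the memory is good, but a mismatch on an unmodified file is a strong hint that something below the application is wrong.

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    sha = hashlib.sha256()
    with open(path, "rb") as fid:
        # Read in 1 MiB blocks so large images do not need to fit in one buffer
        for block in iter(lambda: fid.read(1 << 20), b""):
            sha.update(block)
    return sha.hexdigest()


def unstable_files(paths, passes=3):
    """Return the paths whose digest differs between repeated reads."""
    unstable = []
    for path in paths:
        digests = {file_digest(path) for _ in range(passes)}
        if len(digests) > 1:
            unstable.append(path)
    return unstable
```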

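The solutions posted above are still useful, and I'm still not convinced about TF rolling their own PNG loader, but it's always worth checking your hardware!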

Detecting corrupt images in Tensorflow
