Differences between cv2 image processing and tf.image processing

Date: 2017-08-04 23:40:26

Tags: python image image-processing tensorflow cv2

I recently switched from using cv2 to TensorFlow's tf.image module for image processing. However, my validation accuracy dropped by about 10%.

I believe the issue has to do with:

  1. cv2.imread() vs. tf.image.decode_jpeg()
  2. cv2.resize() vs. tf.image.resize_images()

While these differences lead to lower accuracy, the images appear indistinguishable to a human when viewed with plt.imshow(). For example, take Image #1 of the ImageNet validation dataset:

[Image: CV2 image]

First issue:

• cv2.imread() takes a string and outputs a BGR, 3-channel, uint8 matrix.
• tf.image.decode_jpeg() takes a string tensor and outputs an RGB, 3-channel, uint8 tensor.

However, after converting the tf tensor to BGR format, many pixels in the image differ by a small amount (a comparison sketch follows after the two outputs below).

Using tf.image.decode_jpeg and then converting to BGR

    [[ 26  41  24 ...,  57  48  46]
     [ 36  39  36 ...,  24  24  29]
     [ 41  26  34 ...,  11  17  27]
     ..., 
     [ 71  67  61 ..., 106 105 100]
     [ 66  63  59 ..., 106 105 101]
      [ 64  66  58 ..., 106 105 101]]
    

Using cv2.imread

    [[ 26  42  24 ...,  57  48  48]
     [ 38  40  38 ...,  26  27  31]
     [ 41  28  36 ...,  14  20  31]
     ..., 
     [ 72  67  60 ..., 108 105 102]
     [ 65  63  58 ..., 107 107 103]
      [ 65  67  60 ..., 108 106 102]]
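
Here is a minimal sketch of how I compare the two decoders (TF 1.x-style session; the file name below is a placeholder, not the actual path I use):

```
# Sketch of the decode comparison (TF 1.x-style API; the file path is a placeholder)
import cv2
import numpy as np
import tensorflow as tf

IMAGE_PATH = 'val_image_1.jpg'  # placeholder path

cv2_img = cv2.imread(IMAGE_PATH)  # uint8, BGR channel order

with tf.Session() as sess:
    contents = tf.read_file(IMAGE_PATH)
    tf_img = sess.run(tf.image.decode_jpeg(contents, channels=3))  # uint8, RGB channel order

tf_img_bgr = tf_img[..., ::-1]  # reorder RGB -> BGR before comparing
diff = cv2_img.astype(np.int16) - tf_img_bgr.astype(np.int16)
print('max abs difference:', np.abs(diff).max())
print('fraction of differing pixels:', (diff != 0).mean())
```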
    

Second issue:

• tf.image.resize_images() automatically converts a uint8 tensor to a float32 tensor, and appears to exacerbate the differences in pixel values.
• I believe tf.image.resize_images() and cv2.resize() both use bilinear interpolation by default (a comparison sketch follows after the two outputs below).

tf.image.resize_images

    [[  26.           25.41850281   35.73127747 ...,   81.85855103
        59.45834351   49.82373047]
     [  38.33480072   32.90485001   50.90826797 ...,   86.28446198
        74.88543701   20.16353798]
     [  51.27312469   26.86172867   39.52401352 ...,   66.86851501
        81.12111664   33.37636185]
     ..., 
      [  70.59472656   75.78851318   45.48100662 ...,   70.18637085
         88.56777191   97.19295502]
     [  70.66964722   59.77249908   48.16699219 ...,   74.25527954
        97.58244324  105.20263672]
     [  64.93395996   59.72298431   55.17600632 ...,   77.28720856
        98.95108032  105.20263672]]
    

cv2.resize

    [[ 36  30  34 ..., 102  59  43]
     [ 35  28  51 ...,  85  61  26]
     [ 28  39  50 ...,  59  62  52]
     ..., 
     [ 75  67  34 ...,  74  98 101]
     [ 67  59  43 ...,  86 102 104]
      [ 66  65  48 ...,  86 103 105]]
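
And a minimal sketch of the resize comparison (again TF 1.x-style session; the file name and the 224x224 target size are placeholders):

```
# Sketch of the resize comparison (TF 1.x-style API; path and target size are placeholders)
import cv2
import numpy as np
import tensorflow as tf

IMAGE_PATH = 'val_image_1.jpg'   # placeholder path
TARGET_H, TARGET_W = 224, 224    # placeholder target size

# cv2 pipeline: read (BGR), convert to RGB, resize on uint8; INTER_LINEAR is cv2's default
cv2_img = cv2.cvtColor(cv2.imread(IMAGE_PATH), cv2.COLOR_BGR2RGB)
cv2_resized = cv2.resize(cv2_img, (TARGET_W, TARGET_H), interpolation=cv2.INTER_LINEAR)

# TF pipeline: decode (RGB), resize; resize_images converts to float32 and defaults to bilinear
with tf.Session() as sess:
    contents = tf.read_file(IMAGE_PATH)
    decoded = tf.image.decode_jpeg(contents, channels=3)
    resized = tf.image.resize_images(decoded, [TARGET_H, TARGET_W],
                                     method=tf.image.ResizeMethod.BILINEAR)
    tf_resized = sess.run(resized)

print(cv2_resized.dtype, tf_resized.dtype)  # uint8 vs. float32
print('max abs difference:', np.abs(tf_resized - cv2_resized.astype(np.float32)).max())
```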
    

Here is a gist showing the behavior just described. It contains the complete code for how I process the image.

So my main questions are:

• Why are the outputs of cv2.imread() and tf.image.decode_jpeg() different?
• How do cv2.resize() and tf.image.resize_images() differ if they use the same interpolation scheme?

Thanks!

1 Answer:

Answer 0 (score: 5)

As vijay m correctly points out, by changing dct_method to "INTEGER_ACCURATE" you will get the same uint8 image whether you use cv2 or tf. The problem indeed seems to be the resizing method. I also tried to force TensorFlow to use the same interpolation method that cv2 uses by default (bilinear), but the results are still different. This might be because cv2 does the interpolation on integer values while TensorFlow converts to float before interpolating, but that is only a guess. If you plot the pixel-wise difference between the images resized by TF and by cv2, you get the following histogram:

[Image: Histogram of pixel-wise differences]
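
A minimal sketch of both checks described above (TF 1.x-style session; the file name and target size are placeholders):

```
# Sketch of the decode fix and the difference histogram (TF 1.x-style API; path and size are placeholders)
import cv2
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

IMAGE_PATH = 'val_image_1.jpg'   # placeholder path
TARGET_H, TARGET_W = 224, 224    # placeholder target size

# cv2 pipeline, converted to RGB so both images are in the same color space
cv2_img = cv2.cvtColor(cv2.imread(IMAGE_PATH), cv2.COLOR_BGR2RGB)
cv2_resized = cv2.resize(cv2_img, (TARGET_W, TARGET_H), interpolation=cv2.INTER_LINEAR)

with tf.Session() as sess:
    contents = tf.read_file(IMAGE_PATH)
    # dct_method='INTEGER_ACCURATE' makes the decoded uint8 values match cv2's decoder
    decoded = tf.image.decode_jpeg(contents, channels=3, dct_method='INTEGER_ACCURATE')
    # resize_images converts to float32 and defaults to bilinear interpolation
    resized = tf.image.resize_images(decoded, [TARGET_H, TARGET_W],
                                     method=tf.image.ResizeMethod.BILINEAR)
    tf_img, tf_resized = sess.run([decoded, resized])

print('decoded images identical:', np.array_equal(cv2_img, tf_img))

# Histogram of the pixel-wise differences between the two resized images
diff = tf_resized - cv2_resized.astype(np.float32)
plt.hist(diff.ravel(), bins=100)
plt.xlabel('pixel-wise difference (TF - cv2)')
plt.ylabel('count')
plt.show()
```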

As the plot shows, the differences look pretty normally distributed. (I was also surprised by the amount of pixel-wise difference.) Your accuracy drop could lie exactly here. In this paper Goodfellow et al. describe the effect of adversarial examples on classification systems, and I think the problem here is something similar: if the original weights you use for your network were trained with an input pipeline that produced the cv2 results, an image from the TF input pipeline is something like an adversarial example.

(See the image at the top of page 3 for an example... I can't post more than two links.)

So in the end, I think that if you want to use the original network weights on the same data the network was trained on, you should stick with the same (or a very similar) input pipeline. If you use the weights to fine-tune the network on your own data, this should not be a big concern, because you retrain the classification layer to work with the new input images (from the TF pipeline).

@Ishant Mrinal: please have a look at the code the OP provided in the gist. He is aware of the difference between BGR (cv2) and RGB (TF) and converts the images to the same color space.