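  1. cv2.imread() vs. tf.image.decode_jpeg()
  2. cv2.resize() vs. tf.image.resize_images()

Although these differences lower the accuracy, the images look indistinguishable to a human when displayed with plt.imshow(). For example, take image #1 of the ImageNet validation set: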

[Figure: CV2 image]


- cv2.imread() takes a string and outputs a BGR, 3-channel uint8 matrix.
- tf.image.decode_jpeg() takes a string tensor and outputs an RGB, 3-channel uint8 tensor.



```
[[ 26  41  24 ...,  57  48  46]
 [ 36  39  36 ...,  24  24  29]
 [ 41  26  34 ...,  11  17  27]
 [ 71  67  61 ..., 106 105 100]
 [ 66  63  59 ..., 106 105 101]
 [ 64  66  58 ..., 106 105 101]]
```


```
[[ 26  42  24 ...,  57  48  48]
 [ 38  40  38 ...,  26  27  31]
 [ 41  28  36 ...,  14  20  31]
 [ 72  67  60 ..., 108 105 102]
 [ 65  63  58 ..., 107 107 103]
 [ 65  67  60 ..., 108 106 102]]
```
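Because cv2.imread() returns BGR and tf.image.decode_jpeg() returns RGB, one of the two outputs has to be converted before the pixels can be compared. With NumPy arrays this is usually done with img[..., ::-1] or cv2.cvtColor(img, cv2.COLOR_RGB2BGR). The dependency-free sketch below does the same thing on a hypothetical H x W x 3 image stored as nested lists; the function name rgb_to_bgr is illustrative, not part of either library.

```python
def rgb_to_bgr(image):
    """Reverse the channel order of an H x W x 3 image stored as nested lists.

    The same function also converts BGR back to RGB, since reversing the
    channel order twice restores the original.
    """
    return [[list(reversed(pixel)) for pixel in row] for row in image]


# A hypothetical 1 x 2 RGB image: one pure-red pixel and one orange-ish pixel.
img = [[[255, 0, 0], [0, 128, 255]]]
print(rgb_to_bgr(img))  # [[[0, 0, 255], [255, 128, 0]]]
```

Channel order alone does not account for the one- or two-level differences between the two matrices above; the answer below attributes those to the JPEG decoder's dct_method setting.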


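- tf.image.resize_images() automatically converts the uint8 tensor to a float32 tensor, and this seems to exacerbate the differences in pixel values.
- I believe tf.image.resize_images() and cv2.resize() both use bilinear interpolation by default.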


```
[[  26.           25.41850281   35.73127747 ...,   81.85855103
    59.45834351   49.82373047]
 [  38.33480072   32.90485001   50.90826797 ...,   86.28446198
    74.88543701   20.16353798]
 [  51.27312469   26.86172867   39.52401352 ...,   66.86851501
    81.12111664   33.37636185]
 [  70.59472656   75.78851318   45.48100662 ...,   70.18637085
    88.56777191   97.19295502]
 [  70.66964722   59.77249908   48.16699219 ...,   74.25527954
    97.58244324  105.20263672]
 [  64.93395996   59.72298431   55.17600632 ...,   77.28720856
    98.95108032  105.20263672]]
```


```
[[ 36  30  34 ..., 102  59  43]
 [ 35  28  51 ...,  85  61  26]
 [ 28  39  50 ...,  59  62  52]
 [ 75  67  34 ...,  74  98 101]
 [ 67  59  43 ...,  86 102 104]
 [ 66  65  48 ...,  86 103 105]]
```
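One factor that can make two "bilinear" resizers disagree, independently of integer vs. float arithmetic, is how each one maps an output pixel back to input coordinates. To my understanding, TF 1.x's tf.image.resize_images (default align_corners=False) uses src = dst * scale, whereas OpenCV uses the half-pixel convention src = (dst + 0.5) * scale - 0.5 (TF 2's tf.image.resize also uses half-pixel centers). The 1-D sketch below is a hypothetical plain-Python re-implementation of both conventions, not either library's actual code. It is at least consistent with the printouts: the float matrix starts with exactly 26., the same as the decoded images' top-left value, while the uint8 result starts with 36.

```python
def src_coord(dst_x, scale, half_pixel):
    """Map an output index to a fractional input coordinate (scale = in_size / out_size).

    half_pixel=False: src = dst * scale                (modeled on TF 1.x, align_corners=False)
    half_pixel=True:  src = (dst + 0.5) * scale - 0.5  (modeled on OpenCV)
    """
    if half_pixel:
        return (dst_x + 0.5) * scale - 0.5
    return dst_x * scale


def resize_linear_1d(row, out_size, half_pixel):
    """Linearly resample a 1-D list of pixel values to out_size samples."""
    in_size = len(row)
    scale = in_size / out_size
    out = []
    for i in range(out_size):
        x = src_coord(i, scale, half_pixel)
        x = min(max(x, 0.0), in_size - 1)  # clamp to the valid input range
        x0 = int(x)                        # left neighbour (x >= 0, so int() floors)
        x1 = min(x0 + 1, in_size - 1)      # right neighbour, clamped at the edge
        t = x - x0                         # weight of the right neighbour
        out.append((1 - t) * row[x0] + t * row[x1])
    return out


row = [0, 10, 20, 30, 40, 50, 60, 70]
print(resize_linear_1d(row, 4, half_pixel=False))  # [0.0, 20.0, 40.0, 60.0]
print(resize_linear_1d(row, 4, half_pixel=True))   # [5.0, 25.0, 45.0, 65.0]
```

Both calls perform "bilinear" (here, linear) interpolation on the same input, yet every output sample differs, because the sampling grid is shifted by half an output pixel.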



- Why are the outputs of cv2.imread() and tf.image.decode_jpeg() different?
- How are cv2.resize() and tf.image.resize_images() different if they use the same interpolation scheme?


As vijay m correctly points out, changing dct_method to "INTEGER_ACCURATE" gives you the same uint8 image from cv2 and TF. The problem really does seem to be the resizing method. I also tried forcing TensorFlow to use the same interpolation method that cv2 uses by default (bilinear), but the results are still different. This may be because cv2 interpolates on integer values while TensorFlow converts to float before interpolating, but that is only a guess. If you plot the pixel-wise difference between the TF-resized and the cv2-resized image, you get the following histogram:

[Figure: Histogram of the pixel-wise difference]
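The histogram is computed over every pixel of the full images, which are not reproduced here. As a minimal sketch of the same computation, the plain-Python code below takes only the values NumPy printed for the first and last rows of the two resize outputs in the question (presumably the float32 matrix is from tf.image.resize_images() and the uint8 one from cv2.resize()), subtracts them, and bins the differences. The helper names and the bin width of 5 are illustrative choices; with NumPy arrays you would more likely call np.histogram(tf_img.astype(float) - cv2_img, bins=...).

```python
import math
from collections import Counter


def pixel_diffs(a, b):
    """Flattened element-wise differences a - b of two equally shaped 2-D lists."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    return [x - y for ra, rb in zip(a, b) for x, y in zip(ra, rb)]


def histogram(values, bin_width=5):
    """Count values per bin of width bin_width; keys are the bins' lower edges."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Visible values of the first and last printed rows of each resize output.
tf_rows = [
    [26.0, 25.41850281, 35.73127747, 81.85855103, 59.45834351, 49.82373047],
    [64.93395996, 59.72298431, 55.17600632, 77.28720856, 98.95108032, 105.20263672],
]
cv2_rows = [
    [36, 30, 34, 102, 59, 43],
    [66, 65, 48, 86, 103, 105],
]

diffs = pixel_diffs(tf_rows, cv2_rows)
print(histogram(diffs))  # {-25: 1, -10: 3, -5: 3, 0: 3, 5: 2}
```

Twelve samples say nothing about the shape of the distribution; the point is only that the same subtract-then-bin step, run over all pixels, produces a plot like the one above.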

As you can see, it looks roughly normally distributed. (I was also surprised by the amount of pixel-wise difference.) Your accuracy drop could come from exactly this. In their paper on adversarial examples, Goodfellow et al. describe how such examples affect classification systems, and I think the problem here is similar. If the original weights you use for your network were trained with an input pipeline that produces the same results as the cv2 functions, then an image from the TF input pipeline acts somewhat like an adversarial example.

(See the image at the top of page 3 of that paper for an example; I can't post more than two links.)
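To make the adversarial-example analogy a bit more concrete, here is a toy sketch, assuming a purely linear score w·x (a simplification, not the network from the question and not code from the paper): if every input changes by at most eps, the score can move by up to eps times the sum of |w_i|, which grows with the number of inputs. The weights are made up, and 224 x 224 x 3 is only a typical ImageNet input size.

```python
def linear_score(w, x):
    """Dot product w . x of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def worst_case_shift(w, eps):
    """Score change when every input moves by eps in the direction of sign(w).

    This equals eps * sum(|w_i|), the largest change that any perturbation
    with per-input magnitude <= eps can cause in a linear score.
    """
    delta = [eps if wi > 0 else (-eps if wi < 0 else 0.0) for wi in w]
    return linear_score(w, delta)


n = 224 * 224 * 3                                        # inputs in a typical ImageNet image
w = [0.001 if i % 2 == 0 else -0.001 for i in range(n)]  # made-up weights
print(worst_case_shift(w, eps=2.0))                      # about 2 * 0.001 * n = 301.056
```

Differences between two preprocessing pipelines are not aligned with sign(w) the way a crafted adversarial perturbation is, so in practice they partly cancel; the sketch only shows how many tiny per-pixel changes can add up to a large change in a model's output.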

So in the end, if you want to use the original network weights on the same data the network was trained on, I think you should stick with the same (or a very similar) input pipeline. If you use the weights to fine-tune the network on your own data, this should not be a big concern, because you retrain the classification layer to work with the new input images (from the TF pipeline).

And @Ishant Mrinal: please have a look at the code the OP provided in the gist. He is aware of the BGR (cv2) vs. RGB (TF) difference and converts the images to the same color space.