Question

I am trying to reproduce some results from the paper describing the Grad-CAM method, using Keras with the Tensorflow-GPU backend, and I am getting completely incorrect labels.

I took a screenshot of Figure 1(a) from that paper and am trying to classify it with the pretrained VGG16 from Keras Applications.

Here is my image:

Here is my code (a cell from a Jupyter notebook). Part of the code was copied from the Keras manuals.

import imageio
import numpy as np
from matplotlib import pyplot as plt
from skimage.transform import resize

from keras import activations
from keras.applications import VGG16
from keras.applications.vgg16 import preprocess_input, decode_predictions

# Build the VGG16 network with ImageNet weights
model = VGG16(weights='imagenet', include_top=True)

%matplotlib inline

dog_img = imageio.imread(r"F:\tmp\Opera Snapshot_2018-09-24_133452_arxiv.org.png")
dog_img = dog_img[:, :, 0:3]   # Opera has added alpha channel
dog_img = resize(dog_img, (224, 224, 3))

x = np.expand_dims(dog_img, axis=0)
x = preprocess_input(x, mode='tf')

pred = model.predict(x)
decode_predictions(pred)

Output:

[[('n03788365', 'mosquito_net', 0.017053505),
  ('n03291819', 'envelope', 0.015034639),
  ('n15075141', 'toilet_tissue', 0.012603286),
  ('n01737021', 'water_snake', 0.010620943),
  ('n04209239', 'shower_curtain', 0.009625845)]]

However, when I submit the same image to the paper authors' online demo at http://gradcam.cloudcv.org/classification, I see the correct label "Boxer".

Here is the output of what they call the "Terminal":

Completed the Classification Task

"Time taken for inference in torch: 9.0"
"Total time taken: 9.12565684319"
{"classify_gcam": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gcam_243.png", "execution_time": 9.0, "label": 243.0, "classify_gb_gcam": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gb_gcam_243.png", "classify_gcam_raw": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gcam_raw_243.png", "input_image": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/Opera Snapshot_2018-09-24_133452_arxiv.org.png", "pred_label": 243.0, "classify_gb": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gb_243.png"}
Completed the Classification Task

"Time taken for inference in torch: 9.0"
"Total time taken: 9.05940508842"
{"classify_gcam": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gcam_243.png", "execution_time": 9.0, "label": 243.0, "classify_gb_gcam": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gb_gcam_243.png", "classify_gcam_raw": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gcam_raw_243.png", "input_image": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/Opera Snapshot_2018-09-24_133452_arxiv.org.png", "pred_label": 243.0, "classify_gb": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gb_243.png"}
Job published successfully
Publishing job to Classification Queue
Starting classification job on VGG_ILSVRC_16_layers.caffemodel
Job published successfully
Publishing job to Classification Queue
Starting classification job on VGG_ILSVRC_16_layers.caffemodel

I am using 64-bit Anaconda Python on Windows 7.

Versions of the relevant software on my PC:

keras                     2.2.2                         0
keras-applications        1.0.4                    py36_1
keras-base                2.2.2                    py36_0
keras-preprocessing       1.0.2                    py36_1
tensorflow                1.10.0          eigen_py36h849fbd8_0
tensorflow-base           1.10.0          eigen_py36h45df0d8_0

What am I doing wrong? How can I get the boxer label?

Answer 1

Apparently you cannot use the following line:

dog_img = dog_img[:, :, 0:3]   # Opera has added alpha channel

So I loaded the image with load_img, a Keras utility that does not add an alpha channel.

Full code:

import imageio
from matplotlib import pyplot as plt
from skimage.transform import resize
import numpy as np
from keras import activations
from keras.applications import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input, decode_predictions

# Build the VGG16 network with ImageNet weights
model = VGG16(weights='imagenet', include_top=True)
dog_img = image.img_to_array(image.load_img(r"F:\tmp\Opera Snapshot_2018-09-24_133452_arxiv.org.png", target_size=(224, 224)))

x = np.expand_dims(dog_img, axis=0)
x = preprocess_input(x)

pred = model.predict(x)
print(decode_predictions(pred))

[[('n02108089', 'boxer', 0.29122102), ('n02108422', 'bull_mastiff', 0.199128), ('n02129604', 'tiger', 0.10050287), ('n02123159', 'tiger_cat', 0.09733449), ('n02109047', 'Great_Dane', 0.056869864)]]
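
Besides the alpha channel, the two listings may also differ in the value range passed to preprocess_input: skimage's resize returns floats in [0, 1] by default, while img_to_array keeps values in [0, 255], and the question passes mode='tf' whereas the code above uses the default mode. The NumPy sketch below illustrates what each combination does to the pixel range. It is a reimplementation from the documented formulas, not the Keras source, and caffe_mode and tf_mode are hypothetical helper names; random pixels stand in for the real image.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as documented for the 'caffe' mode.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_mode(rgb):
    """Sketch of 'caffe' mode: RGB -> BGR, subtract ImageNet means, no scaling."""
    bgr = rgb[..., ::-1].astype(np.float32)
    return bgr - IMAGENET_BGR_MEAN

def tf_mode(rgb):
    """Sketch of 'tf' mode: map the range [0, 255] onto [-1, 1]."""
    return rgb.astype(np.float32) / 127.5 - 1.0

# Random stand-in for a batch holding one 224x224 RGB image.
rng = np.random.default_rng(0)
pixels_0_255 = rng.integers(0, 256, size=(1, 224, 224, 3)).astype(np.float32)
pixels_0_1 = pixels_0_255 / 255.0  # range skimage's resize produces by default

as_in_question = tf_mode(pixels_0_1)     # [0, 1] input combined with mode='tf'
as_in_answer = caffe_mode(pixels_0_255)  # [0, 255] input with the default mode

print("question pipeline:", as_in_question.min(), as_in_question.max())
print("answer pipeline:  ", as_in_answer.min(), as_in_answer.max())
```

Under these assumptions the question's pipeline squeezes every pixel into a narrow band just above -1, so the network would see an almost constant image, while the pipeline above yields the mean-centred values in roughly [-124, 151].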

Answer 2

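Given that all the output probabilities are very low and spread roughly evenly around 0.01, my guess is that you are preprocessing the image incorrectly and passing some scrambled, noise-like image to model.predict(). Try debugging by displaying the image with imshow just before predict().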
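
A numeric check can complement the visual one. The sketch below prints basic statistics of an array and flags the case where pixel values lie in [0, 1] instead of [0, 255]. describe_batch and looks_like_unit_range are hypothetical helper names, and the random array is only a stand-in for the loaded image.

```python
import numpy as np

def describe_batch(x, name="x"):
    """Print and return basic statistics of an image array before predict()."""
    x = np.asarray(x)
    stats = {
        "shape": x.shape,
        "dtype": str(x.dtype),
        "min": float(x.min()),
        "max": float(x.max()),
        "mean": float(x.mean()),
    }
    print(name, stats)
    return stats

def looks_like_unit_range(x):
    """Heuristic: True if all values lie in [0, 1] rather than [0, 255]."""
    x = np.asarray(x)
    return bool(x.min() >= 0.0 and x.max() <= 1.0)

# Stand-in for a loaded image; use the real array in the notebook.
rng = np.random.default_rng(0)
img_uint8 = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
img_unit = img_uint8 / 255.0  # what a rescaling step would produce

stats = describe_batch(img_uint8, name="raw image")
describe_batch(img_unit, name="rescaled image")
print("rescaled image in [0, 1]:", looks_like_unit_range(img_unit))
```

In the notebook, calling plt.imshow on the array before preprocess_input gives the visual check; matplotlib expects RGB data as floats in [0, 1] or integers in [0, 255], so the preprocessed array itself will not display faithfully.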

Poor accuracy of Keras VGG16 when reproducing paper results
