Question

I'm fairly new to TensorFlow and image classification, so I may be missing some key knowledge, which could be why I'm running into this problem.

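I have built a model in TensorFlow using the ResNet50 library (with ImageNet weights) to classify images by dog breed, and I have successfully trained a neural network that can detect various dog breeds.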


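I now want to pass a random image of a dog to my model so that it outputs the breed it thinks the dog is. However, when I run the function dog_breed_predictor("<file path to image>"), I get the error expected global_average_pooling2d_1_input to have shape (1, 1, 2048) but got array with shape (7, 7, 2048) when it tries to execute the line Resnet50_model.predict(bottleneck_feature), and I don't know how to fix it.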

Here is the code. I've included everything I think is relevant to the problem.

import cv2
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from tqdm import tqdm

from sklearn.datasets import load_files
np_utils = tf.keras.utils

# define function to load train, test, and validation datasets
def load_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    dog_targets = np_utils.to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets

# load train, test, and validation datasets
train_files, train_targets = load_dataset('dogImages/dogImages/train')
valid_files, valid_targets = load_dataset('dogImages/dogImages/valid')
test_files, test_targets = load_dataset('dogImages/dogImages/test')

#define Resnet50 model
Resnet50_model = ResNet50(weights="imagenet")

def path_to_tensor(img_path):
    #loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    #convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    #convert 3D tensor into 4D tensor with shape (1, 224, 224, 3)
    return np.expand_dims(x, axis=0)

from keras.applications.resnet50 import preprocess_input, decode_predictions

def ResNet50_predict_labels(img_path):
    #returns prediction vector for image located at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(Resnet50_model.predict(img))

###returns True if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    return ((prediction <= 268) & (prediction >= 151))

###Obtain bottleneck features from another pre-trained CNN
bottleneck_features = np.load("bottleneck_features/DogResnet50Data.npz")
train_DogResnet50 = bottleneck_features["train"]
valid_DogResnet50 = bottleneck_features["valid"]
test_DogResnet50 = bottleneck_features["test"]

###Define your architecture
Resnet50_model = tf.keras.Sequential()
Resnet50_model.add(tf.keras.layers.GlobalAveragePooling2D(input_shape=train_DogResnet50.shape[1:]))
Resnet50_model.add(tf.keras.layers.Dense(133, activation="softmax"))

Resnet50_model.summary()

###Compile the model
Resnet50_model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"])
###Train the model
checkpointer = tf.keras.callbacks.ModelCheckpoint(filepath="saved_models/weights.best.ResNet50.hdf5",
                                                 verbose=1, save_best_only=True)

Resnet50_model.fit(train_DogResnet50, train_targets,
                  validation_data=(valid_DogResnet50, valid_targets),
                  epochs=20, batch_size=20, callbacks=[checkpointer])

###Load the model weights with the best validation loss.
Resnet50_model.load_weights("saved_models/weights.best.ResNet50.hdf5")

###Calculate classification accuracy on the test dataset
Resnet50_predictions = [np.argmax(Resnet50_model.predict(np.expand_dims(feature, axis=0))) for feature in test_DogResnet50]

#Report test accuracy
test_accuracy = 100*np.sum(np.array(Resnet50_predictions)==np.argmax(test_targets, axis=1))/len(Resnet50_predictions)
print("Test accuracy: %.4f%%" % test_accuracy)

def extract_Resnet50(tensor):
    from keras.applications.resnet50 import ResNet50, preprocess_input
    return ResNet50(weights='imagenet', include_top=False).predict(preprocess_input(tensor))

def dog_breed(img_path):
    #extract bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    #obtain predicted vector
    predicted_vector = Resnet50_model.predict(bottleneck_feature) #shape error occurs here
    #return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

def dog_breed_predictor(img_path):
    #determine the predicted dog breed
    breed = dog_breed(img_path)
    #display the image
    img = cv2.imread(img_path)
    cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    plt.imshow(cv_rgb)
    plt.show()
    #display relevant predictor result
    if dog_detector(img_path):
        print("This is a dog and its breed is: " + str(breed))
    elif face_detector(img_path):
        print("This is a human but it looks like a: " + str(breed))
    else:
        print("I don't know what this is.")

dog_breed_predictor("dogImages/dogImages/train/016.Beagle/Beagle_01126.jpg")

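The image I'm passing into the function comes from the same dataset I used to train the model. I wanted to check that the model works as expected, which makes this error all the more confusing. What might I be doing wrong?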
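One way to narrow this kind of error down is to compare the per-sample shape of the bottleneck features the classifier was trained on with the shape the extraction function produces for a new image, before calling predict. Because the classifier's input_shape was set from train_DogResnet50.shape[1:], the error message implies that shape is (1, 1, 2048). The sketch below uses NumPy arrays of zeros as stand-ins for train_DogResnet50 and for the output of extract_Resnet50; the helper name check_feature_shape is hypothetical.

```python
import numpy as np

def check_feature_shape(train_features, new_feature):
    """Raise ValueError if new_feature's per-sample shape differs from the training features'."""
    expected = train_features.shape[1:]  # drop the batch axis
    actual = new_feature.shape[1:]
    if expected != actual:
        raise ValueError(f"classifier expects per-sample shape {expected}, got {actual}")
    return actual

# Stand-ins: saved features of shape (N, 1, 1, 2048) vs. a freshly extracted (1, 7, 7, 2048) map
train_features = np.zeros((3, 1, 1, 2048), dtype=np.float32)
fresh_feature = np.zeros((1, 7, 7, 2048), dtype=np.float32)

try:
    check_feature_shape(train_features, fresh_feature)
except ValueError as err:
    print(err)  # classifier expects per-sample shape (1, 1, 2048), got (7, 7, 2048)
```

Running a check like this right after extract_Resnet50 points at the extraction step rather than at the trained classifier.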

Answer 1

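Thanks to nessuno's help, I found the problem. It was indeed with the pooling layer of ResNet50.

The following code from my script above:

return ResNet50(weights='imagenet', include_top=False).predict(preprocess_input(tensor))

returns a shape of (1, 7, 7, 2048) (although I don't fully understand why). To fix this, I added the parameter pooling="avg" like so:

return ResNet50(weights='imagenet', include_top=False, pooling="avg").predict(preprocess_input(tensor))

This returns a shape of (1, 2048) (again, admittedly, I don't know why).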
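As for why: with include_top=False and no pooling argument, ResNet50 returns the output of its last convolutional block, which for a 224×224 input is a 7×7 grid of 2048-channel features (batch shape (1, 7, 7, 2048)). pooling="avg" applies global average pooling to that output, averaging over the two spatial axes and leaving one 2048-long vector per image. The NumPy sketch below imitates that averaging on a random stand-in array; it is not the Keras layer itself.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for ResNet50(include_top=False) output: (batch, height, width, channels)
feature_map = rng.random((1, 7, 7, 2048), dtype=np.float32)

# Global average pooling: average over the spatial axes (height, width)
pooled = feature_map.mean(axis=(1, 2))
print(pooled.shape)  # (1, 2048)

# Each pooled value is the mean of one channel's 7x7 grid
print(pooled[0, 0], feature_map[0, :, :, 0].mean())
```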

However, the model still expects a 4D shape. To fix this, I added the following code to my dog_breed() function:

print(bottleneck_feature.shape) #returns (1, 2048)
bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
print(bottleneck_feature.shape) #returns (1, 1, 1, 1, 2048) - yes a 5D shape, not 4.

which returns a shape of (1, 1, 1, 1, 2048). For some reason, when I added only 2 dimensions the model still complained that it was a 3D shape, but it stopped complaining when I added the 3rd (which is strange; I'd like to learn more about why).
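Since the classifier was built with input_shape=train_DogResnet50.shape[1:], which the original error message shows to be (1, 1, 2048), Keras expects a 4D batch of shape (batch, 1, 1, 2048). A single reshape turns the pooled (1, 2048) output into exactly that, without stacking expand_dims calls. A minimal NumPy sketch, using a zero array as a stand-in for the real bottleneck feature:

```python
import numpy as np

bottleneck_feature = np.zeros((1, 2048), dtype=np.float32)  # stand-in for the pooled ResNet50 output

# Insert two singleton spatial axes after the batch axis: (1, 2048) -> (1, 1, 1, 2048)
as_batch = bottleneck_feature.reshape(-1, 1, 1, 2048)
print(as_batch.shape)  # (1, 1, 1, 2048)

# Equivalent with expand_dims, adding the new axes at positions 1 and 2
same = np.expand_dims(bottleneck_feature, axis=(1, 2))
print(same.shape)  # (1, 1, 1, 2048)
```

Adding the axes after the batch axis (rather than at axis=0) also keeps the result correct for a batch of several images, where prepending axes would move the batch dimension out of first position.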

Overall, my dog_breed() function went from this:

def dog_breed(img_path):
    #extract bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    #obtain predicted vector
    predicted_vector = Resnet50_model.predict(bottleneck_feature) #shape error occurs here
    #return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

to this:

def dog_breed(img_path):
    #extract bottleneck features
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    print(bottleneck_feature.shape) #returns (1, 2048)
    bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
    bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
    bottleneck_feature = np.expand_dims(bottleneck_feature, axis=0)
    print(bottleneck_feature.shape) #returns (1, 1, 1, 1, 2048) - yes a 5D shape, not 4.
    #obtain predicted vector (this line previously raised the shape error)
    predicted_vector = Resnet50_model.predict(bottleneck_feature)
    #return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

making sure that the parameter pooling="avg" is added to my call to ResNet50.
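An alternative to pooling="avg" followed by reshaping is to average the (1, 7, 7, 2048) feature map over its spatial axes while keeping them as size-1 axes, which yields a (1, 1, 1, 2048) batch in one step. This is the same averaging global average pooling performs; whether it matches how the saved DogResnet50Data.npz features were originally computed is an assumption, not something this thread confirms. A sketch with a random stand-in array; to_classifier_input is a hypothetical helper name:

```python
import numpy as np

def to_classifier_input(feature_map):
    """Average a (batch, h, w, channels) map over h and w, keeping them as size-1 axes."""
    return feature_map.mean(axis=(1, 2), keepdims=True)

rng = np.random.default_rng(1)
feature_map = rng.random((1, 7, 7, 2048))  # stand-in for ResNet50(include_top=False) output
batch = to_classifier_input(feature_map)
print(batch.shape)  # (1, 1, 1, 2048)
```

In dog_breed() this would take the place of the expand_dims lines, with extract_Resnet50 called without pooling="avg".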

Answer 2

The ResNet50 documentation says the following about the constructor argument input_shape:

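input_shape: optional shape tuple, only to be specified if include_top is False (otherwise the input shape has to be (224, 224, 3) (with 'channels_last' data format) or (3, 224, 224) (with 'channels_first' data format). It should have exactly 3 input channels, and width and height should be no smaller than 197. E.g. (200, 200, 3) would be one valid value.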

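My guess is that, since you set include_top to False, the network definition pads the input to a shape larger than 224x224, so when you extract the features you end up with a feature map rather than a feature vector (and that is what causes the error).

Just try specifying the input shape this way: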

return ResNet50(weights='imagenet',
                include_top=False,
                input_shape=(224, 224, 3)).predict(preprocess_input(tensor))

TensorFlow / Keras - expected global_average_pooling2d_1_input to have shape (1, 1, 2048) but got array with shape (7, 7, 2048)
