I am trying to build a simple 5-class object detector by extracting bottleneck features with a pre-trained VGG16 (trained on ImageNet). I have 10,000 training images (2,000 per class) and 2,500 test images (500 per class). However, once I extract the bottleneck features, the validation tensor has 2496 samples, while the expected size is 2500. I checked the data folders and confirmed that there are 2500 validation images in total, yet I still get an error when I run the code: "ValueError: Input arrays should have the same number of samples as target arrays. Found 2496 input samples and 2500 target samples". I have attached the code below; can someone help me understand why the number of input samples drops to 2496?
I have just counted the images in the train and test data to make sure no images are missing; it turns out that none actually are.
Here is the code that extracts the bottleneck features:
import numpy as np
from datetime import datetime as dt
from keras import applications
from keras.preprocessing.image import ImageDataGenerator

global_start = dt.now()
# Dimensions of our Flickr images are 256 x 256
img_width, img_height = 256, 256
# Parameters needed for training and validation
train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
epochs = 50
batch_size = 16

# Get the bottleneck features by Weights.T * Xi
def save_bottleneck_features():
    datagen = ImageDataGenerator(rescale=1./255)
    # Load the pre-trained VGG16 model from Keras; we keep only the
    # convolutional layers and drop the top (fully connected) layers.
    model = applications.VGG16(include_top=False, weights='imagenet')
    generator_tr = datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode=None,  # class_mode=None means the generator won't load the class labels.
        shuffle=False)    # No shuffling, because we want the class labels to stay in order.
    nb_train_samples = len(generator_tr.filenames)  # 10000: 2000 training samples per class
    bottleneck_features_train = model.predict_generator(
        generator_tr, nb_train_samples // batch_size)
    np.save('weights/vgg16bottleneck_features_train.npy',
            bottleneck_features_train)  # bottleneck_features_train is a numpy array
    generator_ts = datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)
    nb_validation_samples = len(generator_ts.filenames)  # 2500: 500 validation samples per class
    bottleneck_features_validation = model.predict_generator(
        generator_ts, nb_validation_samples // batch_size)
    np.save('weights/vgg16bottleneck_features_validation.npy',
            bottleneck_features_validation)
    print("Got the bottleneck features in time: ", dt.now() - global_start)
    num_classes = len(generator_tr.class_indices)
    return nb_train_samples, nb_validation_samples, num_classes, generator_tr, generator_ts

nb_train_samples, nb_validation_samples, num_classes, generator_tr, generator_ts = save_bottleneck_features()
Here is the output of the snippet above:
Found 10000 images belonging to 5 classes.
Found 2500 images belonging to 5 classes.
Got the bottleneck features in time: 1:56:44.166846
现在,如果我执行validation_data.shape
,我得到的是(2496,8,8,512),而预期的输出应该是(2500,8,8,512)。 train_data输出很好。可能是什么问题?我是Keras调试的新手,我真的无法弄清楚到底是什么引起了这个问题。
Any help would be greatly appreciated!