I am trying to build a simple 5-class object detector by extracting bottleneck features with a pre-trained VGG16 (trained on ImageNet). I have 10,000 training images (2,000 per class) and 2,500 test images (500 per class). However, once I extract the bottleneck features, the validation tensor has 2496 samples, while the expected size is 2500. I checked the data folders and confirmed that there are 2500 validation images in total, yet I still get an error when I run the code: "ValueError: Input arrays should have the same number of samples as target arrays. Found 2496 input samples and 2500 target samples". I have attached the code below; can someone help me understand why the number of input samples drops to 2496?
I have just counted the images in the train and test data to make sure no images are missing; it turns out that none actually are.
Here is the code that extracts the bottleneck features:
import numpy as np
from datetime import datetime as dt
from keras import applications
from keras.preprocessing.image import ImageDataGenerator

global_start = dt.now()
# Dimensions of our Flickr images are 256 x 256
img_width, img_height = 256, 256
# Parameters needed for training and validation
train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
epochs = 50
batch_size = 16

# Get the bottleneck features by Weights.T * Xi
def save_bottleneck_features():
    datagen = ImageDataGenerator(rescale=1./255)
    # Load the pre-trained VGG16 model from Keras; we keep only the
    # convolutional layers and drop the top (fully connected) layers.
    model = applications.VGG16(include_top=False, weights='imagenet')
    generator_tr = datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode=None,  # class_mode=None means the generator won't load the class labels.
        shuffle=False)    # No shuffling, because we want the class labels to stay in order.
    nb_train_samples = len(generator_tr.filenames)  # 10000: 2000 training samples per class
    bottleneck_features_train = model.predict_generator(
        generator_tr, nb_train_samples // batch_size)
    np.save('weights/vgg16bottleneck_features_train.npy',
            bottleneck_features_train)  # bottleneck_features_train is a numpy array
    generator_ts = datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)
    nb_validation_samples = len(generator_ts.filenames)  # 2500: 500 validation samples per class
    bottleneck_features_validation = model.predict_generator(
        generator_ts, nb_validation_samples // batch_size)
    np.save('weights/vgg16bottleneck_features_validation.npy',
            bottleneck_features_validation)
    print("Got the bottleneck features in time: ", dt.now() - global_start)
    num_classes = len(generator_tr.class_indices)
    return nb_train_samples, nb_validation_samples, num_classes, generator_tr, generator_ts

nb_train_samples, nb_validation_samples, num_classes, generator_tr, generator_ts = save_bottleneck_features()
Here is the output of the snippet above:
Found 10000 images belonging to 5 classes.
Found 2500 images belonging to 5 classes.
Got the bottleneck features in time: 1:56:44.166846
现在,如果我执行validation_data.shape
,我得到的是(2496,8,8,512),而预期的输出应该是(2500,8,8,512)。 train_data输出很好。可能是什么问题?我是Keras调试的新手,我真的无法弄清楚到底是什么引起了这个问题。
Any help would be greatly appreciated!