Using the Keras DL library with the TensorFlow backend, I am trying to extract features from an intermediate layer of VGG-16 to perform binary classification on a task I am interested in.
The dataset contains 1000 training samples (500 per class) and 200 test samples (100 per class). I train a small fully connected model on top of the extracted features. When running the code, I can see that the size of train_data is (10000, 6, 6, 512) and validation_data is (2000, 6, 6, 512) (float32), while train_labels is (1000, 2) and validation_labels is (200, 2) (int32).
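As a side check (my own sketch, not part of the question): the 6 x 6 spatial size of the extracted features follows from the input size, because block5_conv2 sits after four of VGG-16's 2x2 max-pooling layers, each of which halves (flooring) the spatial dimensions:

```python
# Why 100 x 100 inputs give 6 x 6 x 512 features at block5_conv2:
# pool1..pool4 precede that layer, each a 2x2 max-pool with stride 2.
size = 100
for _ in range(4):  # four poolings before block5_conv2
    size //= 2      # 100 -> 50 -> 25 -> 12 -> 6
print(size)  # 6
```

The 512 is simply the number of filters in VGG-16's block5 convolutions.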
Here is the code:
########################load libraries#########################################
import numpy as np
import time
from keras.layers import Dense, Dropout, Flatten
from keras.models import Model
from keras import applications
from keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator
#########################image characteristics#################################
img_rows=100 #dimensions of image, to be varied suiting the input requirements of the pre-trained model
img_cols=100
channel = 3 #RGB
num_classes = 2
batch_size = 10
nb_epoch = 10
###############################################################################
''' This code uses VGG-16 as a feature extractor'''
feature_model = applications.VGG16(weights='imagenet', include_top=False, input_shape=(img_rows, img_cols, 3))
#get the model summary
feature_model.summary()
#extract feature from the intermediate layer
feature_model = Model(input=feature_model.input, output=feature_model.get_layer('block5_conv2').output)
#get the model summary
feature_model.summary()
#declaring image data generators
train_datagen = ImageDataGenerator()
val_datagen = ImageDataGenerator()
generator = train_datagen.flow_from_directory(
        'f1_100/train',
        target_size=(img_rows, img_cols),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)
train_data = feature_model.predict_generator(generator, 1000)
train_labels = np.array([[1, 0]] * 500 + [[0, 1]] * 500)
generator = val_datagen.flow_from_directory(
        'f1_100/test',
        target_size=(img_rows, img_cols),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)
validation_data = feature_model.predict_generator(generator, 200)
validation_labels = np.array([[1,0]] * 100 + [[0,1]] * 100)
###############################################################################
#adding the top layers and training them on the extracted features
from keras.models import Sequential
model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
sgd = SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
print('-'*30)
print('Start Training the top layers on the extracted features...')
print('-'*30)
#measure the time and train the model
t=time.time()
hist = model.fit(train_data, train_labels, nb_epoch=nb_epoch, batch_size=batch_size,
                 validation_data=(validation_data, validation_labels),
                 verbose=2)
#print the history of the trained model
print(hist.history)
print('Training time: %s' % (time.time()-t))
###############################################################################
However, when running the code, I get the following error:
Traceback (most recent call last):
  File "<ipython-input-14-cc5b1b34cc67>", line 46, in <module>
    verbose=2)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\models.py", line 960, in fit
    validation_steps=validation_steps)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 1581, in fit
    batch_size=batch_size)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 1426, in _standardize_user_data
    _check_array_lengths(x, y, sample_weights)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 250, in _check_array_lengths
    'and ' + str(list(set_y)[0]) + ' target samples.')
ValueError: Input arrays should have the same number of samples as target arrays. Found 10000 input samples and 1000 target samples.
Answer 0 (score: 1)
You see, your batch_size is 10.
feature_model.predict_generator() draws from the generator steps times (1000 in your case), and each step yields batch_size (10) samples. So a total of 10000 training samples are produced.
But in the next line you declare only 1000 labels (500 [1, 0]s and 500 [0, 1]s).
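The mismatch is simple arithmetic; a minimal sketch (numbers taken from the question) makes it explicit:

```python
# Numbers taken from the question above.
batch_size = 10
steps = 1000  # second argument passed to predict_generator()

# predict_generator() draws `steps` batches, each of `batch_size` samples.
samples_produced = steps * batch_size
labels_declared = 500 + 500  # 500 [1, 0]s + 500 [0, 1]s

print(samples_produced, labels_declared)  # 10000 vs. 1000
```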
So you have two options:
1) Change the steps argument of predict_generator() like this (which I believe is what you want: 1000 samples for train and 200 for validation):
train_data = feature_model.predict_generator(generator, 100)
validation_data = feature_model.predict_generator(generator, 20)
2) Or you can change the numbers in your labels:
train_labels = np.array([[1, 0]] * 5000 + [[0, 1]] * 5000)
validation_labels = np.array([[1,0]] * 1000 + [[0,1]] * 1000)
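Either way, the invariant to satisfy is the same: the number of samples predict_generator() produces (steps * batch_size) must equal the number of labels. A quick sanity check for option 1, using plain lists in place of the real feature arrays:

```python
batch_size = 10
train_steps, val_steps = 100, 20  # option 1: steps = samples / batch_size

train_labels = [[1, 0]] * 500 + [[0, 1]] * 500
validation_labels = [[1, 0]] * 100 + [[0, 1]] * 100

# Samples produced by the generator must match labels declared.
assert train_steps * batch_size == len(train_labels)     # 1000 == 1000
assert val_steps * batch_size == len(validation_labels)  # 200 == 200
print("shapes line up")
```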