ValueError: Input arrays should have the same number of samples as target arrays

Time: 2018-01-31 12:19:10

标签: deep-learning keras classification

Using the Keras DL library with the TensorFlow backend, I am trying to extract features from an intermediate layer of VGG-16 in order to perform binary classification on a task I am interested in.

The dataset contains 1000 training samples (500 per class) and 200 test samples (100 per class). I train a small fully connected model on top of the extracted features. When running the code, I can see that the shape of train_data is (10000, 6, 6, 512) and validation_data is (2000, 6, 6, 512) (float32), while train_labels is (1000, 2) and validation_labels is (200, 2) (int32).

Here is the code:

########################load libraries#########################################
import numpy as np
import time
from keras.layers import Dense, Dropout, Flatten
from keras.models import Model
from keras import applications
from keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator
#########################image characteristics#################################
img_rows=100 #dimensions of image, to be varied suiting the input requirements of the pre-trained model
img_cols=100
channel = 3 #RGB
num_classes = 2
batch_size = 10 
nb_epoch = 10
###############################################################################
''' This code uses VGG-16 as a feature extractor'''

feature_model = applications.VGG16(weights='imagenet', include_top=False, input_shape=(img_rows, img_cols, 3))
#get the model summary
feature_model.summary()
#extract feature from the intermediate layer
feature_model = Model(input=feature_model.input, output=feature_model.get_layer('block5_conv2').output) 
#get the model summary
feature_model.summary()

#declaring image data generators

train_datagen = ImageDataGenerator()
val_datagen = ImageDataGenerator()

generator = train_datagen.flow_from_directory(
        'f1_100/train',
        target_size=(img_rows, img_cols),
        batch_size=batch_size,
        class_mode=None,  
        shuffle=False)  

train_data = feature_model.predict_generator(generator, 1000)
train_labels = np.array([[1, 0]] * 500 + [[0, 1]] * 500)

generator = val_datagen.flow_from_directory(
        'f1_100/test',
        target_size=(img_rows, img_cols),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)

validation_data = feature_model.predict_generator(generator, 200)
validation_labels = np.array([[1,0]] * 100 + [[0,1]] * 100)

###############################################################################
#addding the top layers and training them on the extracted features
from keras.models import Sequential

model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))

sgd = SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
print('-'*30)
print('Start Training the top layers on the extracted features...')
print('-'*30)

#measure the time and train the model
t=time.time() 
hist = model.fit(train_data, train_labels, nb_epoch=nb_epoch, batch_size=batch_size,
                      validation_data=(validation_data, validation_labels),
                      verbose=2)
#print the history of the trained model
print(hist.history)
print('Training time: %s' % (time.time()-t))
###############################################################################

However, when I run the code, I get the following error:

Traceback (most recent call last):

  File "<ipython-input-14-cc5b1b34cc67>", line 46, in <module>
    verbose=2)

  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\models.py", line 960, in fit
    validation_steps=validation_steps)

  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 1581, in fit
    batch_size=batch_size)

  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 1426, in _standardize_user_data
    _check_array_lengths(x, y, sample_weights)

  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 250, in _check_array_lengths
    'and ' + str(list(set_y)[0]) + ' target samples.')

ValueError: Input arrays should have the same number of samples as target arrays. Found 10000 input samples and 1000 target samples.

1 Answer:

Answer 0 (Score: 1)

You see, your batch_size is 10.

feature_model.predict_generator() runs the generator for the number of batches given by the steps param (1000 in your case), and each batch contains batch_size (10) samples. So in total it produces 1000 × 10 = 10000 training samples.

But on the next line, you declare labels for only 1000 samples (500 [1, 0]s and 500 [0, 1]s).
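The mismatch can be checked with a few lines of plain arithmetic; this is just a sketch using the numbers from the question (batch_size = 10, steps = 1000, 500 labels per class):

```python
# Sample-count mismatch behind the ValueError (numbers taken from the question).
batch_size = 10
steps = 1000                      # value passed as the second arg to predict_generator

num_inputs = steps * batch_size   # predict_generator yields steps * batch_size samples
num_labels = 500 + 500            # labels declared via np.array([[1, 0]] * 500 + [[0, 1]] * 500)

print(num_inputs, num_labels)     # 10000 1000 -- the two counts in the error message
assert num_inputs != num_labels   # hence "Found 10000 input samples and 1000 target samples"
```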

So you have two options:

1) Change steps in predict_generator() like this (which I believe is what you want: generate 1000 samples for training and 200 for validation):

train_data = feature_model.predict_generator(generator, 100)
validation_data = feature_model.predict_generator(generator, 20)

2) Or change the counts in your labels:

train_labels = np.array([[1, 0]] * 5000 + [[0, 1]] * 5000)
validation_labels = np.array([[1,0]] * 1000 + [[0,1]] * 1000)
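For option 1, the right steps value follows from dividing the sample count by the batch size. A minimal helper sketch (the function name `steps_for` is mine, not from any library), assuming the batch size divides the sample count evenly, as it does here:

```python
# Derive the steps argument for predict_generator from the dataset size.
def steps_for(num_samples, batch_size):
    # predict_generator yields steps * batch_size samples, so we need
    # steps = num_samples / batch_size, and it should divide evenly.
    assert num_samples % batch_size == 0, "pick a batch size that divides the sample count"
    return num_samples // batch_size

print(steps_for(1000, 10))  # 100 -> steps for the 1000-sample training set
print(steps_for(200, 10))   # 20  -> steps for the 200-sample validation set
```

With these values, predict_generator produces exactly 1000 training and 200 validation feature arrays, matching the label arrays already declared in the question.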