I have a dataset of images organized into folders:
/animals
/dogs
/cats
/snakes
/pandas
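and so on, with 10 different classes.
I have an array named trainingImages[] that contains all of my preprocessed data (grayscale, 32x32).
I have an array named trainingLabels[] that contains all of the labels, matched by index to trainingImages[]. So trainingImages[1] is a preprocessed dog and trainingLabels[1] is the string 'dog'.
I then used sklearn's train_test_split() like this: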
(trainX, testX, trainY, testY) = train_test_split(trainingImages, trainingLabels, test_size=0.2, random_state=1)
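At this point the shapes of trainX and trainY are (1095, 32, 32, 1) and (1095, 20, 2), respectively.
I know that I now have to convert trainY to one-hot vectors. I have tried using LabelBinarizer and to_categorical, but I still have shape problems: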
lb = LabelBinarizer().fit(trainY)
testY = lb.transform(testY)
trainY = lb.transform(trainY)
testY = keras.utils.to_categorical(testY)
trainY = keras.utils.to_categorical(trainY)
But when I feed this into the Sequential model, I get the error ValueError: Error when checking target: expected conv2d_1 to have 4 dimensions, but got array with shape (1095, 20, 2), which tells me that the shapes are wrong on input.
How do I prepare this data correctly?
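One possible reading of the trailing dimension of 2, offered as an assumption rather than a diagnosis: for more than two classes LabelBinarizer already returns one-hot rows, so one-hot encoding that result a second time adds an extra axis of size 2 (one slot for 0, one for 1). The sketch below reproduces the effect with plain NumPy; the one_hot helper and the sample labels are hypothetical stand-ins, not the sklearn or Keras implementations.

```python
import numpy as np

def one_hot(labels):
    """Map string labels to one-hot rows; classes are ordered alphabetically."""
    classes, indices = np.unique(labels, return_inverse=True)
    return classes, np.eye(len(classes), dtype=np.float32)[indices]

# Hypothetical labels standing in for trainingLabels
labels = np.array(["dog", "cat", "snake", "dog", "panda"])

classes, encoded = one_hot(labels)
print(classes)
print(encoded.shape)  # one row per sample, one column per class

# One-hot encoding the already encoded 0/1 matrix again treats every
# entry as a class id (0 or 1) and appends an axis of size 2.
encoded_twice = np.eye(2, dtype=np.float32)[encoded.astype(int)]
print(encoded_twice.shape)
```

If that is what happened here, applying only one of the two encodings should leave the targets with shape (samples, classes).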
Edit:
Code:
inputWidth = 32
inputHeight = 32
inputDepth = 1
batchSize = 32
inputShape = (inputHeight, inputWidth, inputDepth)
trainDataGenerator = ImageDataGenerator(rescale=1. / 255, shear_range=0.2,
                                        zoom_range=0.2, horizontal_flip=True)
testDataGenerator = ImageDataGenerator(rescale=1. / 255)
trainingSet = trainDataGenerator.flow_from_directory(args.dataset,
    target_size=(32, 32), batch_size=batchSize, class_mode='categorical',
    color_mode='grayscale')
testingSet = testDataGenerator.flow_from_directory(args.dataset,
    target_size=(32, 32), batch_size=batchSize, class_mode='categorical',
    color_mode='grayscale')
model = Sequential()
model.add(Conv2D(20, (5, 5), padding="same", input_shape=inputShape))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(50, (5, 5), padding="same"))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(500))
model.add(Activation("relu"))
model.add(Dense(trainingClasses))
model.add(Activation("softmax"))
opt = SGD(lr=0.01)
model.compile(loss="categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])
model.fit_generator(trainingSet, steps_per_epoch=80, epochs=20,
                    validation_data=testingSet)
Answer 0 (score: 1)
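You can use the Keras ImageDataGenerator. The generator takes every folder in the root directory and creates a category for each one. All files inside a category folder are automatically assigned to the category of their parent folder. The split between test and training sets can then be done by hand in about 2 minutes.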
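To make the folder-to-category mapping concrete, here is a minimal standard-library sketch of the idea: each subfolder of the root gets an integer index, assigned in sorted name order. The function name class_indices_from_folders and the temporary folders are hypothetical; this approximates the behaviour described above and is not the Keras code itself.

```python
import os
import tempfile

def class_indices_from_folders(root):
    """Assign an integer index to each subfolder of root, in sorted name order."""
    classes = sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )
    return {name: index for index, name in enumerate(classes)}

# Build a throwaway directory tree shaped like the one in the question
with tempfile.TemporaryDirectory() as root:
    for name in ("dogs", "cats", "snakes", "pandas"):
        os.makedirs(os.path.join(root, name))
    mapping = class_indices_from_folders(root)

print(mapping)  # {'cats': 0, 'dogs': 1, 'pandas': 2, 'snakes': 3}
```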
from keras.preprocessing.image import ImageDataGenerator

batch_size = 32
input_size = (32, 32)  # assumption: matches the 32x32 images in the question

# Augment the training data by shearing, zooming and flipping for more diverse training
train_datagen = ImageDataGenerator(rescale=1. / 255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1. / 255)
training_set = train_datagen.flow_from_directory('folder/of/training/root/directory',
                                                 target_size=input_size,
                                                 batch_size=batch_size,
                                                 class_mode='categorical')
test_set = test_datagen.flow_from_directory('folder/of/test/root/directory',
                                            target_size=input_size,
                                            batch_size=batch_size,
                                            class_mode='categorical')
# Train the CNN on categories defined by the folder structure
# (classifier is assumed to be an already compiled Keras model)
classifier.fit_generator(training_set,
                         steps_per_epoch=8000 // batch_size,  # integer number of batches
                         epochs=90,
                         validation_data=test_set,
                         validation_steps=2000 // batch_size,
                         workers=12)
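The code above reads from separate training and test root directories. If the images currently sit in a single root, the manual split can also be scripted. The following is a minimal sketch, assuming an 80/20 split per category; split_dataset, its parameters and the dummy files are hypothetical names chosen for illustration.

```python
import os
import random
import shutil
import tempfile

def split_dataset(source_root, train_root, test_root, test_fraction=0.2, seed=1):
    """Copy each category folder's files into train/test roots, keeping folder names."""
    rng = random.Random(seed)
    for class_name in sorted(os.listdir(source_root)):
        class_dir = os.path.join(source_root, class_name)
        if not os.path.isdir(class_dir):
            continue
        files = sorted(os.listdir(class_dir))
        rng.shuffle(files)
        n_test = int(len(files) * test_fraction)
        for position, file_name in enumerate(files):
            # The first n_test shuffled files go to the test set, the rest to training
            target_root = test_root if position < n_test else train_root
            target_dir = os.path.join(target_root, class_name)
            os.makedirs(target_dir, exist_ok=True)
            shutil.copy2(os.path.join(class_dir, file_name), target_dir)

# Demonstration on empty dummy files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "animals")
    for class_name in ("dogs", "cats"):
        os.makedirs(os.path.join(source, class_name))
        for i in range(10):
            open(os.path.join(source, class_name, f"{i}.png"), "w").close()
    train_root = os.path.join(tmp, "train")
    test_root = os.path.join(tmp, "test")
    split_dataset(source, train_root, test_root)
    train_counts = {c: len(os.listdir(os.path.join(train_root, c)))
                    for c in sorted(os.listdir(train_root))}
    test_counts = {c: len(os.listdir(os.path.join(test_root, c)))
                   for c in sorted(os.listdir(test_root))}

print(train_counts)  # {'cats': 8, 'dogs': 8}
print(test_counts)   # {'cats': 2, 'dogs': 2}
```

Copying rather than moving keeps the original folder untouched, at the cost of extra disk space.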
You can see which index each category was assigned in the one-hot encoding by issuing:
print("The model class indices are:", training_set.class_indices)
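The printed dictionary maps each folder name to an integer, and that integer is the position of the 1 in the category's one-hot vector. A small sketch of going back from one-hot rows (or softmax outputs) to names, assuming a hypothetical four-category mapping in the same format as class_indices:

```python
import numpy as np

# Hypothetical mapping in the same format as training_set.class_indices
class_indices = {"cats": 0, "dogs": 1, "pandas": 2, "snakes": 3}
index_to_class = {index: name for name, index in class_indices.items()}

def decode(rows):
    """Return the category name for the largest entry in each row."""
    return [index_to_class[int(i)] for i in np.argmax(rows, axis=1)]

# One-hot targets as the generator would yield them with class_mode='categorical'
one_hot_batch = np.array([[0, 1, 0, 0],
                          [0, 0, 0, 1]], dtype=np.float32)
# Made-up softmax outputs, decoded the same way
predictions = np.array([[0.1, 0.7, 0.1, 0.1],
                        [0.2, 0.2, 0.5, 0.1]])

print(decode(one_hot_batch))  # ['dogs', 'snakes']
print(decode(predictions))    # ['dogs', 'pandas']
```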