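I am building a neural network to classify images that contain email addresses. The positive folder contains images with an email address written over the image, in varying fonts, colors, sizes, and positions.
The negative folder contains images with no text on them, as well as images whose text is not in email-address format (no @ sign).
The images are 300 x 225 x 3 (RGB).
This should be a simple classification task (if there is an @, the NN should be able to pick up that the image contains an email address), but my model performs poorly. After 25 epochs it reaches 83% test accuracy. It also takes 10 hours to train, which seems excessive to me.
Could you help me analyze the CNN's structure and suggest improvements (or help me avoid pitfalls)?
The model I wrote is: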
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.preprocessing.image import ImageDataGenerator
input_size = (64, 48)
# Initialising the CNN
classifier = Sequential()
# Step 1 - Convolution
classifier.add(Conv2D(32, (3, 3), input_shape = (*input_size, 3), activation = 'relu'))
# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Adding a second convolutional layer
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Step 3 - Flattening
classifier.add(Flatten())
# Step 4 - Full connection
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))
# Compiling the CNN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Part 2 - Fitting the CNN to the images
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('./training_Set',
                                                 target_size = input_size,
                                                 batch_size = 32,
                                                 class_mode = 'binary')
test_set = test_datagen.flow_from_directory('./test_set',
                                            target_size = input_size,
                                            batch_size = 32,
                                            class_mode = 'binary')
classifier.fit_generator(training_set,
                         steps_per_epoch = 8000,
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000)
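
As a rough sanity check on the listing above, the sketch below works through what those settings imply, in plain Python with no Keras needed. It traces the feature-map sizes through the two convolution/pooling stages and counts how many images each epoch draws. It assumes the Keras defaults (`padding='valid'` and stride 1 for `Conv2D`, pooling stride equal to `pool_size`) and the Keras 2 meaning of `steps_per_epoch` as a number of batches. The helper names and `n_train_images` are hypothetical, since the post does not state the real dataset size.

```python
from math import ceil

def conv_valid(h, w, k=3):
    """Spatial size after a k x k convolution with 'valid' padding, stride 1."""
    return h - k + 1, w - k + 1

def max_pool(h, w, p=2):
    """Spatial size after p x p max pooling with stride p (floor division)."""
    return h // p, w // p

# flow_from_directory interprets target_size as (height, width),
# so the 300 x 225 source images are shrunk to 64 x 48 before training.
h, w = 64, 48
h, w = conv_valid(h, w)   # first Conv2D(32, (3, 3))   -> 62 x 46
h, w = max_pool(h, w)     # first MaxPooling2D         -> 31 x 23
h, w = conv_valid(h, w)   # second Conv2D(32, (3, 3))  -> 29 x 21
h, w = max_pool(h, w)     # second MaxPooling2D        -> 14 x 10

flat_features = h * w * 32                  # Flatten output: 4480 values
dense_params = flat_features * 128 + 128    # weights + biases of Dense(128)

# steps_per_epoch and validation_steps count batches, not images.
batch_size = 32
images_per_epoch = 8000 * batch_size        # 256000 augmented images per epoch
images_per_validation = 2000 * batch_size   # 64000 images per validation pass

# Hypothetical dataset size, used only to illustrate one pass over the data.
n_train_images = 8000
steps_for_one_pass = ceil(n_train_images / batch_size)

print("final feature map:", (h, w), "flattened:", flat_features)
print("Dense(128) parameters:", dense_params)
print("images drawn per epoch:", images_per_epoch)
print("steps for one pass over", n_train_images, "images:", steps_for_one_pass)
```

If the training folder actually holds on the order of a few thousand images, 8000 steps of 32 images means each epoch goes over the data many times, which may account for much of the 10-hour training time. Whether that applies here depends on the real dataset size.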