Question: I am trying to build a binary classification model for melanoma ("MEL") vs. nevus ("NV"). The data comes from the ISIC 2019 archive, which covers 8 different types of skin lesions, but I only use the two classes mentioned above (binary classification).
The two classes are imbalanced: about 10000 "NV" images versus about 3000 "MEL" images.
The dataset is split into a training set and a validation set.
The validation folder contains 904 "MEL" and 2200 "NV" images.
The training folder contains 3600 "MEL" and 7100 "NV" images.
I applied two measures against this imbalance: downsampling the training and validation sets, and image augmentation on the training set only.
The validation folder now contains 904 images per class.
The train folder now contains two subfolders, "MEL" and "NV", with 10000 images each.
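(For context, the validation downsampling is nothing more than random file selection, roughly along the lines of the sketch below; the directory layout and the use of random.sample are assumptions for illustration, not my exact script.)

import os, random

# Sketch: randomly keep 904 'NV' validation images so both classes contain 904 files
# ('base_dir/val_dir/NV' is a placeholder path)
nv_val_dir = 'base_dir/val_dir/NV'
keep = set(random.sample(os.listdir(nv_val_dir), 904))
for fname in os.listdir(nv_val_dir):
    if fname not in keep:
        os.remove(os.path.join(nv_val_dir, fname))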
I fine-tuned DenseNet201 with Keras and used the DenseNet preprocessing function:
from keras.applications.densenet import DenseNet201, preprocess_input
# other imports used in the snippets below
from keras import layers
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
densenet_model = DenseNet201(input_shape=(224, 224, 3), include_top=False, weights="imagenet")
Then I add a few layers on top of the pretrained model:
#get the last layer shape
last_layer = densenet_model.get_layer('relu')
print('last layer output shape:', last_layer.output_shape)
last_output = last_layer.output
# Pool the feature maps down to a 1-D vector (global max pooling)
x = layers.GlobalMaxPooling2D()(last_output)
# Add a fully connected layer with 512 hidden units and ReLU activation
x = layers.Dense(512, activation='relu')(x)
# Add a dropout rate of 0.5
x = layers.Dropout(0.5)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(2, activation='sigmoid')(x)
# Configure and compile the model
model = Model(densenet_model.input, x)
I did not freeze any layers. Then I compiled the model:
optimizer = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None,
                 decay=0.0, amsgrad=True)
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy', 'binary_accuracy'])
filepath = "densenet.h5"
# Declare a checkpoint to save the best version of the model
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max')
# Reduce the learning rate as the learning stagnates
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.5, patience=2,
                              verbose=1, mode='max', min_lr=0.00001)
callbacks_list = [checkpoint, reduce_lr]
# Fit the model
history = model.fit_generator(train_batches,
                              steps_per_epoch=train_steps,
                              validation_data=val_batches,
                              validation_steps=val_steps,
                              epochs=20,
                              verbose=1,
                              callbacks=callbacks_list)
Epoch 1/20
1701/1701 [==============================] - 793s 466ms/step - loss: 0.4436 - acc: 0.7890 - binary_accuracy: 0.7890 - val_loss: 0.3416 - val_acc: 0.8404 - val_binary_accuracy: 0.8404
Epoch 00001: val_acc improved from -inf to 0.84043, saving model to densenet.h5
Epoch 2/20
1701/1701 [==============================] - 720s 423ms/step - loss: 0.3447 - acc: 0.8450 - binary_accuracy: 0.8450 - val_loss: 0.3564 - val_acc: 0.8446 - val_binary_accuracy: 0.8446
Epoch 00002: val_acc improved from 0.84043 to 0.84458, saving model to densenet.h5
Epoch 3/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.2718 - acc: 0.8835 - binary_accuracy: 0.8835 - val_loss: 0.3785 - val_acc: 0.8487 - val_binary_accuracy: 0.8487
Epoch 00003: val_acc improved from 0.84458 to 0.84873, saving model to densenet.h5
Epoch 4/20
1701/1701 [==============================] - 726s 427ms/step - loss: 0.2051 - acc: 0.9172 - binary_accuracy: 0.9172 - val_loss: 0.3779 - val_acc: 0.8581 - val_binary_accuracy: 0.8581
Epoch 00004: val_acc improved from 0.84873 to 0.85813, saving model to densenet.h5
Epoch 5/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.1529 - acc: 0.9403 - binary_accuracy: 0.9403 - val_loss: 0.3923 - val_acc: 0.8581 - val_binary_accuracy: 0.8581
Epoch 00005: val_acc did not improve from 0.85813
Epoch 6/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.1163 - acc: 0.9553 - binary_accuracy: 0.9553 - val_loss: 0.4813 - val_acc: 0.8498 - val_binary_accuracy: 0.8498
Epoch 00006: val_acc did not improve from 0.85813
Epoch 00006: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05.
Epoch 7/20
1701/1701 [==============================] - 727s 427ms/step - loss: 0.0407 - acc: 0.9864 - binary_accuracy: 0.9864 - val_loss: 0.5726 - val_acc: 0.8667 - val_binary_accuracy: 0.8667
Epoch 00007: val_acc improved from 0.85813 to 0.86670, saving model to densenet.h5
Epoch 8/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.0287 - acc: 0.9904 - binary_accuracy: 0.9904 - val_loss: 0.5919 - val_acc: 0.8711 - val_binary_accuracy: 0.8711
Epoch 00008: val_acc improved from 0.86670 to 0.87113, saving model to densenet.h5
Epoch 9/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.0253 - acc: 0.9909 - binary_accuracy: 0.9909 - val_loss: 0.5453 - val_acc: 0.8720 - val_binary_accuracy: 0.8720
Epoch 00009: val_acc improved from 0.87113 to 0.87196, saving model to densenet.h5
Epoch 10/20
1701/1701 [==============================] - 730s 429ms/step - loss: 0.0216 - acc: 0.9927 - binary_accuracy: 0.9927 - val_loss: 0.5498 - val_acc: 0.8706 - val_binary_accuracy: 0.8706
Epoch 00010: val_acc did not improve from 0.87196
Epoch 11/20
1701/1701 [==============================] - 729s 428ms/step - loss: 0.0145 - acc: 0.9954 - binary_accuracy: 0.9954 - val_loss: 0.6332 - val_acc: 0.8822 - val_binary_accuracy: 0.8822
Epoch 00011: val_acc improved from 0.87196 to 0.88219, saving model to densenet.h5
Epoch 12/20
1701/1701 [==============================] - 731s 430ms/step - loss: 0.0176 - acc: 0.9939 - binary_accuracy: 0.9939 - val_loss: 0.6256 - val_acc: 0.8756 - val_binary_accuracy: 0.8756
Epoch 00012: val_acc did not improve from 0.88219
Epoch 13/20
1701/1701 [==============================] - 734s 432ms/step - loss: 0.0117 - acc: 0.9965 - binary_accuracy: 0.9965 - val_loss: 0.5959 - val_acc: 0.8838 - val_binary_accuracy: 0.8838
Epoch 00013: val_acc improved from 0.88219 to 0.88385, saving model to densenet.h5
Epoch 14/20
1701/1701 [==============================] - 736s 433ms/step - loss: 0.0132 - acc: 0.9958 - binary_accuracy: 0.9958 - val_loss: 0.7139 - val_acc: 0.8598 - val_binary_accuracy: 0.8598
Epoch 00014: val_acc did not improve from 0.88385
Epoch 15/20
1701/1701 [==============================] - 735s 432ms/step - loss: 0.0109 - acc: 0.9963 - binary_accuracy: 0.9963 - val_loss: 0.6139 - val_acc: 0.8720 - val_binary_accuracy: 0.8720
Epoch 00015: val_acc did not improve from 0.88385
Epoch 00015: ReduceLROnPlateau reducing learning rate to 2.499999936844688e-05.
Epoch 16/20
1701/1701 [==============================] - 734s 431ms/step - loss: 0.0048 - acc: 0.9980 - binary_accuracy: 0.9980 - val_loss: 0.6759 - val_acc: 0.8764 - val_binary_accuracy: 0.8764
Epoch 00016: val_acc did not improve from 0.88385
Epoch 17/20
1701/1701 [==============================] - 733s 431ms/step - loss: 0.0028 - acc: 0.9992 - binary_accuracy: 0.9992 - val_loss: 0.7179 - val_acc: 0.8805 - val_binary_accuracy: 0.8805
Epoch 00017: val_acc did not improve from 0.88385
Epoch 00017: ReduceLROnPlateau reducing learning rate to 1.249999968422344e-05.
Epoch 18/20
1701/1701 [==============================] - 734s 432ms/step - loss: 0.0014 - acc: 0.9996 - binary_accuracy: 0.9996 - val_loss: 0.7525 - val_acc: 0.8816 - val_binary_accuracy: 0.8816
Epoch 00018: val_acc did not improve from 0.88385
Epoch 19/20
1701/1701 [==============================] - 734s 431ms/step - loss: 0.0011 - acc: 0.9997 - binary_accuracy: 0.9997 - val_loss: 0.7580 - val_acc: 0.8803 - val_binary_accuracy: 0.8803
Epoch 00019: val_acc did not improve from 0.88385
Epoch 00019: ReduceLROnPlateau reducing learning rate to 1e-05.
Epoch 20/20
1701/1701 [==============================] - 733s 431ms/step - loss: 8.0918e-04 - acc: 0.9997 - binary_accuracy: 0.9997 - val_loss: 0.7667 - val_acc: 0.8800 - val_binary_accuracy: 0.8800
Epoch 00020: val_acc did not improve from 0.88385
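To make the gap between training and validation easier to see, the curves can be plotted from the history object returned by fit_generator (a minimal matplotlib sketch; the 'acc'/'val_acc' keys match the metric names printed in the log above):

import matplotlib.pyplot as plt

# Plot training vs. validation accuracy and loss from the History object
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history['acc'], label='train acc')
ax1.plot(history.history['val_acc'], label='val acc')
ax1.set_xlabel('epoch')
ax1.legend()
ax2.plot(history.history['loss'], label='train loss')
ax2.plot(history.history['val_loss'], label='val loss')
ax2.set_xlabel('epoch')
ax2.legend()
plt.show()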
Confusion matrix / classification report from sklearn.metrics:
precision recall f1-score support
MEL 0.48 0.47 0.48 904
NV 0.48 0.49 0.49 904
accuracy 0.48 1808
macro avg 0.48 0.48 0.48 1808
weighted avg 0.48 0.48 0.48 1808
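For completeness, a report like the one above is produced roughly along these lines (a sketch of the evaluation path, not necessarily my exact code; it assumes the predictions are argmax'ed over the 2-unit output and that the true labels come from the unshuffled test_batches):

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Predict on the unshuffled test generator so predictions line up with test_batches.classes
preds = model.predict_generator(test_batches, steps=int(val_steps))
y_pred = np.argmax(preds, axis=1)
y_true = test_batches.classes  # integer labels in directory order (see class_indices)

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred,
                            target_names=list(test_batches.class_indices.keys())))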
As you can see, this is a completely useless model. What do you think the problem is? Please give me any suggestions; this has been going on for a month without any improvement. If you would like to see any other code, I will post it.
Here is the augmentation code:
import os
import shutil
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

class_list = ['MEL', 'NV']
for item in class_list:
    # Create a temporary directory for the augmented images
    aug_dir = 'aug_dir'
    os.mkdir(aug_dir)
    # Create a directory within the base dir to store images of the same class
    img_dir = os.path.join(aug_dir, 'img_dir')
    os.mkdir(img_dir)
    # Choose a class
    img_class = item
    # List all the images in the directory
    img_list = os.listdir('base_dir/train_dir/' + img_class)
    # Copy images from the class train dir to the img_dir
    for fname in img_list:
        # source path to image
        src = os.path.join('base_dir/train_dir/' + img_class, fname)
        # destination path to image
        dst = os.path.join(img_dir, fname)
        # copy the image from the source to the destination
        shutil.copyfile(src, dst)
    # point to a dir containing the images and not to the images themselves
    path = aug_dir
    save_path = 'base_dir/train_dir/' + img_class
    # Create a data generator to augment the images in real time
    datagen = ImageDataGenerator(
        rotation_range=60,
        width_shift_range=0.1,
        height_shift_range=0.1,
        # zoom_range=0.1,
        shear_range=0.2,
        horizontal_flip=True,
        vertical_flip=True,
        brightness_range=(0.9, 1.1),
        fill_mode='nearest')
    batch_size = 50
    aug_datagen = datagen.flow_from_directory(path,
                                              save_to_dir=save_path,
                                              save_format='jpg',
                                              target_size=(224, 224),
                                              batch_size=batch_size)
    # Generate the augmented images and add them to the training folders
    num_aug_images_wanted = 10000  # total number of images we want to have in each class
    num_files = len(os.listdir(img_dir))
    num_batches = int(np.ceil((num_aug_images_wanted - num_files) / batch_size))
    # run the generator until the class is topped up to ~num_aug_images_wanted images
    for i in range(0, num_batches):
        imgs, labels = next(aug_datagen)
    # delete the temporary directory with the raw image files
    shutil.rmtree('aug_dir')
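A quick sanity check that each training class really ends up with about 10000 files after augmentation (simple directory count, same layout as above):

import os

for item in ['MEL', 'NV']:
    print(item, len(os.listdir('base_dir/train_dir/' + item)))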
Here is the preprocessing code:
import numpy as np
import keras
from keras.preprocessing.image import ImageDataGenerator

# Declare a few useful values
num_train_samples = train_len
num_val_samples = val_len
train_batch_size = 16
val_batch_size = 100
image_height = 224
image_width = 224
# Declare how many steps are needed in an iteration
train_steps = np.ceil(num_train_samples / train_batch_size)
val_steps = np.ceil(num_val_samples / val_batch_size)
# Set up generators
datagenr = ImageDataGenerator(
    preprocessing_function=keras.applications.densenet.preprocess_input)
train_batches = datagenr.flow_from_directory(
    train_path,
    target_size=(image_height, image_width),
    batch_size=train_batch_size)
val_batches = datagenr.flow_from_directory(
    val_path,
    target_size=(image_height, image_width),
    batch_size=val_batch_size)
# Note: shuffle=False keeps the test generator in directory order
test_batches = datagenr.flow_from_directory(
    val_path,
    target_size=(image_height, image_width),
    batch_size=val_batch_size,
    shuffle=False)
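Finally, a quick check of the class-to-index mapping used by the generators; flow_from_directory assigns indices alphabetically by folder name, so 'MEL' should map to 0 and 'NV' to 1, which matters when interpreting the argmax'ed predictions above:

# Verify the class-to-index mapping shared by all three generators
print(train_batches.class_indices)  # expected: {'MEL': 0, 'NV': 1}
print(val_batches.class_indices)
print(test_batches.class_indices)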