Precision, recall, F1-score and confusion matrix of a binary classification model are very poor, even though training suggests the model is well trained

Asked: 2019-06-23 14:15:17

Tags: machine-learning keras deep-learning classification conv-neural-network

Question: I am trying to build a binary classification model for melanoma "MEL" vs. nevus "NV". The dataset comes from the ISIC archive (ISIC 2019), which covers 8 different types of skin lesions, but I only use the two classes mentioned above (binary classification).

The two classes are imbalanced: about 10000 images for "NV" versus 3000 for "MEL".

The dataset is split into a training set and a validation set.

The validation folder contains 904 "MEL" images and 2200 "NV" images.

The training folder contains 3600 "MEL" images and 7100 "NV" images.

I applied two remedies for this imbalance: downsampling the training and validation sets, and image augmentation on the training set only.

The validation folder now contains 904 images per class.

The train folder now contains two subfolders, "MEL" and "NV", each with 10000 images.

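The downsampling step itself is not shown in the question; here is a minimal sketch of how the surplus "NV" validation images could be randomly removed down to 904. The folder path and the seed are assumptions, not the poster's actual code:

    import os
    import random

    # Hypothetical path; adjust to the actual base_dir layout
    val_nv_dir = 'base_dir/val_dir/NV'
    target_count = 904

    random.seed(42)
    files = os.listdir(val_nv_dir)
    random.shuffle(files)

    # Delete every image beyond the first 904 so both classes end up balanced
    for fname in files[target_count:]:
        os.remove(os.path.join(val_nv_dir, fname))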

I fine-tuned DenseNet201 with Keras and used the DenseNet-specific preprocessing function:

from keras.applications.densenet import DenseNet201, preprocess_input

densenet_model = DenseNet201(input_shape=(224, 224, 3), include_top=False, weights="imagenet")

Then I add some layers on top of the pre-trained model:

from keras import layers
from keras.models import Model

# Get the last layer of the DenseNet201 base
last_layer = densenet_model.get_layer('relu')
print('last layer output shape:', last_layer.output_shape)
last_output = last_layer.output

# Pool the convolutional features down to one vector per image
x = layers.GlobalMaxPooling2D()(last_output)
# Add a fully connected layer with 512 hidden units and ReLU activation
x = layers.Dense(512, activation='relu')(x)
# Add a dropout rate of 0.5
x = layers.Dropout(0.5)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(2, activation='sigmoid')(x)
# Build the model from the base input to the new classification head
model = Model(densenet_model.input, x)

I did not freeze any layers, and then I compiled the model:

from keras.optimizers import Adam

optimizer = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None,
                 decay=0.0, amsgrad=True)
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy', 'binary_accuracy'])

from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

filepath = "densenet.h5"

# Declare a checkpoint to save the best version of the model
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max')

# Reduce the learning rate as the learning stagnates
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.5, patience=2,
                              verbose=1, mode='max', min_lr=0.00001)

callbacks_list = [checkpoint, reduce_lr]

# Fit the model
history = model.fit_generator(train_batches,
                              steps_per_epoch=train_steps,
                              validation_data=val_batches,
                              validation_steps=val_steps,
                              epochs=20,
                              verbose=1,
                              callbacks=callbacks_list)   

Results:

Epoch 1/20
1701/1701 [==============================] - 793s 466ms/step - loss: 0.4436 - acc: 0.7890 - binary_accuracy: 0.7890 - val_loss: 0.3416 - val_acc: 0.8404 - val_binary_accuracy: 0.8404

Epoch 00001: val_acc improved from -inf to 0.84043, saving model to densenet.h5
Epoch 2/20
1701/1701 [==============================] - 720s 423ms/step - loss: 0.3447 - acc: 0.8450 - binary_accuracy: 0.8450 - val_loss: 0.3564 - val_acc: 0.8446 - val_binary_accuracy: 0.8446

Epoch 00002: val_acc improved from 0.84043 to 0.84458, saving model to densenet.h5
Epoch 3/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.2718 - acc: 0.8835 - binary_accuracy: 0.8835 - val_loss: 0.3785 - val_acc: 0.8487 - val_binary_accuracy: 0.8487

Epoch 00003: val_acc improved from 0.84458 to 0.84873, saving model to densenet.h5
Epoch 4/20
1701/1701 [==============================] - 726s 427ms/step - loss: 0.2051 - acc: 0.9172 - binary_accuracy: 0.9172 - val_loss: 0.3779 - val_acc: 0.8581 - val_binary_accuracy: 0.8581

Epoch 00004: val_acc improved from 0.84873 to 0.85813, saving model to densenet.h5
Epoch 5/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.1529 - acc: 0.9403 - binary_accuracy: 0.9403 - val_loss: 0.3923 - val_acc: 0.8581 - val_binary_accuracy: 0.8581

Epoch 00005: val_acc did not improve from 0.85813
Epoch 6/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.1163 - acc: 0.9553 - binary_accuracy: 0.9553 - val_loss: 0.4813 - val_acc: 0.8498 - val_binary_accuracy: 0.8498

Epoch 00006: val_acc did not improve from 0.85813

Epoch 00006: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05.
Epoch 7/20
1701/1701 [==============================] - 727s 427ms/step - loss: 0.0407 - acc: 0.9864 - binary_accuracy: 0.9864 - val_loss: 0.5726 - val_acc: 0.8667 - val_binary_accuracy: 0.8667

Epoch 00007: val_acc improved from 0.85813 to 0.86670, saving model to densenet.h5
Epoch 8/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.0287 - acc: 0.9904 - binary_accuracy: 0.9904 - val_loss: 0.5919 - val_acc: 0.8711 - val_binary_accuracy: 0.8711

Epoch 00008: val_acc improved from 0.86670 to 0.87113, saving model to densenet.h5
Epoch 9/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.0253 - acc: 0.9909 - binary_accuracy: 0.9909 - val_loss: 0.5453 - val_acc: 0.8720 - val_binary_accuracy: 0.8720

Epoch 00009: val_acc improved from 0.87113 to 0.87196, saving model to densenet.h5
Epoch 10/20
1701/1701 [==============================] - 730s 429ms/step - loss: 0.0216 - acc: 0.9927 - binary_accuracy: 0.9927 - val_loss: 0.5498 - val_acc: 0.8706 - val_binary_accuracy: 0.8706

Epoch 00010: val_acc did not improve from 0.87196
Epoch 11/20
1701/1701 [==============================] - 729s 428ms/step - loss: 0.0145 - acc: 0.9954 - binary_accuracy: 0.9954 - val_loss: 0.6332 - val_acc: 0.8822 - val_binary_accuracy: 0.8822

Epoch 00011: val_acc improved from 0.87196 to 0.88219, saving model to densenet.h5
Epoch 12/20
1701/1701 [==============================] - 731s 430ms/step - loss: 0.0176 - acc: 0.9939 - binary_accuracy: 0.9939 - val_loss: 0.6256 - val_acc: 0.8756 - val_binary_accuracy: 0.8756

Epoch 00012: val_acc did not improve from 0.88219
Epoch 13/20
1701/1701 [==============================] - 734s 432ms/step - loss: 0.0117 - acc: 0.9965 - binary_accuracy: 0.9965 - val_loss: 0.5959 - val_acc: 0.8838 - val_binary_accuracy: 0.8838

Epoch 00013: val_acc improved from 0.88219 to 0.88385, saving model to densenet.h5
Epoch 14/20
1701/1701 [==============================] - 736s 433ms/step - loss: 0.0132 - acc: 0.9958 - binary_accuracy: 0.9958 - val_loss: 0.7139 - val_acc: 0.8598 - val_binary_accuracy: 0.8598

Epoch 00014: val_acc did not improve from 0.88385
Epoch 15/20
1701/1701 [==============================] - 735s 432ms/step - loss: 0.0109 - acc: 0.9963 - binary_accuracy: 0.9963 - val_loss: 0.6139 - val_acc: 0.8720 - val_binary_accuracy: 0.8720

Epoch 00015: val_acc did not improve from 0.88385

Epoch 00015: ReduceLROnPlateau reducing learning rate to 2.499999936844688e-05.
Epoch 16/20
1701/1701 [==============================] - 734s 431ms/step - loss: 0.0048 - acc: 0.9980 - binary_accuracy: 0.9980 - val_loss: 0.6759 - val_acc: 0.8764 - val_binary_accuracy: 0.8764

Epoch 00016: val_acc did not improve from 0.88385
Epoch 17/20
1701/1701 [==============================] - 733s 431ms/step - loss: 0.0028 - acc: 0.9992 - binary_accuracy: 0.9992 - val_loss: 0.7179 - val_acc: 0.8805 - val_binary_accuracy: 0.8805

Epoch 00017: val_acc did not improve from 0.88385

Epoch 00017: ReduceLROnPlateau reducing learning rate to 1.249999968422344e-05.
Epoch 18/20
1701/1701 [==============================] - 734s 432ms/step - loss: 0.0014 - acc: 0.9996 - binary_accuracy: 0.9996 - val_loss: 0.7525 - val_acc: 0.8816 - val_binary_accuracy: 0.8816

Epoch 00018: val_acc did not improve from 0.88385
Epoch 19/20
1701/1701 [==============================] - 734s 431ms/step - loss: 0.0011 - acc: 0.9997 - binary_accuracy: 0.9997 - val_loss: 0.7580 - val_acc: 0.8803 - val_binary_accuracy: 0.8803

Epoch 00019: val_acc did not improve from 0.88385

Epoch 00019: ReduceLROnPlateau reducing learning rate to 1e-05.
Epoch 20/20
1701/1701 [==============================] - 733s 431ms/step - loss: 8.0918e-04 - acc: 0.9997 - binary_accuracy: 0.9997 - val_loss: 0.7667 - val_acc: 0.8800 - val_binary_accuracy: 0.8800

Epoch 00020: val_acc did not improve from 0.88385

Confusion matrix:

[confusion matrix image]

Classification report from sklearn.metrics:

                    precision    recall  f1-score   support

 MEL                 0.48      0.47      0.48       904
 NV                  0.48      0.49      0.49       904

accuracy                                 0.48      1808
macro avg            0.48      0.48      0.48      1808
weighted avg         0.48      0.48      0.48      1808

The dilemma

As you can see, this is a completely rubbish model. What do you think the problem is? Please give me any suggestions; this has been going on for a month with no improvement. If you want to look at any other code, I will post it.

Here is the augmentation code:

import os
import shutil
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

class_list = ['MEL', 'NV']

for item in class_list:

    # Create a temporary directory for the augmented images
    aug_dir = 'aug_dir'
    os.mkdir(aug_dir)

    # Create a directory within the base dir to store images of the same class
    img_dir = os.path.join(aug_dir, 'img_dir')
    os.mkdir(img_dir)

    # Choose a class
    img_class = item

    # List all the images in the directory
    img_list = os.listdir('base_dir/train_dir/' + img_class)

    # Copy images from the class train dir to the img_dir
    for fname in img_list:
        # source path to image
        src = os.path.join('base_dir/train_dir/' + img_class, fname)
        # destination path to image
        dst = os.path.join(img_dir, fname)
        # copy the image from the source to the destination
        shutil.copyfile(src, dst)

    # point to a dir containing the images and not to the images themselves
    path = aug_dir
    save_path = 'base_dir/train_dir/' + img_class

    # Create a data generator to augment the images in real time
    datagen = ImageDataGenerator(
        rotation_range=60,
        width_shift_range=0.1,
        height_shift_range=0.1,
        #zoom_range=0.1,
        shear_range=0.2,
        horizontal_flip=True,
        vertical_flip=True,
        brightness_range=(0.9, 1.1),
        fill_mode='nearest')

    batch_size = 50

    aug_datagen = datagen.flow_from_directory(path,
                                              save_to_dir=save_path,
                                              save_format='jpg',
                                              target_size=(224, 224),
                                              batch_size=batch_size)

    # Generate the augmented images and add them to the training folders
    num_aug_images_wanted = 10000  # total number of images we want to have in each class
    num_files = len(os.listdir(img_dir))
    num_batches = int(np.ceil((num_aug_images_wanted - num_files) / batch_size))

    # Run the generator until roughly num_aug_images_wanted images exist in the class folder
    for i in range(0, num_batches):
        imgs, labels = next(aug_datagen)

    # Delete the temporary directory with the raw image files
    shutil.rmtree('aug_dir')

Preprocessing code:

import numpy as np
import keras
from keras.preprocessing.image import ImageDataGenerator

# Declare a few useful values (train_len and val_len are the image counts
# of the training and validation sets, computed elsewhere)
num_train_samples = train_len
num_val_samples = val_len
train_batch_size = 16
val_batch_size = 100
image_height = 224
image_width = 224

# Declare how many steps are needed in an iteration
train_steps = np.ceil(num_train_samples / train_batch_size)
val_steps = np.ceil(num_val_samples / val_batch_size)

# Set up generators with the DenseNet-specific preprocessing
datagenr = ImageDataGenerator(
    preprocessing_function= \
    keras.applications.densenet.preprocess_input)


train_batches = datagenr.flow_from_directory(
    train_path,
    target_size=(image_height, image_width),
    batch_size=train_batch_size)

val_batches = datagenr.flow_from_directory(
    val_path,
    target_size=(image_height, image_width),
    batch_size=val_batch_size)

# Note: shuffle=False causes the test dataset to not be shuffled
test_batches = datagenr.flow_from_directory(
    val_path,
    target_size=(image_height, image_width),
    batch_size=val_batch_size,
    shuffle=False)
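The code that produced the confusion matrix and classification report is not shown in the question. Below is a minimal sketch of how they are typically computed from test_batches; this is an assumption about the evaluation step, not the poster's actual code. The key details are predicting on the un-shuffled generator and mapping prediction columns back through test_batches.class_indices:

    import numpy as np
    from sklearn.metrics import classification_report, confusion_matrix

    # Predict on the un-shuffled test generator so predictions line up with labels
    test_steps = int(np.ceil(test_batches.samples / val_batch_size))
    predictions = model.predict_generator(test_batches, steps=test_steps)

    # Column order follows test_batches.class_indices, e.g. {'MEL': 0, 'NV': 1}
    y_pred = np.argmax(predictions, axis=1)
    y_true = test_batches.classes

    print(confusion_matrix(y_true, y_pred))
    print(classification_report(y_true, y_pred,
                                target_names=list(test_batches.class_indices.keys())))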

0 Answers

No answers yet.