Question

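I am using a CNN to predict the clinical significance of medical images. My training set is imbalanced: 196 images, of which 149 are True and 47 are False. To balance it, I oversampled the False class with data augmentation. The resulting dataset has 3817 samples, split roughly 50/50 between the two classes. However, when I train the model on this dataset, the accuracy always stays around 50%. I also tried the same dataset with a pre-trained VGG19 model (without the top layers), and everything worked fine (80-90% accuracy). I don't understand what goes wrong when I build the model from scratch.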
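As a point of reference for the numbers above, the sketch below computes the accuracy a constant predictor would reach from the class counts alone. The 149/47 counts come from the question; the 1909/1908 split of the 3817 augmented samples is only an assumed illustration of "roughly 50/50", and `majority_baseline` is a hypothetical helper name.

```python
from collections import Counter


def majority_baseline(labels):
    """Return (class counts, accuracy of always predicting the most common class)."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return dict(counts), most_common_count / len(labels)


# Original training set described in the question: 149 True, 47 False.
original = [True] * 149 + [False] * 47
# Assumed illustration of the augmented set: 3817 samples split almost evenly.
augmented = [True] * 1909 + [False] * 1908

for name, labels in [("original", original), ("augmented", augmented)]:
    counts, baseline = majority_baseline(labels)
    print(f"{name}: counts={counts}, constant-predictor accuracy={baseline:.3f}")
```

On a balanced two-class set this baseline is about 0.5, so an accuracy stuck near 50% is consistent with a model that gives the same prediction for every image.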

Code

from keras import models
from keras import layers
from keras.utils import np_utils, generic_utils
from keras.utils import to_categorical
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import CSVLogger  # needed for the per-fold CSV training log below
from keras.applications import VGG19
from sklearn.model_selection import KFold
from keras.optimizers import adam
import pickle
import numpy as np

### CNN model and training ###
base_model = VGG19(weights=None, include_top=False, input_shape=(224,224,3))

# Define the K-fold Cross Validator
kfold = KFold(n_splits=5, shuffle=True)
acc_per_fold = []       # Define per-fold score containers
loss_per_fold = []      # Define per-fold score containers
fold_no = 1
for train, val in kfold.split(xtrain, ytrain):
    X_train = xtrain[train]
    X_val = xtrain[val]
    X_train = X_train.astype('float32')
    X_val = X_val.astype('float32')
    y_train = ytrain[train]
    y_val = ytrain[val]


    ### Model architecture
    model = models.Sequential()
    model.add(base_model)
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2,2), strides=(2,2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(2, activation='softmax'))
    model.summary()

    ### Compile the model
    opt = adam(lr=0.0001)
    model.compile(loss='binary_crossentropy',
                  optimizer=opt,
                  metrics=['accuracy'])
    print('------------------------------------------------------------------------')
    print(f'Training for fold {fold_no} ...')

    Epochs_no = 20
    BS = 64
    CB = CSVLogger('/headnode2/mngu6638/mproject/VGG19_model_save/VGG19B_Test1_Fold'+str(fold_no)+'_CB.csv',separator = ',', append=False)

    # Train the model
    history = model.fit(X_train, y_train, epochs=Epochs_no, batch_size=BS,validation_data=(X_val,y_val), callbacks = [CB], verbose=1)

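I use the VGG19 model with one additional convolution + MaxPooling layer on top. The weights argument is set to None, so the model is trained from scratch.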
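The fold loop casts the images to float32 but shows no rescaling. If `xtrain` still holds raw 0-255 pixel values (an assumption, since the loading code is not shown), a network trained from scratch may have a harder time than a pre-trained one, so it may be worth checking the value range before training. A minimal sketch with a hypothetical helper `scale_to_unit_range` and random stand-in data:

```python
import numpy as np


def scale_to_unit_range(images):
    """Cast to float32 and map 0-255 pixel values to [0, 1]."""
    images = np.asarray(images, dtype="float32")
    return images / 255.0


# Stand-in batch with the input shape used in the question (224, 224, 3).
rng = np.random.default_rng(0)
fake_batch = rng.integers(0, 256, size=(4, 224, 224, 3), dtype=np.uint8)

scaled = scale_to_unit_range(fake_batch)
print("before:", fake_batch.min(), fake_batch.max(), fake_batch.dtype)
print("after: ", scaled.min(), scaled.max(), scaled.dtype)
```

If the printed range before scaling is already within [0, 1], this step is not the issue and can be skipped.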

Output for 1 epoch (the accuracy stays similar over all 20 epochs):

64/3053 [..............................] - ETA: 50:49 - loss: 0.6931 - acc: 0.5156
128/3053 [>.............................] - ETA: 48:52 - loss: 0.6931 - acc: 0.5234
192/3053 [>.............................] - ETA: 47:20 - loss: 0.6930 - acc: 0.5312
256/3053 [=>............................] - ETA: 45:58 - loss: 0.6931 - acc: 0.5234
320/3053 [==>...........................] - ETA: 44:50 - loss: 0.6931 - acc: 0.5156
384/3053 [==>...........................] - ETA: 43:35 - loss: 0.6931 - acc: 0.4948
448/3053 [===>..........................] - ETA: 42:33 - loss: 0.6931 - acc: 0.4911
512/3053 [====>.........................] - ETA: 41:29 - loss: 0.6931 - acc: 0.4941
576/3053 [====>.........................] - ETA: 40:26 - loss: 0.6931 - acc: 0.4948
640/3053 [=====>........................] - ETA: 39:26 - loss: 0.6931 - acc: 0.4922
704/3053 [=====>........................] - ETA: 38:21 - loss: 0.6931 - acc: 0.4986
768/3053 [======>.......................] - ETA: 37:16 - loss: 0.6931 - acc: 0.5052
832/3053 [=======>......................] - ETA: 36:15 - loss: 0.6931 - acc: 0.5012
896/3053 [=======>......................] - ETA: 35:14 - loss: 0.6931 - acc: 0.5000
960/3053 [========>.....................] - ETA: 34:11 - loss: 0.6931 - acc: 0.4990
1024/3053 [=========>....................] - ETA: 33:09 - loss: 0.6931 - acc: 0.4990
.
.
.
2560/3053 [========================>.....] - ETA: 7:03 - loss: 0.6931 - acc: 0.4994
2624/3053 [========================>.....] - ETA: 6:09 - loss: 0.6931 - acc: 0.4998 
2688/3053 [=========================>....] - ETA: 5:15 - loss: 0.6931 - acc: 0.4968
2752/3053 [==========================>...] - ETA: 4:20 - loss: 0.6931 - acc: 0.4976
2816/3053 [==========================>...] - ETA: 3:26 - loss: 0.6931 - acc: 0.4980
2880/3053 [===========================>..] - ETA: 2:30 - loss: 0.6931 - acc: 0.4964
2944/3053 [===========================>..] - ETA: 1:35 - loss: 0.6931 - acc: 0.4971
3008/3053 [============================>.] - ETA: 39s - loss: 0.6931 - acc: 0.4985
3053/3053 [==============================] - 2879s 943ms/step - loss: 0.6931 - acc: 0.5000 val_loss:0.6931 - val_acc: 0.5131
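The loss in the log above stays at 0.6931, which is ln 2. That is the cross-entropy of a two-class prediction of 0.5/0.5, so a loss pinned at that value suggests the network outputs the same near-uniform probabilities for every input. The check below is plain arithmetic; `binary_cross_entropy` is a hypothetical helper written out for illustration, not Keras's implementation.

```python
import math


def binary_cross_entropy(y_true, p_pred):
    """Cross-entropy for one sample: y_true is 0 or 1, p_pred is P(class 1)."""
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1 - p_pred))


# Predicting 0.5 gives the same loss whichever class is correct.
loss_if_positive = binary_cross_entropy(1, 0.5)
loss_if_negative = binary_cross_entropy(0, 0.5)

print(round(loss_if_positive, 4), round(loss_if_negative, 4), round(math.log(2), 4))
# prints: 0.6931 0.6931 0.6931
```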

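At first I thought this might be caused by the input data; however, I used the same dataset with the pre-trained weights and everything worked fine. I also tried lowering the learning rate from 10e-4 to 10e-6, but I got the same result in every case. Can anyone suggest how to solve this problem? Thank you very much.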
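A small point on notation: in Python float syntax, `10e-4` means 10 × 10^-4, which is 1e-3 rather than 1e-4, while the posted code uses `lr=0.0001` (1e-4). Assuming the values were typed exactly as written, the range tried would be 1e-3 down to 1e-5. The snippet only checks that arithmetic:

```python
import math

# 10e-4 is 10 * 10**-4, i.e. 1e-3, not 1e-4.
assert math.isclose(10e-4, 1e-3)
assert math.isclose(10e-6, 1e-5)

# The learning rate in the posted code, lr=0.0001, is 1e-4.
assert math.isclose(0.0001, 1e-4)

print(10e-4, 10e-6, 0.0001)
```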

CNN model accuracy fluctuates around 50%

0 Answers: