I am using a CNN to predict the clinical significance of medical images. I have an imbalanced train_data set of 196 images: 149 True and 47 False. To balance the dataset, I oversampled the False class with data augmentation. The resulting dataset contains 3817 samples, roughly 50/50 per class. However, when I train a model on this dataset, I always get about 50% accuracy. I also tried the same dataset with a VGG19 pretrained model (excluding the top layers), and everything works fine (80-90% accuracy). I have no idea what goes wrong when building the model from scratch.
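For reference, the oversampling step can be sketched roughly like this. This is a minimal NumPy sketch, not the actual pipeline (which used image augmentation, presumably via the imported ImageDataGenerator, and grew the set to ~3817 samples): it only illustrates replicating the minority class with a random horizontal flip until the two classes are at parity. The tiny 8x8 random arrays and the helper name are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny stand-ins for the real images (149 True vs 47 False in the question);
# small 8x8x3 arrays keep the sketch fast.
x_true = rng.random((149, 8, 8, 3)).astype('float32')
x_false = rng.random((47, 8, 8, 3)).astype('float32')

def oversample_minority(x_min, target_n, rng):
    """Repeat minority-class images, applying a random horizontal flip as a
    minimal augmentation, until the class has target_n samples."""
    parts = [x_min]
    n = len(x_min)
    while n < target_n:
        idx = rng.integers(0, len(x_min), size=min(len(x_min), target_n - n))
        batch = x_min[idx].copy()
        flip = rng.random(len(batch)) < 0.5
        batch[flip] = batch[flip][:, :, ::-1, :]  # reverse the width axis
        parts.append(batch)
        n += len(batch)
    return np.concatenate(parts)[:target_n]

x_false_bal = oversample_minority(x_false, len(x_true), rng)
print(x_false_bal.shape)  # (149, 8, 8, 3)
```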
Code:
from keras import models
from keras import layers
from keras.utils import np_utils, generic_utils
from keras.utils import to_categorical
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import VGG19
from keras.callbacks import CSVLogger  # needed for the CSVLogger callback below
from sklearn.model_selection import KFold
from keras.optimizers import adam
import pickle
import numpy as np
### CNN model and training ###
base_model = VGG19(weights=None, include_top=False, input_shape=(224,224,3))
# Define the K-fold Cross Validator
kfold = KFold(n_splits=5, shuffle=True)
acc_per_fold = [] # Define per-fold score containers
loss_per_fold = [] # Define per-fold score containers
fold_no = 1
for train, val in kfold.split(xtrain, ytrain):
    X_train = xtrain[train]
    X_val = xtrain[val]
    X_train = X_train.astype('float32')
    X_val = X_val.astype('float32')
    y_train = ytrain[train]
    y_val = ytrain[val]

    ### Model architecture
    model = models.Sequential()
    model.add(base_model)
    model.add(layers.Conv2D(64, (3,3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2,2), strides=(2,2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(2, activation='softmax'))
    model.summary()

    ### Compile the model
    opt = adam(lr=0.0001)
    model.compile(loss='binary_crossentropy',
                  optimizer=opt,
                  metrics=['accuracy'])

    print('------------------------------------------------------------------------')
    print(f'Training for fold {fold_no} ...')
    Epochs_no = 20
    BS = 64
    CB = CSVLogger('/headnode2/mngu6638/mproject/VGG19_model_save/VGG19B_Test1_Fold'+str(fold_no)+'_CB.csv', separator=',', append=False)

    # Train the model
    history = model.fit(X_train, y_train, epochs=Epochs_no, batch_size=BS, validation_data=(X_val, y_val), callbacks=[CB], verbose=1)
    fold_no += 1
I use the VGG19 architecture with one extra Conv + MaxPooling layer added on top. The weights are set to None, so the model is trained from scratch.
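As a quick sanity check on the resulting tensor shapes (a back-of-the-envelope sketch, not Keras output): VGG19 with include_top=False halves the 224x224 input five times, and the extra pooling layer shrinks it once more before Flatten.

```python
def pool2x2(size):
    """Spatial size after a 2x2 max-pool with stride 2 ('valid' padding)."""
    return (size - 2) // 2 + 1

size = 224
# VGG19 (include_top=False) ends each of its 5 blocks with a 2x2/2 max-pool.
for _ in range(5):
    size = pool2x2(size)
print(size)              # 7  -> base_model output is 7x7x512
# The added Conv2D uses 'same' padding, so 7x7 is preserved;
# the added MaxPooling2D then gives:
size = pool2x2(size)
print(size)              # 3
print(size * size * 64)  # 576 units feeding the Flatten/Dense head
```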
The result below is for 1 epoch, but the accuracy is similar after 20 epochs:
64/3053 [..............................] - ETA: 50:49 - loss: 0.6931 - acc: 0.5156
128/3053 [>.............................] - ETA: 48:52 - loss: 0.6931 - acc: 0.5234
192/3053 [>.............................] - ETA: 47:20 - loss: 0.6930 - acc: 0.5312
256/3053 [=>............................] - ETA: 45:58 - loss: 0.6931 - acc: 0.5234
320/3053 [==>...........................] - ETA: 44:50 - loss: 0.6931 - acc: 0.5156
384/3053 [==>...........................] - ETA: 43:35 - loss: 0.6931 - acc: 0.4948
448/3053 [===>..........................] - ETA: 42:33 - loss: 0.6931 - acc: 0.4911
512/3053 [====>.........................] - ETA: 41:29 - loss: 0.6931 - acc: 0.4941
576/3053 [====>.........................] - ETA: 40:26 - loss: 0.6931 - acc: 0.4948
640/3053 [=====>........................] - ETA: 39:26 - loss: 0.6931 - acc: 0.4922
704/3053 [=====>........................] - ETA: 38:21 - loss: 0.6931 - acc: 0.4986
768/3053 [======>.......................] - ETA: 37:16 - loss: 0.6931 - acc: 0.5052
832/3053 [=======>......................] - ETA: 36:15 - loss: 0.6931 - acc: 0.5012
896/3053 [=======>......................] - ETA: 35:14 - loss: 0.6931 - acc: 0.5000
960/3053 [========>.....................] - ETA: 34:11 - loss: 0.6931 - acc: 0.4990
1024/3053 [=========>....................] - ETA: 33:09 - loss: 0.6931 - acc: 0.4990
.
.
.
2560/3053 [========================>.....] - ETA: 7:03 - loss: 0.6931 - acc: 0.4994
2624/3053 [========================>.....] - ETA: 6:09 - loss: 0.6931 - acc: 0.4998
2688/3053 [=========================>....] - ETA: 5:15 - loss: 0.6931 - acc: 0.4968
2752/3053 [==========================>...] - ETA: 4:20 - loss: 0.6931 - acc: 0.4976
2816/3053 [==========================>...] - ETA: 3:26 - loss: 0.6931 - acc: 0.4980
2880/3053 [===========================>..] - ETA: 2:30 - loss: 0.6931 - acc: 0.4964
2944/3053 [===========================>..] - ETA: 1:35 - loss: 0.6931 - acc: 0.4971
3008/3053 [============================>.] - ETA: 39s - loss: 0.6931 - acc: 0.4985
3053/3053 [==============================] - 2879s 943ms/step - loss: 0.6931 - acc: 0.5000 - val_loss: 0.6931 - val_acc: 0.5131
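One detail in the log above: the loss is pinned at 0.6931, which is exactly ln(2), the cross-entropy a two-class model incurs when it always assigns probability 0.5 to each class. In other words, the network never moves off the uniform prediction:

```python
import math

# Cross-entropy for a sample when the predicted probability of the
# true class is 0.5: -log(0.5) = log(2)
uniform_loss = -math.log(0.5)
print(round(uniform_loss, 4))  # 0.6931
```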
At first I thought this might be due to the input data; however, I used the same dataset with the model initialized from pretrained weights and everything worked fine. I also tried lowering the learning rate from 10e-4 to 10e-6, but I got the same result in every case. Can anyone suggest something to solve this problem? Thank you very much.