Question

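I know that adding a dropout layer to a CNN model can improve accuracy because it reduces the effect of overfitting. However, I built CNN models with 16, 32, and 64 filters, a kernel size of 3, and a max-pooling size of 2, and noticed that in every case the model without the dropout layer performed better than the model with it.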

from keras.models import Sequential
from keras.layers import Conv2D,Activation,MaxPooling2D,Dense,Flatten,Dropout
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from IPython.display import display
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.metrics import classification_report, confusion_matrix
import keras
from keras.layers import BatchNormalization
from keras.optimizers import Adam
import pickle

classifier = Sequential()
classifier.add(Conv2D(16,(3,3),input_shape=(200,200,3)))
classifier.add(Activation('relu'))
classifier.add(MaxPooling2D(pool_size =(2,2)))
classifier.add(Flatten())
classifier.add(Dense(128))
classifier.add(Activation('relu'))
classifier.add(Dropout(0.5))
classifier.add(Dense(7))
classifier.add(Activation('softmax'))
classifier.summary()
# `learning_rate` replaces the older, deprecated `lr` argument name
classifier.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])
train_datagen = ImageDataGenerator(rescale =1./255,
                                   shear_range =0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip =True)
test_datagen = ImageDataGenerator(rescale = 1./255)

batchsize=10
training_set = train_datagen.flow_from_directory('/home/osboxes/Downloads/Downloads/Journal_Paper/Malware_Families/Spectrogram/Train/',            
                                                target_size=(200,200),
                                                batch_size= batchsize,
                                                class_mode='categorical')

test_set = test_datagen.flow_from_directory('/home/osboxes/Downloads/Downloads/Journal_Paper/Malware_Families/Spectrogram/Validate/',
                                           target_size=(200,200),
                                           batch_size=batchsize,
                                           shuffle=False,
                                           class_mode='categorical')
history=classifier.fit_generator(training_set,
                        steps_per_epoch = 2340 // batchsize,
                        epochs = 100,
                        validation_data =test_set,
                        validation_steps = 781 // batchsize)

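classifier.save('16_With_Dropout_rl_001.h5')
# Write the training history to its own file; pickling it to the .h5 path
# would overwrite the model saved on the previous line.
# (The .pkl filename is an assumed name, not taken from the original post.)
with open('16_With_Dropout_rl_001_history.pkl', 'wb') as file_pi:
    pickle.dump(history.history, file_pi)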
Y_pred = classifier.predict_generator(test_set, steps= 781 // batchsize+1)
y_pred = np.argmax(Y_pred, axis=1)
print('Confusion Matrix')
print(confusion_matrix(test_set.classes, y_pred))
print('Classification Report')
# Take the class names from the generator so they match the label indices
# (hard-coded in the original post as: coinhive, emotet, fareit, gafgyt, mirai, ramnit, razy)
class_labels = list(test_set.class_indices.keys())
report = classification_report(test_set.classes, y_pred, target_names=class_labels)
print(report)

# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy 16 with dropout rl .001')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss 16 with dropout rl .001')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

Answer 1

"I know that adding a dropout layer to a CNN model can improve accuracy because it reduces the effect of overfitting."

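You could say that, but it does not hold in general. Dropout is a regularization technique that reduces a model's flexibility. It can prevent overfitting when the model is flexible enough to handle the task (in practice, when the model is more flexible than the task requires). If your model cannot handle the task to begin with, meaning it is too weak, then adding any form of regularization will probably only make it perform worse.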
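To make "reduces flexibility" concrete, here is a minimal sketch of what a dropout layer does to its inputs. It uses plain Python rather than Keras, and the `dropout` helper and its parameters are illustrative names, not part of any library. It follows the usual "inverted dropout" scheme: during training a random fraction `rate` of the activations is zeroed and the survivors are scaled by `1 / (1 - rate)`; at inference the input passes through unchanged.

```python
import random


def dropout(activations, rate, training, rng=None):
    """Inverted dropout on a flat list of activations (illustrative sketch only)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time dropout is a no-op.
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    # Each unit is kept with probability keep_prob and rescaled so the
    # expected value of the output matches the input.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)
    print(dropout([1.0] * 10, rate=0.5, training=True, rng=rng))
    print(dropout([1.0] * 10, rate=0.5, training=False))
```

With `rate=0.5`, roughly half of the 128 dense units in the question's model are silenced on every training step, which is a lot to take away from a network that may already be struggling with the task.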

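That said, CNNs usually perform better when you include multiple convolutional layers. The idea is that deeper convolutional layers learn more complex features, while the layers close to the input learn only basic shapes (this depends, of course, on the structure of the network and the complexity of the task). And since you usually want to include more convolutional layers, the complexity (and flexibility) of such models increases, which can lead to overfitting and hence the need for regularization. (Three convolutional layers with regularization will usually outperform one convolutional layer without it.)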

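Your design contains only a single convolutional layer. I would suggest stacking several convolution/pooling layers on top of each other and, if necessary, adding some dropout layers to combat overfitting (on a model this simple you are unlikely to see any positive effect from regularization).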
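As a rough sketch of that suggestion, the model from the question could be extended to three convolution/pooling blocks along the following lines. The filter counts 16, 32, and 64 and the 200x200x3 input come from the question; the `build_classifier` helper, the placement of the single dropout layer after the dense layer, and the default rate are assumptions made for illustration, not a tuned architecture.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout


def build_classifier(dropout_rate=0.2, num_classes=7):
    """Three conv/pool blocks followed by a small dense head (illustrative sketch)."""
    model = Sequential()
    # Block 1: low-level features close to the input.
    model.add(Conv2D(16, (3, 3), activation='relu', input_shape=(200, 200, 3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Blocks 2 and 3: deeper layers with more filters.
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    # Dropout is optional: pass dropout_rate=0 to build the unregularized variant.
    if dropout_rate > 0:
        model.add(Dropout(dropout_rate))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model


if __name__ == "__main__":
    build_classifier(dropout_rate=0.2).summary()
```

Building the model once with `dropout_rate=0` and once with a nonzero rate, and training both on the same generators, gives a like-for-like comparison of the effect of dropout on the deeper network.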

Answer 2

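I agree with everything @Matus Dubrava said, but I would also suggest trying dropout rates much lower than 0.5. People typically use values between 0.15 and 0.3; I usually use 0.2. Try a few different values and see which works best. And, as Matus suggested, try more convolutional layers. I have had a lot of success with three-convolutional-layer architectures in tabular and image-generation models.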
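Trying several rates is easy to automate. The sketch below is library-agnostic: `sweep_dropout_rates` and the `train_and_score` callable are hypothetical names introduced here. You would supply your own function that builds the model with the given rate, trains it, and returns a validation score such as the best `val_accuracy`.

```python
def sweep_dropout_rates(rates, train_and_score):
    """Evaluate each dropout rate and return (best_rate, scores).

    train_and_score is a user-supplied callable: rate -> validation score
    (higher is better). Nothing is trained inside this helper itself.
    """
    scores = {rate: train_and_score(rate) for rate in rates}
    best_rate = max(scores, key=scores.get)
    return best_rate, scores


# Example usage (assumes you have written train_and_score yourself):
# best, scores = sweep_dropout_rates([0.0, 0.15, 0.2, 0.3, 0.5], train_and_score)
# print(best, scores)
```

Including 0.0 in the list keeps the no-dropout model in the comparison, so the sweep also answers the original question of whether dropout helps at all on this data.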
