Goal: classify cells as parasitized (malaria) or uninfected.
The dataset is from Kaggle: https://www.kaggle.com/iarunava/cell-images-for-detecting-malaria
Imports:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, MaxPool2D, Conv2D, Flatten
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from matplotlib.image import imread
Paths:
file_path = "/whatever-you-store-the-data/cell_images/"
test_path = "/whatever-you-store-the-data/cell_images/test/"
train_path = "/whatever-you-store-the-data/cell_images/train/"
The average image size is (130, 130, 3), i.e. (height, width, colour_channels):
image_shape = (130, 130, 3)
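For readers wondering where a figure like this comes from, here is a minimal sketch of averaging image shapes. It assumes the shapes have already been collected, for example with `imread(path).shape` on each file. The helper name `average_shape` and the example shapes are hypothetical and are not taken from the dataset.

```python
from statistics import mean


def average_shape(shapes):
    """Return the per-axis mean of (height, width, channels) tuples, rounded to ints."""
    # zip(*shapes) groups all heights together, all widths together, and so on.
    return tuple(round(mean(axis)) for axis in zip(*shapes))


# Hypothetical shapes standing in for imread(path).shape results.
example_shapes = [(120, 128, 3), (140, 132, 3), (130, 130, 3)]
print(average_shape(example_shapes))
```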
ImageDataGenerator:
image_gen = ImageDataGenerator(rotation_range=20,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               shear_range=0.1,
                               zoom_range=0.1,
                               horizontal_flip=True,
                               vertical_flip=True,
                               fill_mode="nearest")
Model:
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=image_shape, activation="relu"))
model.add(MaxPool2D((2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPool2D((2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPool2D((2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
Early stopping callback:
early_stop = EarlyStopping(monitor="val_loss",
                           patience=5,
                           verbose=1,
                           mode="min")
Generators:
train_image_gen = image_gen.flow_from_directory(train_path,
                                                target_size=image_shape[:2],
                                                color_mode="rgb",
                                                batch_size=32,
                                                class_mode="binary")
test_image_gen = image_gen.flow_from_directory(test_path,
                                               target_size=image_shape[:2],
                                               color_mode="rgb",
                                               batch_size=32,
                                               class_mode="binary",
                                               shuffle=False)
Fitting the model:
results = model.fit_generator(train_image_gen,
                              epochs=20,
                              validation_data=test_image_gen,
                              callbacks=[early_stop])
Here is the output:
Epoch 1/20
390/Unknown - 9339s 24s/step - loss: 4.4232 - accuracy: 0.5135
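First, why is the progress shown in the n/Unknown form, and why does it take 9339s? The duration itself is not my main concern, though. The real question is why the estimated training time keeps increasing: it starts at around 240s and then grows steadily until it finally reaches 9339s. What is happening here, and how can I fix it?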
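One possible reading of this log, offered as a sketch and not as a confirmed diagnosis: when Keras is not told how many batches make up an epoch, the progress bar has no total to count towards, so it appears to print `n/Unknown` and to show elapsed time instead of an ETA. Under that assumption the growing number is simply time spent so far, and 390 steps at roughly 24 s/step comes to about 9360 s, close to the 9339s shown above. The helper names and the image count `n_train_images` below are hypothetical.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


def elapsed_seconds(steps_done, seconds_per_step):
    """Rough elapsed time after a number of steps at a steady per-step cost."""
    return steps_done * seconds_per_step


# Figures taken from the log line above: 390 steps at about 24 s/step.
print(elapsed_seconds(390, 24))

# Hypothetical training-set size, NOT taken from the question.
n_train_images = 1000
print(steps_per_epoch(n_train_images, 32))
```

With the generators from the question, the sample count would come from `train_image_gen.samples`, and the result could be passed to `fit_generator` as `steps_per_epoch=` (with `validation_steps=` computed the same way from `test_image_gen`), which should give the progress bar a known total.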