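I am currently using Keras to classify satellite images, and I am struggling to get correct predictions from predict and predict_generator.
My code is below: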
import os
import numpy as np
import pandas as pd
from keras.optimizers import Adam, SGD
from tools import load_val_datas, load_test_datas, make_predictions, make_submissions
from keras_tools import save_model, load_model
from callbacks import CustomCallbacks
from data_generator import ImageDataGenerator
from model import base_cnn
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
TRAIN_SIZE, VAL_SIZE, TEST_SIZE, TEST_SIZE_ADD = 30000, 10479, 40669, 20522
IMAGE_FIRST_DIM, N_COLORS = 32, 3
IMAGE_SIZE = IMAGE_FIRST_DIM * IMAGE_FIRST_DIM * N_COLORS
LABEL_SIZE = 17
DROPOUT = 0.25
BATCH_SIZE = 96
N_EPOCHS = 2
CHECKPOINTS_FOLDER = "checkpoints/"
MODEL_JSON = "epoch-10.json"
MODEL_H5 = "epoch-10.h5"
TO_LOAD = False
df_train_labels = pd.read_csv("datas/train_labels.csv")
label_dict = df_train_labels.set_index("image_name").T.to_dict("list")
val_x, val_y = load_val_datas(VAL_SIZE, IMAGE_FIRST_DIM, N_COLORS, LABEL_SIZE)
if TO_LOAD:
    model, is_loaded = load_model(CHECKPOINTS_FOLDER + MODEL_JSON, CHECKPOINTS_FOLDER + MODEL_H5)
else:
    model = base_cnn(IMAGE_FIRST_DIM, N_COLORS)
adam = Adam(lr=0.01)
sgd = SGD(lr=0.01, momentum=0.9, decay=0.0005)
model.compile(loss='binary_crossentropy', optimizer=sgd)
my_callbacks = CustomCallbacks()
datagen = ImageDataGenerator(rescale=1./255)
train_generator = datagen.flow_from_directory("datas/train", target_size=(IMAGE_FIRST_DIM, IMAGE_FIRST_DIM),
                                              batch_size=BATCH_SIZE,
                                              class_mode="multilabel", multilabel_classes=label_dict)
val_generator = datagen.flow_from_directory("datas/validation", target_size=(IMAGE_FIRST_DIM, IMAGE_FIRST_DIM),
                                            batch_size=BATCH_SIZE, shuffle=False,
                                            class_mode="multilabel", multilabel_classes=label_dict)
model.fit_generator(train_generator, steps_per_epoch=TRAIN_SIZE/BATCH_SIZE, epochs=N_EPOCHS,
                    verbose=2)
save_model(model, MODEL_JSON, MODEL_H5)
from time import time
st = time()
p_valid = model.predict_generator(val_generator, steps=VAL_SIZE/BATCH_SIZE, pickle_safe=True)
print("time: ", time() - st)
print(p_valid)
from sklearn.metrics import fbeta_score
print(fbeta_score(val_y, np.array(p_valid) > 0.2, beta=2, average='samples'))
st = time()
p_valid1 = model.predict(val_x)
print("time: ", time() - st)
print(type(p_valid1))
print(fbeta_score(val_y, np.array(p_valid1) > 0.2, beta=2, average='samples'))
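A side note on the code above: `steps_per_epoch` and `steps` are passed as floats (`TRAIN_SIZE/BATCH_SIZE` and `VAL_SIZE/BATCH_SIZE`). A minimal sketch of computing whole-number step counts instead, so that it is explicit how many batches are drawn and that the final partial batch is included. The helper name `steps_for` is hypothetical and is not part of Keras:

```python
import math

def steps_for(n_samples, batch_size):
    """Number of batches needed to cover n_samples once (last batch may be partial)."""
    return int(math.ceil(n_samples / float(batch_size)))

# Sizes taken from the constants in the question.
train_steps = steps_for(30000, 96)
val_steps = steps_for(10479, 96)
print(train_steps, val_steps)
```

With these values, the validation pass would cover 109 full batches plus one final batch of 15 images, which should give exactly one prediction row per validation image.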
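I am using a modified version of ImageDataGenerator that can handle multi-label data (I have checked the implementation, and the data appears to be loaded correctly in batches).
The problem comes from the predict and predict_generator part: the two give different results. I double-checked with a model trained without a generator, and the output of predict is correct (and very different from the output of predict_generator). The data passed to predict is built the same way as in the generator (I checked that too).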
Using TensorFlow backend.
validation images loaded in 0.01 seconds
validation labels loaded in 0.00 seconds
Found 30000 images belonging to 1 classes.
Found 10479 images belonging to 1 classes.
Epoch 1/2
63s - loss: 0.2762
Epoch 2/2
66s - loss: 0.2288
time: 22.098024606704712
beta_score: 0.667686382255
time: 3.3181281089782715
beta_score: 0.740394519272
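Given two scores that differ like this, one quick check is whether the two prediction arrays contain the same values in a different row order. A minimal sketch, assuming both outputs are 2-D NumPy arrays of shape (n_samples, n_labels). The helper name `columns_match_when_sorted` is hypothetical, the arrays below are toy data rather than real model output, and the check is only a necessary condition: matching sorted columns do not prove that the rows are a permutation of each other.

```python
import numpy as np

def columns_match_when_sorted(a, b, atol=1e-5):
    """Return True if each column of a and b holds the same values, ignoring row order."""
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape:
        # Different shapes usually mean a different number of predicted samples.
        return False
    return bool(np.allclose(np.sort(a, axis=0), np.sort(b, axis=0), atol=atol))

# Toy stand-ins for p_valid (generator order) and p_valid1 (array order).
from_array = np.array([[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])
from_generator = from_array[[1, 2, 0]]  # same rows, shuffled

print(columns_match_when_sorted(from_array, from_generator))
print(np.allclose(from_array, from_generator))
```

If the first check passes while the element-wise comparison fails, a row-ordering mismatch between the predictions and `val_y` is a plausible explanation for the lower score.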
Thanks, Nicolas
Answer 0 (score: 0)
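I think a comment on a similar issue on GitHub may be helpful to someone: https://github.com/keras-team/keras/issues/3477#issuecomment-360022086
The outputs of predict() and predict_generator() are actually the same, but they look different because they are labeled differently. You are supplying the labels for predict() yourself, whereas predict_generator() infers the labels from the directory structure of the training data (because it uses flow_from_directory() rather than flow()).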
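If that is the cause, one way to compare like with like is to reorder the generator's predictions so that they follow the same image order as the labels before scoring. A minimal sketch, assuming you can get the list of file names in the order the generator yielded them (stock Keras directory iterators expose a `filenames` attribute; whether the custom generator in the question does is an assumption) and the list of image names in the order of `val_y`. The helper name `align_to_reference` and the toy data are hypothetical:

```python
import numpy as np

def align_to_reference(preds, pred_names, ref_names):
    """Reorder rows of preds (currently ordered as pred_names) to follow ref_names."""
    row_of = {name: i for i, name in enumerate(pred_names)}
    index = [row_of[name] for name in ref_names]  # KeyError if a name is missing
    return np.asarray(preds)[index]

# Toy data: generator order differs from the order of the label array.
gen_names = ["b.jpg", "c.jpg", "a.jpg"]
gen_preds = np.array([[0.2, 0.8], [0.3, 0.7], [0.1, 0.9]])
label_names = ["a.jpg", "b.jpg", "c.jpg"]

aligned = align_to_reference(gen_preds, gen_names, label_names)
print(aligned)
```

The aligned array can then be thresholded and passed to fbeta_score together with `val_y`. If the generator reports paths that include a subdirectory, the names would need normalizing first (for example with os.path.basename) so that they match the names used for the labels.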