I am training VGG on some of my own images. I have the following code:
img_width, img_height = 512, 512
top_model_weights_path = 'UIP-versus-inconsistent.h5'
train_dir = 'MasterHRCT/Limited-Cuts-UIP-Inconsistent/train'
validation_dir = 'MasterHRCT/Limited-Cuts-UIP-Inconsistent/validation'
nb_train_samples = 1500
nb_validation_samples = 500
epochs = 50
batch_size = 16
def save_bottleneck_features():
    datagen = ImageDataGenerator(rescale=1. / 255)
    # Convolutional base of VGG16 only, pre-trained on ImageNet
    model = applications.VGG16(include_top=False, weights='imagenet')
    # Bottleneck features for the training set
    generator = datagen.flow_from_directory(
        train_dir,
        target_size=(img_width, img_height),
        shuffle=False,
        class_mode=None,
        batch_size=batch_size
    )
    bottleneck_features_train = model.predict_generator(generator=generator, steps=nb_train_samples // batch_size)
    np.save(file="UIP-versus-inconsistent_train.npy", arr=bottleneck_features_train)
    # Bottleneck features for the validation set
    generator = datagen.flow_from_directory(
        validation_dir,
        target_size=(img_width, img_height),
        shuffle=False,
        class_mode=None,
        batch_size=batch_size,
    )
    bottleneck_features_validation = model.predict_generator(generator, nb_validation_samples // batch_size)
    np.save(file="UIP-versus-inconsistent_validate.npy", arr=bottleneck_features_validation)
After running this, the generators behave as expected for my directories:
Found 1500 images belonging to 2 classes.
Found 500 images belonging to 2 classes.
Then I run:
train_data = np.load(file="UIP-versus-inconsistent_train.npy")
train_labels = np.array([0] * 750 + [1] * 750)
validation_data = np.load(file="UIP-versus-inconsistent_validate.npy")
validation_labels = np.array([0] * 250 + [1] * 250)
Then I check the data:
print("Train data shape", train_data.shape)
print("Train_labels shape", train_labels.shape)
print("Validation_data shape", validation_labels.shape)
print("Validation_labels", validation_labels.shape)
and I get:
Train data shape (1488, 16, 16, 512)
Train_labels shape (1488,)
Validation_data shape (496,)
Validation_labels (496,)
This is variable: instead of getting 1500 training examples and 500 validation examples, I seem to "lose" some. Sometimes when I run save_bottleneck_features() the numbers come out right, and sometimes they don't; it seems to happen more often when the run takes a long time. Is there a reproducible explanation for this? Could some of the images be corrupted?
Answer (score: 1):
It's quite simple:
1488 = (1500 // batch_size) * batch_size
496 = (500 // batch_size) * batch_size
Your "missing" samples come from the integer (floor) division: steps = nb_samples // batch_size rounds down, so predict_generator never requests the final partial batch and the last nb_samples % batch_size images are simply skipped.
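A minimal sketch of one way to avoid dropping those images: round the step count up instead of down, so the generator's final, smaller batch is also consumed. It reuses the question's paths and constants and assumes the same Keras 2 API the question already uses; this is an illustration, not the original poster's code:

import math

import numpy as np
from keras import applications
from keras.preprocessing.image import ImageDataGenerator

img_width, img_height = 512, 512
batch_size = 16
nb_train_samples = 1500

datagen = ImageDataGenerator(rescale=1. / 255)
# Convolutional base of VGG16 only, pre-trained on ImageNet
model = applications.VGG16(include_top=False, weights='imagenet')

generator = datagen.flow_from_directory(
    'MasterHRCT/Limited-Cuts-UIP-Inconsistent/train',
    target_size=(img_width, img_height),
    shuffle=False,
    class_mode=None,
    batch_size=batch_size
)

# ceil(1500 / 16) = 94 steps instead of 93; with shuffle=False the directory
# iterator yields a final batch of 12 images, so exactly 1500 feature maps come back.
steps = int(math.ceil(nb_train_samples / float(batch_size)))
bottleneck_features_train = model.predict_generator(generator, steps=steps)
print(bottleneck_features_train.shape)  # (1500, 16, 16, 512)

np.save("UIP-versus-inconsistent_train.npy", bottleneck_features_train)

Alternatively, pick a batch_size that divides 1500 and 500 evenly (e.g. 20), or read the counts from generator.samples rather than hard-coding them; either way the label arrays built with [0] * 750 + [1] * 750 then match the number of saved feature maps.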