What do <training samples> and <validation samples> mean?

Time: 2019-11-28 20:46:49

Tags: python machine-learning syntax-error conv-neural-network

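I got this code from GitHub. It is an open-source machine learning algorithm for glaucoma detection that uses a convolutional network to classify retinal images as glaucoma or not glaucoma.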


It runs, except that I keep getting an error. This is the script:

from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.layers import BatchNormalization, Activation, Dropout, Flatten, Dense
from keras import backend as K
from keras import optimizers
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from imgaug import augmenters as iaa

img_width, img_height = 256, 256
input_shape = (img_width, img_height, 3)

train_data_dir = "data/train"
validation_data_dir = "data/validation"
nb_train_samples = <training samples>
nb_validation_samples = <validation samples>
batch_size = 16
epochs = 100

input = Input(shape=input_shape)

block1 = BatchNormalization(name='norm_0')(input)

# Block 1
block1 = Conv2D(8, (3,3), name='conv_11', activation='relu')(block1)
block1 = Conv2D(16, (3,3), name='conv_12', activation='relu')(block1)
block1 = Conv2D(32, (3,3), name='conv_13', activation='relu')(block1)
block1 = Conv2D(64, (3,3), name='conv_14', activation='relu')(block1)
block1 = MaxPooling2D(pool_size=(2, 2))(block1)
block1 = BatchNormalization(name='norm_1')(block1)

block1 = Conv2D(16, 1)(block1)

# Block 2
block2 = Conv2D(32, (3,3), name='conv_21', activation='relu')(block1)
block2 = Conv2D(64, (3,3), name='conv_22', activation='relu')(block2)
block2 = Conv2D(64, (3,3), name='conv_23', activation='relu')(block2)
block2 = Conv2D(128, (3,3), name='conv_24', activation='relu')(block2)
block2 = MaxPooling2D(pool_size=(2, 2))(block2)
block2 = BatchNormalization(name='norm_2')(block2)

block2 = Conv2D(64, 1)(block2)

# Block 3
block3 = Conv2D(64, (3,3), name='conv_31', activation='relu')(block2)
block3 = Conv2D(128, (3,3), name='conv_32', activation='relu')(block3)
block3 = Conv2D(128, (3,3), name='conv_33', activation='relu')(block3)
block3 = Conv2D(64, (3,3), name='conv_34', activation='relu')(block3)
block3 = MaxPooling2D(pool_size=(2, 2))(block3)
block3 = BatchNormalization(name='norm_3')(block3)

# Block 4
block4 = Conv2D(64, (3,3), name='conv_41', activation='relu')(block3)
block4 = Conv2D(32, (3,3), name='conv_42', activation='relu')(block4)
block4 = Conv2D(16, (3,3), name='conv_43', activation='relu')(block4)
block4 = Conv2D(8, (2,2), name='conv_44', activation='relu')(block4)
block4 = MaxPooling2D(pool_size=(2, 2))(block4)
block4 = BatchNormalization(name='norm_4')(block4)

block4 = Conv2D(2, 1)(block4)

block5 = GlobalAveragePooling2D()(block4)
output = Activation('softmax')(block5)

model = Model(inputs=[input], outputs=[output])
model.summary()
model.compile(loss="categorical_crossentropy", optimizer=optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False), metrics=["accuracy"])

# Initialize the train and test generators with data augmentation
sometimes = lambda aug: iaa.Sometimes(0.6, aug)
seq = iaa.Sequential([
                      iaa.GaussianBlur(sigma=(0 , 1.0)),
                      iaa.Sharpen(alpha=1, lightness=0),
                      iaa.CoarseDropout(p=0.1, size_percent=0.15),
                              sometimes(iaa.Affine(
                                                    scale={"x": (0.8, 1.2), "y": (0.8, 1.2)},
                                                    translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)},
                                                    rotate=(-30, 30),
                                                    shear=(-16, 16)))
                    ])


train_datagen = ImageDataGenerator(
    rescale=1./255,
    preprocessing_function=seq.augment_image,
    horizontal_flip=True,
    vertical_flip=True)

test_datagen = ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,
    vertical_flip=True)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode="categorical")

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    class_mode="categorical")

checkpoint = ModelCheckpoint("f1.h5", monitor='acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.1, patience=2, verbose=0, mode='auto', cooldown=0, min_lr=0)

model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    callbacks=[checkpoint, reduce_lr]
)

The error is:

  File "CNN.py", line 15
    nb_train_samples = <training samples>
                       ^
SyntaxError: invalid syntax

What should I replace <training samples> with to avoid this error? Apart from that, the rest of the code works.

Thanks everyone, Satya

2 Answers:

Answer 0: (score: 0)

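I'm not sure how to fill it in in code, but I can explain what training and validation samples are.

Training samples are the data used to train the model. The model learns to produce a given output for particular samples. But we don't really want to teach the model to recognize those specific samples; we want it to recognize patterns.

That is why we use validation data: to make sure the model works not only on the samples it learned from, but also on samples it has not seen yet.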
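To make the distinction concrete, here is a minimal sketch of a hold-out split, assuming you start from a single list of image file names. The helper `split_samples`, the 20% validation fraction, and the file names are all hypothetical and do not come from the GitHub project.

```python
import random


def split_samples(samples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and split it into (training, validation)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_val = int(len(shuffled) * validation_fraction)
    # The first n_val items are held out; the model never trains on them.
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names, for illustration only.
filenames = ["img_%03d.png" % i for i in range(100)]
train, val = split_samples(filenames)

nb_train_samples = len(train)
nb_validation_samples = len(val)
print(nb_train_samples, nb_validation_samples)
```

The two lists share no file, which is the whole point: accuracy measured on `val` says something about images the model has not seen.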

Your script seems to expect each sample to have shape (256, 256, 3), but the code responsible for loading that data is not shown.

Answer 1: (score: 0)

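These numbers are the counts of training and validation samples in the training and validation directories. Note also that, according to the Keras documentation, each of these directories should contain one subdirectory per class. Any PNG, JPG, BMP, PPM or TIF images inside each subdirectory's directory tree will be included in the generator.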
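As a rough way to check that layout, the sketch below counts image files per class subdirectory. The function `count_images_per_class`, the class names `glaucoma` and `normal`, and the exact extension list are my assumptions for illustration; the demo builds a throwaway directory rather than touching `data/train`.

```python
import os
import tempfile

# Assumed extension list, based on the formats named in the Keras docs quote.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".ppm", ".tif", ".tiff")


def count_images_per_class(data_dir):
    """Return {class_name: image_count} for each subdirectory of data_dir."""
    counts = {}
    for entry in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, entry)
        if not os.path.isdir(class_dir):
            continue  # files directly under data_dir belong to no class
        n = 0
        for _root, _dirs, files in os.walk(class_dir):
            n += sum(1 for f in files if f.lower().endswith(IMAGE_EXTENSIONS))
        counts[entry] = n
    return counts


# Demo on a temporary tree with two hypothetical classes.
with tempfile.TemporaryDirectory() as root:
    for name in ("glaucoma/a.png", "glaucoma/b.JPG",
                 "glaucoma/notes.txt", "normal/c.tif"):
        path = os.path.join(root, *name.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    counts = count_images_per_class(root)

print(counts)  # {'glaucoma': 2, 'normal': 1}
```

Summing the values of the returned dictionary would give a sample count that ignores stray non-image files such as `notes.txt`.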

If you don't know how many images are in these directories, or you might add new images to them later, you can use:

import os
# Counts every file under each directory tree, so keep only images in them.
nb_train_samples = sum([len(files) for r, d, files in os.walk(train_data_dir)])
nb_validation_samples = sum([len(files) for r, d, files in os.walk(validation_data_dir)])
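For what it's worth, this is how those two counts feed into the `steps_per_epoch` and `validation_steps` arguments of the script in the question. The sample counts (520 and 130) are made-up numbers and `steps_for` is a hypothetical helper; only `batch_size = 16` and the `//` arithmetic are taken from the question.

```python
import math


def steps_for(n_samples, batch_size, drop_remainder=True):
    """Number of batches needed to go through n_samples once."""
    if drop_remainder:
        # Same arithmetic as the question: a partial last batch is skipped.
        return n_samples // batch_size
    # Alternative: round up so the partial last batch is included.
    return math.ceil(n_samples / batch_size)


batch_size = 16
nb_train_samples = 520        # hypothetical count
nb_validation_samples = 130   # hypothetical count

steps_per_epoch = steps_for(nb_train_samples, batch_size)
validation_steps = steps_for(nb_validation_samples, batch_size)

print(steps_per_epoch, validation_steps)  # 32 8
print(nb_train_samples - steps_per_epoch * batch_size)  # 8 images skipped per epoch
```

One thing to watch: the validation generator in the question does not pass `batch_size`, so as far as I know it would fall back on the Keras default of 32 rather than 16. Passing `batch_size=batch_size` there as well would keep this arithmetic consistent.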