Question

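I'm trying to write a preprocessing function so that training_dataset can be fed directly into a Keras Sequential neural network. The preprocessing function should return features and labels.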

import tensorflow_datasets as tfds

def preprocessing_function(data):
    features = ...
    labels = ...
    return features, labels

dataset, info = tfds.load(name='cats_vs_dogs', split=tfds.Split.TRAIN, with_info=True)

training_dataset = dataset.map(preprocessing_function)

How should I write preprocessing_function? I've spent hours researching and trying to get this working, to no avail. I hope someone can help.

Answer 1

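There are two preprocessing functions here. The first is applied to both the training and validation data; it normalizes the images and resizes them to the input size the network expects. The second, augmentation, is applied only to the training set. Which augmentations you want depends on your dataset and application; the ones below are just an example.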

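# Fetching, pre-processing & preparing the data pipeline
# IMG_SIZE_H, IMG_SIZE_W, MEAN, VARIANCE, NUM_CLASSES and BATCH_SIZE are constants you define yourself.
import tensorflow as tf
import tensorflow_datasets as tfds

def preprocess(ds):
    # resize_with_pad takes (image, target_height, target_width)
    x = tf.image.resize_with_pad(ds['image'], IMG_SIZE_H, IMG_SIZE_W)
    x = tf.cast(x, tf.float32)
    x = (x - MEAN) / VARIANCE  # standardize; the divisor is normally the standard deviation
    y = tf.one_hot(ds['label'], NUM_CLASSES)
    return x, y

def augmentation(image, label):
    # Applied after .batch(), so `image` has shape [BATCH_SIZE, H, W, 3]
    image = tf.image.random_flip_left_right(image)
    # Zero-pad height and width by 4 pixels in total (2 per side)
    image = tf.image.resize_with_crop_or_pad(image, IMG_SIZE_H + 4, IMG_SIZE_W + 4)
    # Random crop back to the size produced by preprocess
    image = tf.image.random_crop(image, size=[BATCH_SIZE, IMG_SIZE_H, IMG_SIZE_W, 3])
    return image, label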
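
For the exact shape of the question (a single function mapped over the dict-style examples that tfds yields by default), a minimal sketch could look like the following. IMG_SIZE and the synthetic stand-in dataset are assumptions made here so the snippet runs without downloading anything; with the real data you would map the same function over the result of tfds.load.

```python
import tensorflow as tf

IMG_SIZE = 128  # assumed target side length; pick what your network expects

def preprocessing_function(data):
    # By default tfds yields dicts with an 'image' and a 'label' entry.
    image = tf.image.resize(data['image'], (IMG_SIZE, IMG_SIZE))  # bilinear resize returns float32
    features = image / 255.0  # scale pixel values to [0, 1]
    labels = data['label']
    return features, labels

# Synthetic stand-in for the tfds dataset (4 blank 200x300 RGB images).
fake = tf.data.Dataset.from_tensor_slices({
    'image': tf.zeros([4, 200, 300, 3], dtype=tf.uint8),
    'label': tf.constant([0, 1, 0, 1], dtype=tf.int64),
})

training_dataset = fake.map(preprocessing_function).batch(2)
features, labels = next(iter(training_dataset))
print(features.shape, labels.shape)
```

Plain resizing distorts the aspect ratio; the resize_with_pad call used in preprocess above avoids that at the cost of black borders.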

To load the training and validation datasets, do the following:

def get_dataset(dataset_name, shuffle_buff_size=1024, batch_size=BATCH_SIZE, augmented=True):
    train, info_train = tfds.load(dataset_name, split='train[:80%]', with_info=True)
    val, info_val = tfds.load(dataset_name, split='train[80%:]', with_info=True)

    # Approximate split sizes as whole numbers (useful for steps_per_epoch / validation_steps)
    num_examples = info_train.splits['train'].num_examples
    TRAIN_SIZE = int(num_examples * 0.8)
    VAL_SIZE = num_examples - TRAIN_SIZE

    train = train.map(preprocess).cache().repeat().shuffle(shuffle_buff_size).batch(batch_size)
    if augmented:
        # augmentation crops to [BATCH_SIZE, ...], so batch_size must equal BATCH_SIZE here
        train = train.map(augmentation)
    train = train.prefetch(tf.data.experimental.AUTOTUNE)

    # Both datasets repeat forever, so training needs explicit step counts
    val = val.map(preprocess).cache().repeat().batch(batch_size)
    val = val.prefetch(tf.data.experimental.AUTOTUNE)

    return train, info_train, val, info_val, TRAIN_SIZE, VAL_SIZE
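
Because both datasets returned by get_dataset call repeat(), Keras cannot tell where an epoch ends, so fit needs explicit step counts. A rough sketch of that arithmetic follows; steps_per_epoch is a hypothetical helper and the example counts are made up for illustration, not the real size of cats_vs_dogs.

```python
def steps_per_epoch(num_examples, percent, batch_size):
    """Whole batches in one pass over `percent`% of `num_examples` examples.

    Hypothetical helper for illustration; not part of tfds or Keras.
    """
    split_size = num_examples * percent // 100
    return split_size // batch_size

# Made-up numbers: 1000 examples, 80/20 split, batches of 32.
train_steps = steps_per_epoch(1000, 80, 32)
val_steps = steps_per_epoch(1000, 20, 32)
print(train_steps, val_steps)

# With a compiled Keras model (not shown), these would be passed as:
# model.fit(train, steps_per_epoch=train_steps,
#           validation_data=val, validation_steps=val_steps)
```

Integer division drops the last partial batch, which is harmless here since the repeated dataset simply carries those examples into the next epoch.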
