Cropping TensorFlow images from bounding boxes

Posted: 2021-05-21 01:49:52

Tags: tensorflow

I am doing some image classification with the Stanford Dogs dataset, loaded with the following command.

dataset, info = tfds.load(name="stanford_dogs", with_info=True)

I want to crop each image using the bounding boxes available in the dataset, then resize it to a standard size.

Here is my code for preprocessing the data:

IMG_LEN = 128
IMG_SHAPE = (IMG_LEN,IMG_LEN,3)
N_BREEDS = 120

training_data = dataset['train']
test_data = dataset['test']

def preprocess(ds_row):
    ymin, xmin, ymax, xmax = tf.squeeze(ds_row['objects']['bbox'])
    image = tf.image.crop_to_bounding_box(ds_row['image'], ymin, ymax, xmin, xmax)
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    image = tf.image.resize(image, (IMG_LEN, IMG_LEN), method='nearest')
  
    # Onehot encoding labels
    label = tf.one_hot(ds_row['label'],N_BREEDS)

    return image, label

def prepare(dataset, batch_size=None):
    ds = dataset.map(preprocess, num_parallel_calls=4)
    ds = ds.shuffle(buffer_size=1000)
    if batch_size:
        ds = ds.batch(batch_size)
    ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return ds

train_ds = prepare(dataset['train'],batch_size=128)
test_ds = prepare(dataset['test'],batch_size=128)

This is the content of the dataset:

<PrefetchDataset shapes: {image: (None, None, 3), image/filename: (), label: (), objects: {bbox: (None, 4)}}, types: {image: tf.uint8, image/filename: tf.string, label: tf.int64, objects: {bbox: tf.float32}}>

When running the code, I get the following error message about the line `ymin, xmin, ymax, xmax = tf.squeeze(ds_row['objects']['bbox'])`:

OperatorNotAllowedInGraphError: iterating over tf.Tensor is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.

I don't understand this error, since I never told TensorFlow to run in graph mode. I am using TensorFlow 2.4.

1 answer:

Answer 0: (score: 1)

tf.data.Dataset runs in graph mode by default for performance reasons, and tuple unpacking of a symbolic tensor is not allowed in graph mode. You should use tf.unstack instead.
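A minimal sketch of the difference (`split_box` is an illustrative name; `tf.function` traces its body in graph mode, just as `Dataset.map` does):

```python
import tensorflow as tf

@tf.function  # traces in graph mode, like Dataset.map does
def split_box(box):
    # Tuple unpacking a symbolic tensor iterates over it, which is what
    # raised OperatorNotAllowedInGraphError in the question; tf.unstack
    # instead returns a Python list of tensors split along the first axis,
    # and unpacking that list is fine.
    ymin, xmin, ymax, xmax = tf.unstack(box)
    return ymin, xmin, ymax, xmax

coords = split_box(tf.constant([0.1, 0.2, 0.8, 0.9]))
```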

Note that your dataset has a variable number of bounding boxes per image. You can either consider only one bounding box per image, or write a preprocessing function that is agnostic to the number of boxes.

Also note that the dataset's bounding boxes are normalized between 0 and 1, whereas tf.image.crop_to_bounding_box expects its coordinates (offsets) as integers, in (top, left, height, width) format.
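A sketch of that conversion, using a made-up 320x400 (height x width) image and box values chosen to be exactly representable in float32 (so the truncating int cast is exact):

```python
import tensorflow as tf

# The dataset stores (ymin, xmin, ymax, xmax) normalized to [0, 1];
# crop_to_bounding_box wants integer (top, left, height, width).
box = tf.constant([0.125, 0.25, 0.875, 0.75])  # normalized box
height, width = 320, 400
ymin, xmin, ymax, xmax = tf.unstack(
    tf.cast(box * [height, width, height, width], tf.int32))
top, left = ymin, xmin
crop_height, crop_width = ymax - ymin, xmax - xmin
# top=40, left=100, crop_height=240, crop_width=200
```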

I suggest skipping the tuple unpacking altogether and using tf.image.crop_and_resize instead, which handles an arbitrary number of boxes per image.
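A minimal sketch of that call on synthetic data: two normalized boxes are cropped out of a single image and both crops are resized to 64x64 in one call (the sizes here are made up for illustration).

```python
import tensorflow as tf

image = tf.random.uniform((1, 200, 300, 3))  # batch containing one image
boxes = tf.constant([[0.0, 0.0, 0.5, 0.5],   # (ymin, xmin, ymax, xmax), normalized
                     [0.5, 0.5, 1.0, 1.0]])
box_indices = tf.zeros(2, tf.int32)          # both boxes come from image 0
crops = tf.image.crop_and_resize(image, boxes, box_indices, crop_size=(64, 64))
# crops.shape == (2, 64, 64, 3)
```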

Fixing your approach:

This method handles only one bounding box per image.

def preprocess(ds_row):
    image = ds_row['image']
    height, width = tf.unstack(tf.shape(image)[:2])
    # we consider only the first bounding box
    box = ds_row['objects']['bbox'][0]
    # convert to absolute pixel coordinates for crop_to_bounding_box;
    # cast the int32 sizes to float32 so they can multiply the float box
    scaled_box = box * tf.cast([height, width, height, width], tf.float32)
    # using unstack instead of tuple unpacking
    ymin, xmin, ymax, xmax = tf.unstack(tf.cast(scaled_box, tf.int32))
    box_width = xmax - xmin
    box_height = ymax - ymin
    image = tf.image.crop_to_bounding_box(image, ymin, xmin, box_height, box_width)
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    image = tf.image.resize(image, (IMG_LEN, IMG_LEN), method='nearest')

    # Onehot encoding labels
    label = tf.one_hot(ds_row['label'],N_BREEDS)

    return image, label

def prepare(dataset, batch_size=None):
    ds = dataset.map(preprocess, num_parallel_calls=4)
    ds = ds.shuffle(buffer_size=1000)
    if batch_size:
        ds = ds.batch(batch_size)
    ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return ds
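To sanity-check this pipeline without downloading the dataset, one can feed it a synthetic row whose fields follow the stanford_dogs feature spec (the image, box, and label values below are made up; the size list is cast to float32 so it can multiply the normalized float box):

```python
import tensorflow as tf

IMG_LEN = 128
N_BREEDS = 120

def preprocess(ds_row):
    image = ds_row['image']
    height, width = tf.unstack(tf.shape(image)[:2])
    box = ds_row['objects']['bbox'][0]  # only the first bounding box
    scaled_box = box * tf.cast([height, width, height, width], tf.float32)
    ymin, xmin, ymax, xmax = tf.unstack(tf.cast(scaled_box, tf.int32))
    image = tf.image.crop_to_bounding_box(image, ymin, xmin, ymax - ymin, xmax - xmin)
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    image = tf.image.resize(image, (IMG_LEN, IMG_LEN), method='nearest')
    return image, tf.one_hot(ds_row['label'], N_BREEDS)

fake_row = {
    'image': tf.zeros((200, 300, 3), tf.uint8),
    'objects': {'bbox': tf.constant([[0.1, 0.1, 0.9, 0.9]])},
    'label': tf.constant(5, tf.int64),
}
# mapping over a one-element dataset runs preprocess in graph mode,
# exactly like the real pipeline
ds = tf.data.Dataset.from_tensors(fake_row).map(preprocess)
image, label = next(iter(ds))
# image.shape == (128, 128, 3); label.shape == (120,)
```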

My suggestion:

This method returns as many images as there are bounding boxes in each source image.

def preprocess(ds_row):
    image = tf.expand_dims(
        tf.image.convert_image_dtype(ds_row["image"], dtype=tf.float32), axis=0
    )
    bboxes = ds_row["objects"]["bbox"]
    # we only have one image, so all the boxes belong to the first one.
    box_indices = tf.zeros(tf.shape(bboxes)[0], tf.int32)
    images = tf.image.crop_and_resize(
        image, bboxes, box_indices, crop_size=(IMG_LEN, IMG_LEN), method="nearest"
    )

    # Onehot encoding labels
    labels = tf.one_hot(ds_row["label"], N_BREEDS)
    # assuming that the multiple dogs on the image belong to the same class, 
    # as we get only one label
    labels = tf.repeat(tf.expand_dims(labels, axis=0), tf.shape(images)[0], axis=0)
    return images, labels


def prepare(dataset, batch_size=None):
    # we have to call unbatch because the variable number of bounding boxes
    # results in a variable number of generated images
    ds = dataset.map(preprocess, num_parallel_calls=4).unbatch()
    ds = ds.shuffle(buffer_size=1000)
    if batch_size:
        ds = ds.batch(batch_size)
    ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return ds
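This variant can be exercised the same way on a synthetic row (all values below are made up): a row with two boxes should yield two (image, label) pairs after unbatch.

```python
import tensorflow as tf

IMG_LEN = 128
N_BREEDS = 120

def preprocess(ds_row):
    image = tf.expand_dims(
        tf.image.convert_image_dtype(ds_row["image"], dtype=tf.float32), axis=0)
    bboxes = ds_row["objects"]["bbox"]
    # all boxes belong to the single image at batch index 0
    box_indices = tf.zeros(tf.shape(bboxes)[0], tf.int32)
    images = tf.image.crop_and_resize(
        image, bboxes, box_indices, crop_size=(IMG_LEN, IMG_LEN), method="nearest")
    labels = tf.one_hot(ds_row["label"], N_BREEDS)
    # one label per generated crop
    labels = tf.repeat(tf.expand_dims(labels, axis=0), tf.shape(images)[0], axis=0)
    return images, labels

fake_row = {
    "image": tf.zeros((200, 300, 3), tf.uint8),
    "objects": {"bbox": tf.constant([[0.0, 0.0, 0.5, 0.5],
                                     [0.5, 0.5, 1.0, 1.0]])},
    "label": tf.constant(7, tf.int64),
}
ds = tf.data.Dataset.from_tensors(fake_row).map(preprocess).unbatch()
pairs = list(ds)
# two (image, label) pairs, one per bounding box
```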