Sampling a sequence of images from a folder using TensorFlow and tf.data.Dataset

Time: 2019-07-16 13:11:03

Tags: python tensorflow

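Consider a dataframe with folder names and their corresponding labels. Each folder contains an arbitrary number of images extracted from a video file. I am looking for a way to sample a sequence of images from a folder with tf.data.Dataset in order to train an action recognition model. Something like this: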

ds = tf.data.Dataset.from_tensor_slices(list_of_folders)

def read_and_preprocess_images_from_folder(folder):
    list_of_image_names = some_function_to_list_files(folder)
    list_length = len(list_of_image_names)

    upper_boundary = list_length - sequence_length
    random_start_index = tf.random_uniform(shape=[], minval=0, maxval=upper_boundary, dtype=tf.int64)

    random_sequence = list_of_image_names[random_start_index:random_start_index+sequence_length]

    return convert_sequence_to_image_tensor(random_sequence) 

What I have done so far:

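import pandas as pd
import tensorflow as tf

IMAGE_SIZE = 224  # matches the (?, 224, 224, 3) shape reported below

df = pd.DataFrame({'folder': ['folder_0', 'folder_1'], 'target': [0, 1]})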

ds = tf.data.Dataset.from_tensor_slices((df.folder.values, df.target.values))

def load_and_preprocess_image_sequence(folder):
    x = tf.io.matching_files('/path/to/folders/' + folder + '/*.jpg') 
    x = tf.map_fn(lambda x: preprocess_image(tf.read_file(x)), x, dtype=tf.float32)
    return x

def preprocess_image(x):
    x = tf.image.decode_jpeg(x, channels=3)
    x = tf.image.resize_images(x, size=(IMAGE_SIZE,IMAGE_SIZE))
    return x

def load_and_preprocess_from_folder_label(folder, label):
    return load_and_preprocess_image_sequence(folder), label

train_ds = ds.map(load_and_preprocess_from_folder_label)

I get:

<DatasetV1Adapter shapes: ((?, 224, 224, 3), ()), types: (tf.float32, tf.int64)>

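The problem is that tf.io.matching_files returns a tensor without a defined shape when it is used with tf.data.Dataset. It only returns a defined shape under eager execution.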

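I tried to solve this a different way. Knowing that the images in every folder follow the same naming scheme ['0001.jpg', '0002.jpg'], I tried np.random.randint, but the problem is that np.random.randint produces the same result every time: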

def load_and_preprocess_image_sequence(folder):
    random_start_index = np.random.randint(0,upper_boundary) 
    x = []
    for i in range(random_start_index, random_start_index+sequence_length):
        x.append('/path/to/folders/' + folder + f'/{i:04d}.jpg')

    x = [tf.read_file(i) for i in x]
    x = [preprocess_image(i) for i in x]    
    x = tf.stack(x, axis=0)
    return x
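A likely explanation for the repeated value: Dataset.map traces the Python function once to build a graph, so the np.random.randint call runs a single time during tracing and its result is baked in as a constant. The sketch below illustrates this under the assumption of TensorFlow 2.x with eager execution; the function names numpy_start and tf_start and the range 0 to 1000 are made up for the demonstration.

```python
import numpy as np
import tensorflow as tf

def numpy_start(_):
    # Executed once, while tf.data traces this function.
    # The drawn value becomes a constant in the traced graph.
    return tf.constant(np.random.randint(0, 1000), dtype=tf.int64)

def tf_start(_):
    # A TensorFlow op: it is re-evaluated for every dataset element.
    return tf.random.uniform(shape=[], minval=0, maxval=1000, dtype=tf.int64)

ds = tf.data.Dataset.range(20)

numpy_values = [int(v) for v in ds.map(numpy_start)]
tf_values = [int(v) for v in ds.map(tf_start)]

print(len(set(numpy_values)))  # number of distinct NumPy-drawn values
print(len(set(tf_values)))     # number of distinct TensorFlow-drawn values
```

Any random choice that should differ per element therefore has to be expressed with TensorFlow ops (or wrapped in tf.py_function) inside the mapped function.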

Apart from using the same random_start_index on every call, this function works fine. To fix the randomness, I have to use tf.random_uniform:

def load_and_preprocess_image_sequence(folder):
    random_start_index = tf.random_uniform(shape=[], minval=0, maxval=upper_boundary, dtype=tf.int64)
    range = tf.map_fn(lambda x: x + random_start_index, tf.range(sequence_length, dtype=tf.int64))

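This gives me a tensor of consecutive numbers that starts at a random number and has length sequence_length. The problem now is that tf.strings.format is somewhat limited and cannot produce zero-padded output equivalent to Python's f'{i:04d}.jpg'.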
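One way to get zero-padded names inside the graph is tf.strings.as_string, which accepts width and fill arguments. The following is a minimal sketch assuming TensorFlow 2.x; the helper name frame_paths, the folder name and the start index 7 are hypothetical values chosen for illustration.

```python
import tensorflow as tf

SEQUENCE_LEN = 3  # hypothetical sequence length for the example

def frame_paths(folder, start_index, sequence_length=SEQUENCE_LEN):
    # Consecutive frame numbers: start, start + 1, ...
    indices = start_index + tf.range(sequence_length, dtype=tf.int64)
    # Pad every number to 4 characters with '0', like f'{i:04d}' in Python.
    names = tf.strings.as_string(indices, width=4, fill='0')
    # Scalar strings are broadcast against the vector of names.
    return tf.strings.join([folder, '/', names, '.jpg'])

paths = frame_paths(tf.constant('folder_0'), tf.constant(7, dtype=tf.int64))
print(paths.numpy())
```

Adding the start index to tf.range directly also avoids the tf.map_fn call, since the addition broadcasts over the whole range.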

1 Answer:

Answer 0: (score: 0)

I was able to solve this. Here is an example:

x = tf.io.matching_files(folder + '/*.jpg')
max_start_index = tf.cast(len(x) - SEQUENCE_LEN, tf.int64)

if max_start_index == 0:
    random_start_index = max_start_index
else:
    random_start_index = tf.random.uniform(shape=[], minval=0, maxval=max_start_index, dtype=tf.int64)

x = x[random_start_index:random_start_index + SEQUENCE_LEN]
x = tf.map_fn(lambda x: load_image(x), x, dtype=tf.uint8)
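A note on the index arithmetic in this snippet: maxval in tf.random.uniform is exclusive, so with maxval=max_start_index the last valid window (the one starting at len(x) - SEQUENCE_LEN) is never drawn, and the special case for 0 is presumably there because an integer range with minval equal to maxval is rejected. The variant below is a sketch, not the answerer's code. It assumes TensorFlow 2.x, uses a hypothetical helper name sample_window, and draws from the number of valid start positions instead, which removes the branch. Folders with fewer than sequence_length files simply yield a shorter sequence here, which may or may not be what a training pipeline wants.

```python
import tensorflow as tf

def sample_window(items, sequence_length):
    """Return a random run of consecutive entries from a 1-D tensor."""
    n = tf.shape(items, out_type=tf.int64)[0]
    # Valid start indices are 0 .. n - sequence_length inclusive,
    # so there are n - sequence_length + 1 of them (at least 1).
    num_starts = tf.maximum(n - sequence_length + 1, 1)
    # maxval is exclusive, so this draws from 0 .. num_starts - 1.
    start = tf.random.uniform(shape=[], minval=0, maxval=num_starts, dtype=tf.int64)
    return items[start:start + sequence_length]

# Usage with made-up file names standing in for tf.io.matching_files output.
names = tf.constant([f'{i:04d}.jpg' for i in range(1, 11)])
window = sample_window(names, 4)
print(window.numpy())
```

Because the function only uses TensorFlow ops, it can be called from inside a function passed to Dataset.map, where a new start index is drawn for each element.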