Question

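I am performing a signal-processing task on a large image dataset, transforming the images into large feature vectors with a specific structure (number_of_transforms, width, height, depth).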

The feature vectors (called coefficients in my code) are too large to hold in memory all at once, so I tried writing them to an np.memmap, like this:

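import numpy as np

# Disk-backed array holding the coefficients of every sample
coefficients = np.memmap(
    output_location, dtype=np.float32, mode="w+",
    shape=(n_samples, number_of_transforms, width, height, depth))

for n in range(n_samples):
    image = images[n]
    coefficients_sample = transform(image)
    # Written through to the file, so only one sample is in memory at a time
    coefficients[n, :, :, :, :] = coefficients_sample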

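This works for my purposes, with one drawback: if I want to load the coefficients of a particular "run" later for analysis (transform has to be tested with different hyperparameters), I have to somehow reconstruct the original shape (number_of_transforms, width, height, depth), which is bound to get messy.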
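To make the drawback concrete, here is a minimal sketch, assuming small made-up dimensions and a temporary file (both hypothetical, chosen only for illustration). A raw np.memmap file stores bytes only, so reopening it without passing dtype and shape again gives a flat uint8 view:

```python
import os
import tempfile

import numpy as np

# Hypothetical dimensions: (n_samples, number_of_transforms, width, height, depth)
shape = (4, 3, 8, 8, 2)
path = os.path.join(tempfile.mkdtemp(), "coefficients.dat")

# Write a raw memmap; the file holds the data bytes and no metadata.
raw = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
raw[:] = 1.0
raw.flush()
del raw

# Reopened without metadata: np.memmap falls back to uint8 and a 1-D shape.
reopened = np.memmap(path, mode="r")
print(reopened.dtype, reopened.shape)

# The original layout only comes back if dtype and shape are supplied again.
restored = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
print(restored.dtype, restored.shape)
```

So the shape and dtype of each run have to be recorded somewhere outside the data file.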

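Is there a cleaner (preferably numpy-compatible) way that lets me preserve the structure and data type of the transform feature vectors while still writing the results of transform to disk intermittently?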

Answer 1

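As @juanpa.arrivillaga pointed out, the only change needed is to use numpy.lib.format.open_memmap instead of np.memmap. It creates a regular .npy file whose header records the shape and dtype: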

import numpy as np

coefficients = np.lib.format.open_memmap(
    output_location, dtype=np.float32, mode="w+",
    shape=(n_samples, number_of_transforms, width, height, depth))

Later, retrieve the data (with the correct shape and data type) like this:

# Shape and dtype are read from the .npy header; the default mode is "r+"
coefficients = np.lib.format.open_memmap(output_location)
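The full round trip can be sketched as follows. This is an illustration under stated assumptions rather than the asker's actual pipeline: the dimensions, the images list, the temporary output path, and the stand-in transform function are all hypothetical.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Hypothetical small dimensions, chosen only for illustration.
n_samples, number_of_transforms, width, height, depth = 4, 3, 8, 8, 2
sample_shape = (number_of_transforms, width, height, depth)


def transform(image):
    # Stand-in for the real transform: fills the sample with the image mean.
    return np.full(sample_shape, image.mean(), dtype=np.float32)


# Stand-in images; image n is filled with the value n.
images = [np.full((width, height), n, dtype=np.float32) for n in range(n_samples)]
output_location = os.path.join(tempfile.mkdtemp(), "coefficients.npy")

# Create the .npy file up front; its header stores shape and dtype.
coefficients = open_memmap(
    output_location, mode="w+", dtype=np.float32,
    shape=(n_samples,) + sample_shape)

# Write one sample at a time, so only one sample is held in memory.
for n, image in enumerate(images):
    coefficients[n] = transform(image)

coefficients.flush()  # push pending writes to disk
del coefficients

# Later: no shape or dtype needs to be passed, both come from the header.
loaded = open_memmap(output_location, mode="r")
print(loaded.shape, loaded.dtype)

# Because the file is a regular .npy file, np.load can memory-map it as well.
also_loaded = np.load(output_location, mmap_mode="r")
```

Since the metadata lives in the file itself, each hyperparameter run can be stored as its own .npy file and reopened later without any bookkeeping of shapes.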
