Question

I am trying to reduce the processing time spent on i/o when processing large images in parallel.

Data: I have a large hdf5 file. The datasets contain large images (~14Gb) stored as numpy arrays.

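Computer: a server with a shared file system and 12 compute nodes, each with 28 cores. Each node has 256Gb of RAM and a 150Gb SSD HD that can be accessed only from the cores of that node.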

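Procedure
1. Open the hdf5 file stored on the normal HD
2. Each core reads a dataset/image
3. Each core processes the image
4. The resulting image is saved in the hdf5 file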

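Problem
When I run the code, most of the processing time is spent on I/O calls and on waiting for HD access. To reduce the writing time, I am thinking of using the RAM of the node: each processed image would be stored in a temporary dictionary, and the dictionary would then be written to the hdf5 file. If I understand the system correctly, this would reduce the n i/o calls to 1. I will test the effect on speed, but since I am not an expert in optimization or computer architecture, I would like to know: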
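To make the buffering idea concrete, here is a minimal sketch of it, not a tested solution. It estimates the size of one stack from the shape and dtype used in the code of this post, (69, 2048, 2048) uint16, and collects results in a dictionary that is handed to a writer in batches. `BufferedWriter`, `write_fn` and `max_bytes` are hypothetical names introduced only for illustration; `write_fn` stands in for whatever routine would actually write to the hdf5 file. Whether batching is faster than writing each image directly is exactly what would need to be measured.

```python
# Shape and dtype taken from the code in this post: (69, 2048, 2048) uint16.
IMG_SHAPE = (69, 2048, 2048)
BYTES_PER_PIXEL = 2  # uint16


def stack_nbytes(shape=IMG_SHAPE, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size in bytes of one uncompressed image stack."""
    n = bytes_per_pixel
    for dim in shape:
        n *= dim
    return n


class BufferedWriter:
    """Keep processed results in RAM and pass them to `write_fn` in batches.

    `write_fn` is a hypothetical callback that receives a dict
    {dataset name: data} and performs the real write. It is an assumption
    of this sketch, not part of h5py.
    """

    def __init__(self, write_fn, max_bytes):
        self.write_fn = write_fn
        self.max_bytes = max_bytes  # RAM budget for the buffer
        self.buffer = {}
        self.buffered_bytes = 0

    def add(self, name, data, nbytes):
        """Store one result; flush once the budget is reached."""
        self.buffer[name] = data
        self.buffered_bytes += nbytes
        if self.buffered_bytes >= self.max_bytes:
            self.flush()

    def flush(self):
        """Write everything buffered so far with a single call."""
        if self.buffer:
            self.write_fn(self.buffer)
            self.buffer = {}
            self.buffered_bytes = 0
```

With these numbers one stack is 578,813,952 bytes (about 0.58 GB), so 28 cores each holding one stack need roughly 16 GB of the 256Gb available per node. The buffer lowers the number of write calls, not the number of bytes written, so the final `flush()` still has to wait for the disk.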

Questions
1) What is the best strategy to reduce the writing time?
2) Is there a way to take advantage of the SSD HD to improve performance?
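For question 2, one option I can imagine is staging the file on the node-local SSD before processing. The sketch below only shows the copy step and uses the Python standard library; `stage_to_local` and the scratch directory are hypothetical, and it assumes the file fits in the 150Gb of the SSD. I do not know whether this is the recommended approach.

```python
import os
import shutil


def stage_to_local(src_path, scratch_dir):
    """Copy `src_path` into `scratch_dir` and return the path of the copy.

    The copy is skipped when a file with the same name and size is already
    there. `scratch_dir` is a hypothetical directory on the node-local SSD.
    """
    os.makedirs(scratch_dir, exist_ok=True)
    dst_path = os.path.join(scratch_dir, os.path.basename(src_path))
    needed = os.path.getsize(src_path)

    already_staged = (os.path.exists(dst_path)
                      and os.path.getsize(dst_path) == needed)
    if not already_staged:
        # Refuse to start a copy that cannot fit on the local disk.
        free = shutil.disk_usage(scratch_dir).free
        if needed > free:
            raise OSError("not enough free space in " + scratch_dir)
        shutil.copy2(src_path, dst_path)
    return dst_path
```

Because the SSD is visible only from its own node, each of the 12 nodes would need its own copy, made by one process per node, and the results would still have to be copied back to the shared file system at the end.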

If the topic is not suitable for stackoverflow, please let me know where I should repost it.

Thanks

I added some code showing the read/write steps. In this example, reading and writing happen in the same hdf5 file but in different groups.

CODE

import h5py
import numpy as np
from mpi4py import MPI

# Open the file for reading and writing (all ranks)
data_file_hdl = h5py.File('RawData.hdf5', 'r+', driver='mpio', comm=MPI.COMM_WORLD, libver='latest')

# Create a saving group (all ranks)
saving_group = data_file_hdl.create_group('processed_data')

# Create the datasets (all ranks); img_list holds the dataset names
ImgSize = (69, 2048, 2048)
for pos in img_list:
    saving_group.create_dataset(pos, ImgSize, dtype=np.uint16, chunks=(1, ImgSize[1], ImgSize[2]))
data_file_hdl.flush()

# I then scatter the list of images to each core.
# Each core has a variable called xlocal that holds the str names
# of the datasets to load
# ex.
# Core1 xlocal = [1,10,20,50]
# Core2 xlocal = [22,33]

for idx in xlocal:
    # Load the whole image stack into memory
    ImgStack = data_file_hdl['RawData_grp'][idx][...]

    # Processing steps
    # ------------------

    # Save the image into the dataset created above
    saving_group[idx][...] = ImgStack
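The scatter step mentioned in the comments is not shown above. A minimal sketch of one way to build the per-core lists follows; `split_round_robin` is a hypothetical helper, and the round-robin split is only an assumption about how the work could be divided.

```python
def split_round_robin(items, n_ranks):
    """Return n_ranks lists; item i goes to rank i % n_ranks."""
    return [items[rank::n_ranks] for rank in range(n_ranks)]


# Intended use with mpi4py (not executed here):
#   comm = MPI.COMM_WORLD
#   chunks = split_round_robin(img_list, comm.Get_size()) if comm.Get_rank() == 0 else None
#   xlocal = comm.scatter(chunks, root=0)
```

Since every stack has the same shape, giving each core the same number of images should balance the load as long as the processing time per image is similar.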


Reducing i/o time of large image hf5 files using mpi

0 answers: