I am using MPI and h5py/HDF5 (HDF5 and h5py compiled with parallel support, everything running on Python 3.4) to stitch a dataset of overlapping tiles (200 or more images / 2048x2048 numpy arrays) on a cluster.
Each tile has an assigned index number that corresponds to the position where it should be written, and all the indices are stored in an array:
Example tile_array:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]], dtype=int32)
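To make the mapping concrete, here is a minimal sketch of how a tile index is turned into a position in the final mosaic (the tile size, overlap value and helper name are only illustrative, not the actual code):

import numpy as np

tile_array = np.arange(16, dtype='int32').reshape(4, 4)

def tile_slice(tile_ind, tile_array, nr_pixels=2048, overlap=200):
    # Illustrative helper: find the tile index in the grid and convert the
    # (row, col) grid position into a pixel slice of the stitched dataset.
    row, col = np.argwhere(tile_array == tile_ind)[0]
    step = nr_pixels - overlap  # stride between neighbouring tile origins
    return (slice(row * step, row * step + nr_pixels),
            slice(col * step, col * step + nr_pixels))

# e.g. tile 5 sits at grid position (1, 1) of the 4x4 layout
print(tile_slice(5, tile_array))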
The problems I am running into are:
When running the code on the cluster:
- when I write the stitched image in parallel using multiple cores, I randomly lose tiles
- when I run the stitching serially on rank == 0, I still lose tiles
- if the dataset is small, there are no problems
- if atomic mode is not activated when writing in parallel, the number of errors increases; the number of errors also increases if atomic mode is activated when I run serially on rank == 0
When running the code on my laptop:
- when I run the same code on a single core of my laptop, with the MPI code removed, no tiles are lost.
Any comments will be greatly appreciated.
Highlights of the procedure:
0) Before stitching, every image of the group is processed and the overlapping edges are blended in order to obtain a better result in the overlapping regions
1) I open the hdf file where the data will be saved: stitching_file=h5py.File(fpath,'a',driver='mpio',comm=MPI.COMM_WORLD)
2) I create a big dataset that will contain the stitched image
3) I fill the dataset with zeros (a minimal sketch of steps 1)-3) is included after the writing rounds below)
4) The processed images will be written (added) into the big dataset by different cores. However, adjacent images have overlapping regions. To avoid having different cores write pieces of the same overlap and corrupt the write, I run four rounds of writing so that the whole dataset is covered with this pattern (the groups are derived with a simple checkerboard slicing, shown right after the rounds):
Example tile_array:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]], dtype=int32)
Writing rounds:
writing round one: array([ 0, 2, 8, 10], dtype=int32)
writing round two: array([ 1, 3, 9, 11], dtype=int32)
writing round three: array([ 4, 6, 12, 14], dtype=int32)
writing round four: array([ 5, 7, 13, 15], dtype=int32)
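The four groups are just a checkerboard split of tile_array, so tiles written in the same round are never direct neighbours; these few lines reproduce the rounds listed above (the same slicing builds writing_dict in the pseudocode below):

import numpy as np

tile_array = np.arange(16, dtype='int32').reshape(4, 4)

print(tile_array[::2, ::2].ravel())    # round one:   [ 0  2  8 10]
print(tile_array[::2, 1::2].ravel())   # round two:   [ 1  3  9 11]
print(tile_array[1::2, ::2].ravel())   # round three: [ 4  6 12 14]
print(tile_array[1::2, 1::2].ravel())  # round four:  [ 5  7 13 15]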
If I write on a single rank I do not use this scheme and simply write each position sequentially.
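For reference, here is a minimal sketch of steps 1)-3), i.e. the collective file opening, the creation of the big dataset and the zero fill (path, dataset name, shape and dtype are placeholders, not the actual ones):

from mpi4py import MPI
import h5py

fpath = 'stitched.h5'                # placeholder path
final_shape = (4 * 2048, 4 * 2048)   # placeholder size of the stitched image

# Step 1: every rank opens the file collectively with the mpio driver
stitching_file = h5py.File(fpath, 'a', driver='mpio', comm=MPI.COMM_WORLD)
# Step 2: dataset creation is also a collective operation, so all ranks call it
stitched_group = stitching_file.create_dataset('stitched_image',
                                               shape=final_shape, dtype='float64')
# Step 3: fill the dataset with zeros (new h5py datasets default to zero anyway,
# so a single rank, or no explicit fill at all, would also be enough)
stitched_group[...] = 0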
Pseudocode:
# Preprocessing of the images
------------------------------
------------------------------
# MPI data chunking function
def tasks_chunking(tasks, size):
    # This function scatters any kind of list, not only ordered ones.
    # If the number of cores is bigger than the length of the list,
    # the size of some chunks will be zero and the corresponding cores
    # will receive an empty chunk.
    # Chunk the list of tasks
    Chunked_list = np.array_split(tasks, size)
    NumberChunks = np.zeros(len(Chunked_list), dtype='int32')
    for idx, chunk in enumerate(Chunked_list):
        NumberChunks[idx] = len(chunk)
    # Calculate the displacement of each chunk inside the flat send buffer
    Displacement = np.zeros(len(NumberChunks), dtype='int32')
    for idx, count in enumerate(NumberChunks[0:-1]):
        Displacement[idx + 1] = Displacement[idx] + count
    return Chunked_list, NumberChunks, Displacement
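# Example of what tasks_chunking returns (hypothetical input: 6 tile
# indices split across 4 cores; chunk sizes follow np.array_split):
#   tasks_chunking(np.array([0, 2, 8, 10, 5, 7], dtype='int32'), 4)
#   Chunked_list -> [array([0, 2]), array([ 8, 10]), array([5]), array([7])]
#   NumberChunks -> array([2, 2, 1, 1], dtype=int32)
#   Displacement -> array([0, 2, 4, 5], dtype=int32)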
# Flush the file after the preprocessing
stitching_file.flush()
# Writing dictionary. Each group will contain a subset of the tiles to be written.
# Data has the size and shape of the final array where the stitched image is saved
fake_tile_array = np.arange(Data.size, dtype='int32')
fake_tile_array = fake_tile_array.reshape(Data.shape)
writing_dict = {}
writing_dict['even_group1']=fake_tile_array[::2, ::2].flat[:]
writing_dict['even_group2']=fake_tile_array[::2, 1::2].flat[:]
writing_dict['odd_group1'] =fake_tile_array[1::2, ::2].flat[:]
writing_dict['odd_group2'] =fake_tile_array[1::2, 1::2].flat[:]
# Without activating the atomic mode the number of errors in parallel mode is higher.
stitching_file.atomic=True
# Loop through the dictionary items to start the writing
for key, item in writing_dict.items():
    # Chunk the positions that need to be written
    if rank == 0:
        Chunked_list, NumberChunks, Displacement = tasks_chunking(item, size)
    else:
        NumberChunks = None
        Displacement = None
        Chunked_list = None
    # Make the cores aware of the number of jobs they will need to run.
    # The variable cnt is filled by the Scatter call with the number of
    # positions assigned to this core, so its value differs on every core.
    cnt = np.zeros(1, dtype='int32')
    comm.Scatter(NumberChunks, cnt, root=0)
    # Define the local buffer that will be filled with the scattered data;
    # cnt determines the size of xlocal on the different cores.
    xlocal = np.zeros(cnt, dtype='int32')
    # Scatter the subset of positions to the different cores
    comm.Scatterv([item, NumberChunks, Displacement, MPI.INT], xlocal, root=0)
    for tile_ind in xlocal:
        # This writing function is the same one called when I run the writing
        # on rank 0 or on a single core on my laptop.
        paste_in_final_image(joining, temp_file, stitched_group, tile_ind, nr_pixels)
    # Synchronize the cores before starting the next writing round
    comm.Barrier()
stitching_file.flush()
stitching_file.close()
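In case it helps to reproduce the communication part in isolation, here is a self-contained distillation of the Scatter/Scatterv pattern above, without the HDF5 writing (run with e.g. mpirun -n 4; it only distributes one writing round of the 4x4 example grid and prints what each rank would write):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

def tasks_chunking(tasks, size):
    # Same logic as the function above, written compactly
    chunked = np.array_split(tasks, size)
    counts = np.array([len(c) for c in chunked], dtype='int32')
    displ = np.zeros(size, dtype='int32')
    displ[1:] = np.cumsum(counts)[:-1]
    return chunked, counts, displ

# One writing round of the 4x4 example grid
item = np.array([0, 2, 8, 10], dtype='int32')

if rank == 0:
    _, NumberChunks, Displacement = tasks_chunking(item, size)
    sendbuf = [item, NumberChunks, Displacement, MPI.INT]
else:
    NumberChunks = None
    sendbuf = None

# Each core first learns how many tile indices it will receive...
cnt = np.zeros(1, dtype='int32')
comm.Scatter(NumberChunks, cnt, root=0)

# ...and then receives its share of the indices
xlocal = np.zeros(cnt[0], dtype='int32')
comm.Scatterv(sendbuf, xlocal, root=0)

print('rank', rank, 'would write tiles', xlocal)
comm.Barrier()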