I am using MPI and h5py/HDF5 (HDF5 and h5py compiled with parallel support, everything running on Python 3.4) to stitch a dataset of overlapping tiles (200 or more images / 2048x2048 numpy arrays) on a cluster.
Each tile has an assigned index number that corresponds to the position where it should be written, and all the indices are stored in an array:
Example tile_array:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]], dtype=int32)
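To make the mapping concrete, here is a minimal sketch of how a tile index is turned into a position in the final mosaic (the tile size, overlap value and helper name are only illustrative, not the actual code):

import numpy as np

tile_array = np.arange(16, dtype='int32').reshape(4, 4)

def tile_slice(tile_ind, tile_array, nr_pixels=2048, overlap=200):
    # Illustrative helper: find the tile index in the grid and convert the
    # (row, col) grid position into a pixel slice of the stitched dataset.
    row, col = np.argwhere(tile_array == tile_ind)[0]
    step = nr_pixels - overlap  # stride between neighbouring tile origins
    return (slice(row * step, row * step + nr_pixels),
            slice(col * step, col * step + nr_pixels))

# e.g. tile 5 sits at grid position (1, 1) of the 4x4 layout
print(tile_slice(5, tile_array))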
The problems I am running into are:
When running the code on the cluster:
- when I write the stitched image in parallel using multiple cores, I randomly lose tiles
- when I run the stitching serially on rank == 0, I still lose tiles
- if the dataset is small, there are no problems
- if atomic mode is not activated when writing in parallel, the number of errors increases; the number of errors also increases if atomic mode is activated when I run serially on rank == 0
When running the code on my laptop:
- when I run the same code on a single core of my laptop, with the MPI code removed, no tiles are lost.
Any comments will be greatly appreciated.
Highlights of the procedure:
0) Before stitching, every image of the group is processed and the overlapping edges are blended in order to obtain a better result in the overlapping regions
1) I open the hdf file where the data will be saved: stitching_file=h5py.File(fpath,'a',driver='mpio',comm=MPI.COMM_WORLD)
2) I create a big dataset that will contain the stitched image
3) I fill the dataset with zeros (a minimal sketch of steps 1)-3) is included after the writing rounds below)
4) The processed images will be written (added) into the big dataset by different cores. However, adjacent images have overlapping regions. To avoid having different cores write pieces of the same overlap and corrupt the write, I run four rounds of writing so that the whole dataset is covered with this pattern (the groups are derived with a simple checkerboard slicing, shown right after the rounds):
Example tile_array:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]], dtype=int32)
Writing rounds:
writing round one: array([ 0, 2, 8, 10], dtype=int32)
writing round two: array([ 1, 3, 9, 11], dtype=int32)
writing round three: array([ 4, 6, 12, 14], dtype=int32)
writing round four: array([ 5, 7, 13, 15], dtype=int32)
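The four groups are just a checkerboard split of tile_array, so tiles written in the same round are never direct neighbours; these few lines reproduce the rounds listed above (the same slicing builds writing_dict in the pseudocode below):

import numpy as np

tile_array = np.arange(16, dtype='int32').reshape(4, 4)

print(tile_array[::2, ::2].ravel())    # round one:   [ 0  2  8 10]
print(tile_array[::2, 1::2].ravel())   # round two:   [ 1  3  9 11]
print(tile_array[1::2, ::2].ravel())   # round three: [ 4  6 12 14]
print(tile_array[1::2, 1::2].ravel())  # round four:  [ 5  7 13 15]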
If I write on a single rank I do not use this scheme and simply write each position sequentially.
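For reference, here is a minimal sketch of steps 1)-3), i.e. the collective file opening, the creation of the big dataset and the zero fill (path, dataset name, shape and dtype are placeholders, not the actual ones):

from mpi4py import MPI
import h5py

fpath = 'stitched.h5'                # placeholder path
final_shape = (4 * 2048, 4 * 2048)   # placeholder size of the stitched image

# Step 1: every rank opens the file collectively with the mpio driver
stitching_file = h5py.File(fpath, 'a', driver='mpio', comm=MPI.COMM_WORLD)
# Step 2: dataset creation is also a collective operation, so all ranks call it
stitched_group = stitching_file.create_dataset('stitched_image',
                                               shape=final_shape, dtype='float64')
# Step 3: fill the dataset with zeros (new h5py datasets default to zero anyway,
# so a single rank, or no explicit fill at all, would also be enough)
stitched_group[...] = 0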
Pseudocode:
# Preprocessing of the images
------------------------------
------------------------------
# MPI data chunking function
def tasks_chunking(tasks, size):
    # This function scatters any kind of list, not only ordered ones.
    # If the number of cores is bigger than the length of the list,
    # the size of some chunks will be zero and the corresponding cores
    # will receive an empty chunk.
    # Chunk the list of tasks
    Chunked_list = np.array_split(tasks, size)
    NumberChunks = np.zeros(len(Chunked_list), dtype='int32')
    for idx, chunk in enumerate(Chunked_list):
        NumberChunks[idx] = len(chunk)
    # Calculate the displacement of each chunk inside the flat send buffer
    Displacement = np.zeros(len(NumberChunks), dtype='int32')
    for idx, count in enumerate(NumberChunks[0:-1]):
        Displacement[idx + 1] = Displacement[idx] + count
    return Chunked_list, NumberChunks, Displacement
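# Example of what tasks_chunking returns (hypothetical input: 6 tile
# indices split across 4 cores; chunk sizes follow np.array_split):
#   tasks_chunking(np.array([0, 2, 8, 10, 5, 7], dtype='int32'), 4)
#   Chunked_list -> [array([0, 2]), array([ 8, 10]), array([5]), array([7])]
#   NumberChunks -> array([2, 2, 1, 1], dtype=int32)
#   Displacement -> array([0, 2, 4, 5], dtype=int32)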
# Flush the file after the preprocessing
stitching_file.flush()
# Writing dictionary. Each group will contain a subset of the tiles to be written.
# Data has the size and shape of the final array where the stitched image is saved
fake_tile_array = np.arange(Data.size, dtype='int32')
fake_tile_array = fake_tile_array.reshape(Data.shape)
writing_dict = {}
writing_dict['even_group1']=fake_tile_array[::2, ::2].flat[:]
writing_dict['even_group2']=fake_tile_array[::2, 1::2].flat[:]
writing_dict['odd_group1'] =fake_tile_array[1::2, ::2].flat[:]
writing_dict['odd_group2'] =fake_tile_array[1::2, 1::2].flat[:]
# Without activating the atomic mode the number of errors in parallel mode is higher.
stitching_file.atomic=True
# Loop through the dictionary items to start the writing
for key, item in writing_dict.items():
    # Chunk the positions that need to be written
    if rank == 0:
        Chunked_list, NumberChunks, Displacement = tasks_chunking(item, size)
    else:
        NumberChunks = None
        Displacement = None
        Chunked_list = None
    # Make the cores aware of the number of jobs they will need to run.
    # The variable cnt is filled by the Scatter call with the number of
    # positions assigned to this core, so its value differs on every core.
    cnt = np.zeros(1, dtype='int32')
    comm.Scatter(NumberChunks, cnt, root=0)
    # Define the local buffer that will be filled with the scattered data;
    # cnt determines the size of xlocal on the different cores.
    xlocal = np.zeros(cnt, dtype='int32')
    # Scatter the subset of positions to the different cores
    comm.Scatterv([item, NumberChunks, Displacement, MPI.INT], xlocal, root=0)
    for tile_ind in xlocal:
        # This writing function is the same one called when I run the writing
        # on rank 0 or on a single core on my laptop.
        paste_in_final_image(joining, temp_file, stitched_group, tile_ind, nr_pixels)
    # Synchronize the cores before starting the next writing round
    comm.Barrier()
stitching_file.flush()
stitching_file.close()
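In case it helps to reproduce the communication part in isolation, here is a self-contained distillation of the Scatter/Scatterv pattern above, without the HDF5 writing (run with e.g. mpirun -n 4; it only distributes one writing round of the 4x4 example grid and prints what each rank would write):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

def tasks_chunking(tasks, size):
    # Same logic as the function above, written compactly
    chunked = np.array_split(tasks, size)
    counts = np.array([len(c) for c in chunked], dtype='int32')
    displ = np.zeros(size, dtype='int32')
    displ[1:] = np.cumsum(counts)[:-1]
    return chunked, counts, displ

# One writing round of the 4x4 example grid
item = np.array([0, 2, 8, 10], dtype='int32')

if rank == 0:
    _, NumberChunks, Displacement = tasks_chunking(item, size)
    sendbuf = [item, NumberChunks, Displacement, MPI.INT]
else:
    NumberChunks = None
    sendbuf = None

# Each core first learns how many tile indices it will receive...
cnt = np.zeros(1, dtype='int32')
comm.Scatter(NumberChunks, cnt, root=0)

# ...and then receives its share of the indices
xlocal = np.zeros(cnt[0], dtype='int32')
comm.Scatterv(sendbuf, xlocal, root=0)

print('rank', rank, 'would write tiles', xlocal)
comm.Barrier()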