Program hangs when writing an HDF5 dataset with parallel HDF5

Time: 2020-02-24 12:34:41

Tags: hdf5

I am using parallel HDF5 and am testing an example from the HDF Group, except that I changed the size of the dataset and changed the data type to double. The original example code is at:

https://support.hdfgroup.org/ftp/HDF5/examples/parallel/Hyperslab_by_row.c

The code I am using is:

#include "hdf5.h"
#include "stdlib.h"

#define H5FILE_NAME     "SDS_row.h5"
#define DATASETNAME     "IntArray" 
#define NX     800                      /* dataset dimensions */
#define NY     6554
#define RANK   2

int main (int argc, char **argv)
{
    /*
     * HDF5 APIs definitions
     */     
    hid_t       file_id, dset_id;         /* file and dataset identifiers */
    hid_t       filespace, memspace;      /* file and memory dataspace identifiers */
    hsize_t     dimsf[2];                 /* dataset dimensions */
    double         *data;                    /* pointer to data buffer to write */
    hsize_t count[2];             /* hyperslab selection parameters */
    hsize_t offset[2];
    hid_t   plist_id;                 /* property list identifier */
    herr_t  status;

    /*
     * MPI variables
     */
    int mpi_size, mpi_rank;
    MPI_Comm comm  = MPI_COMM_WORLD;
    MPI_Info info  = MPI_INFO_NULL;

    /*
     * Initialize MPI
     */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(comm, &mpi_size);
    MPI_Comm_rank(comm, &mpi_rank);  

    /* 
     * Set up file access property list with parallel I/O access
     */
    plist_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(plist_id, comm, info);

    /*
     * Create a new file collectively and release property list identifier.
     */
    file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
    H5Pclose(plist_id);


    /*
     * Create the dataspace for the dataset.
     */
    dimsf[0] = NX;
    dimsf[1] = NY;
    filespace = H5Screate_simple(RANK, dimsf, NULL); 

    /*
     * Create the dataset with default properties and close filespace.
     */
    dset_id = H5Dcreate(file_id, DATASETNAME, H5T_NATIVE_DOUBLE, filespace,
            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Sclose(filespace);

    /* 
     * Each process defines dataset in memory and writes it to the hyperslab
     * in the file.
     */
    count[0] = dimsf[0]/mpi_size;
    count[1] = dimsf[1];
    offset[0] = mpi_rank * count[0];
    offset[1] = 0;
    memspace = H5Screate_simple(RANK, count, NULL);

    /*
     * Select hyperslab in the file.
     */
    filespace = H5Dget_space(dset_id);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);

    /*
     * Initialize data buffer 
     */
    data = (double *) malloc(sizeof(double)*count[0]*count[1]);
    for (hsize_t i=0; i < count[0]*count[1]; i++) {
        data[i] = mpi_rank + 10;
    }

    /*
     * Create property list for collective dataset write.
     */
    plist_id = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);

    status = H5Dwrite(dset_id, H5T_NATIVE_DOUBLE, memspace, filespace,
              plist_id, data);
    free(data);

    /*
     * Close/release resources.
     */
    H5Dclose(dset_id);
    H5Sclose(filespace);
    H5Sclose(memspace);
    H5Pclose(plist_id);
    H5Fclose(file_id);

    MPI_Finalize();

    return 0;
}     

If I compile it with parallel HDF5 and run it with

mpirun -np 12 ./test

the program hangs. However, if I use NX = 500, it works. It also works if I use 4 cores. I spent the whole afternoon searching online and could not find a solution. Could someone tell me how to fix this, or what is wrong with this code? I am on macOS and compile the code with Open MPI and gcc 9.

0 Answers:

There are no answers yet.