Segmentation fault using MPI

Time: 2019-05-24 18:40:27

Tags: c pointers segmentation-fault malloc mpi

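I have a matrix of numbers that I want to distribute among MPI processes: I want to split the matrix into blocks so that each process holds its own part. Below is my code, with comments so you can follow it: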

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define MASTER 0

/* helpers defined elsewhere in the program; file_size is not shown, so its prototype is inferred from the call below */
int file_size(char *name, unsigned int type_size);
int read_from_pos(char *name, unsigned int pos, unsigned int num_elems, unsigned int type_size, void *buff);

int *data; // pointer to data
int total_elems; // total number of elems
int type_size; // size of the data type (in our case it will be int)
int main (int argc, char *argv[]){
        MPI_Status status;
        int my_rank,my_size;
        int rc = -1;
        int chunk;
        int i;

        MPI_Init(&argc,&argv); // initialize MPI environment first: no MPI call (not even MPI_Abort) is valid before it
        MPI_Comm_rank(MPI_COMM_WORLD,&my_rank);
        MPI_Comm_size(MPI_COMM_WORLD,&my_size);

        type_size = sizeof(int);
        if (argc != 2){
                printf("usage: %s file_name\n",argv[0]);
                MPI_Abort(MPI_COMM_WORLD, 1);
        }
        printf("Using %s as input\n",argv[1]);
        total_elems = file_size(argv[1],sizeof(int)); // this function calculates the number of elems in the file/matrix
        if (total_elems<0){
                printf("Invalid number of elements\n");
                MPI_Abort(MPI_COMM_WORLD, rc);
        }
        printf("There are %d elems in the matrix\n",total_elems);

        printf("Number of MPI processes: %d\n", my_size);
        chunk = total_elems/my_size; // elems in a chunk, a chunk for every process
        printf("Chunk size: %d\n", chunk);
        printf("up to here ok 0\n");
        if (my_rank == MASTER){
                // nothing
                printf("up to here ok 0.5\n");
                // here it is still ok
        }else{  // ATTENTION: up to here it is still ok
                printf("ok?\n"); // this printf is never printed; I get a segmentation fault
                data = (int *)malloc(chunk*sizeof(int)); // allocate room for chunk elements (this process's block)
                if (data == NULL){
                        printf("Error in malloc\n");
                        MPI_Abort(MPI_COMM_WORLD, rc);
                }

                rc = read_from_pos(argv[1], chunk*(my_rank-1), chunk, type_size, (void *)data); // reads chunk elements from the file, starting at element position chunk*(my_rank-1), into the buffer data
                if (rc<0){
                        printf("Error reading file\n");
                        MPI_Abort(MPI_COMM_WORLD, rc);
                }


        }

The function read_from_pos:

int read_from_pos(char *name, unsigned int pos, unsigned int num_elems, unsigned int type_size, void *buff)
{
        int fd;
        off_t off;
        ssize_t ret;
        size_t pending, ready;

        fd = open(name, O_RDONLY);
        if (fd < 0) return -1;
        off = (off_t)pos * type_size; // byte offset of element number pos
        if (lseek(fd, off, SEEK_SET) != off){ // we go to the position
                close(fd);
                return -1;
        }
        pending = (size_t)num_elems * type_size; // bytes still to be read
        ready = 0; // bytes read so far
        while (pending > 0){
                ret = read(fd, (char *)buff + ready, pending);
                if (ret <= 0){ // error, or end of file before all requested bytes were read (the old loop spun forever on EOF)
                        close(fd);
                        return -1;
                }
                pending -= (size_t)ret;
                ready += (size_t)ret;
        }
        printf("Total number of elements read %zu\n", ready / type_size);
        close(fd);
        return 0;
}

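So I don't understand why everything is still fine in the if branch, but as soon as execution goes to the else branch, i.e. when the process is not the master, it doesn't even print the printf and I get a segmentation fault. I'm sure the printf isn't what causes the segmentation fault, and I ended it with "\n" so the output should be flushed, so my guess is malloc, but I still think that call is correct: I allocate a buffer of chunk elements with malloc in every process.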

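So maybe it is because of MPI. Do I have to protect this piece of code somehow? I want to give each worker process its own independent block of the matrix; am I doing this wrong?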

This is the output I get for a matrix with 10 numbers:

./prog easy.txt
Using easy.txt as input
There are 10 elements in the file
There are 10 elems in the matrix
Number of MPI processes: 1
Chunk size: 10
up to here ok 0
up to here ok 0.5
Segmentation fault
