Question

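I have two simple programs: one computes PI and the other counts the spaces in a file of fewer than 10,000 characters.

For each one I wrote a serial version and a parallel version using OpenMPI.

When I run them and compare the CPU time measured with: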

clock_t begin = clock();

the serial code turns out to be much faster than the parallel code (times in seconds):

Serial code: 0.000234

OpenMPI with 2-nodes 0.005987
OpenMPI with 4-nodes 0.002890
OpenMPI with 8-nodes 0.015805

As you can see, every parallel run is slower than the serial one, and the 8-node run is the slowest.

I would like to understand why.

Here is the code for the space counter, parallel version first:

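//Parallel code

#include <mpi.h>
#include <stdio.h>
#include <time.h>

#define FILE_SIZE 10000

int main(void) {
    clock_t begin = clock();

    FILE *fp;
    int my_size, my_id, size, chunk, local_acum = 0, acum = 0, i;
    char buf[FILE_SIZE], recv_vect[FILE_SIZE];

    /* every process reads "pru.txt" into buf[] */
    /* NOTE: file must be smaller than FILE_SIZE characters */
    fp = fopen("pru.txt", "r");
    if (fp == NULL) {
        perror("pru.txt");
        return 1;
    }
    fseek(fp, 0L, SEEK_END);
    size = (int)ftell(fp);
    fseek(fp, 0L, SEEK_SET);
    size = (int)fread(buf, 1, size, fp);
    fclose(fp);

    // Initialize the MPI environment
    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &my_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    // Rank 0 sends each process an equal-sized chunk of buf[]
    chunk = size / my_size;
    MPI_Scatter(buf, chunk, MPI_CHAR, recv_vect,
        chunk, MPI_CHAR, 0, MPI_COMM_WORLD);

    // Count the spaces in the local chunk
    for (i = 0; i < chunk; i++) {
        if (recv_vect[i] == ' ') {
            local_acum++;
        }
    }

    // size may not divide evenly: rank 0 also counts the leftover tail
    if (my_id == 0) {
        for (i = chunk * my_size; i < size; i++) {
            if (buf[i] == ' ') {
                local_acum++;
            }
        }
    }

    // Sum the per-process counts on rank 0
    MPI_Reduce(&local_acum, &acum, 1, MPI_INT, MPI_SUM,
        0, MPI_COMM_WORLD);

    if (my_id == 0) {
        printf("Counter is %d \n", acum);
    }

    // Finalize the MPI environment.
    MPI_Finalize();

    // The timed interval includes file I/O, MPI_Init and MPI_Finalize
    clock_t end = clock();

    double run_time = (double)(end - begin) / CLOCKS_PER_SEC;

    printf("Final time %f \n", run_time);
    return 0;
}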



//Serial code

#include <stdio.h>
#include <time.h>

int main(void) {
    clock_t begin = clock();

    FILE *fp;
    int size;
    char buf[10000];

    /* read file "pru.txt" and store it in buf[] */
    /* NOTE: file must be smaller than 10000 characters */
    fp = fopen("pru.txt", "r");
    if (fp == NULL) {
        perror("pru.txt");
        return 1;
    }
    fseek(fp, 0L, SEEK_END);
    size = (int)ftell(fp);
    fseek(fp, 0L, SEEK_SET);
    size = (int)fread(buf, 1, size, fp);
    fclose(fp);

    /* count the number of spaces in buf[] */
    int i;
    int acum = 0;
    for (i = 0; i < size; i++) {
        if (buf[i] == ' ')
            acum++;
    }
    printf("Counter is %d \n", acum);

    clock_t end = clock();

    double run_time = (double)(end - begin) / CLOCKS_PER_SEC;

    printf("Final time %f \n", run_time);
    return 0;
}

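My guess is that, for a problem of this size, the overhead of splitting the data, sending it to the nodes, computing, and reducing to get the final result dominates.

In other words, the problem is so simple that the gain from parallel execution cannot outweigh its overhead. The factors I have in mind are:

• Parallel task granularity;

• Communication overhead;

• Load balancing between processes.

I would appreciate your thoughts.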

OpenMPI: parallel code runs slower than serial code. Why?

0 answers: