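I have two simple programs: one computes PI, and the other counts the spaces in a file of fewer than 10,000 characters.
I wrote a serial version and a parallel version (using OpenMPI) of each.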
When I run them and compare their CPU run times, measured with `clock_t begin = clock();`, the serial code is much faster than the parallel code:
| Version | CPU time (s) |
|---|---|
| Serial code | 0.000234 |
| OpenMPI, 2 nodes | 0.005987 |
| OpenMPI, 4 nodes | 0.002890 |
| OpenMPI, 8 nodes | 0.015805 |
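A note on the measurement: `clock()` reports the processor time used by the calling process, not elapsed time. In the MPI version the measured window also includes `MPI_Init` and `MPI_Finalize`, and every rank prints its own value. In MPI programs, `MPI_Wtime()` is the standard wall-clock timer. The sketch below is a minimal, MPI-free illustration of timing only the counting loop with a monotonic wall clock; it assumes a POSIX system that provides `clock_gettime`, and the names `wall_seconds` and `count_spaces_timed` are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Elapsed (wall-clock) seconds from a monotonic clock. Unlike clock(),
   which reports CPU time used by this process, this clock keeps running
   while the process is waiting (for example, blocked in communication). */
double wall_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Count the spaces in buf[0..n) and store the wall time of just the loop
   in *elapsed, so setup costs are not part of the measurement. */
int count_spaces_timed(const char *buf, int n, double *elapsed) {
    double t0 = wall_seconds();
    int acum = 0;
    for (int i = 0; i < n; i++)
        if (buf[i] == ' ')
            acum++;
    *elapsed = wall_seconds() - t0;
    return acum;
}
```

In the MPI version, the same idea would mean calling `MPI_Wtime()` after `MPI_Init` and before `MPI_Finalize`, and printing the result only on rank 0.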
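As you can see, every parallel run is slower than the serial run, and the 8-node run is the slowest of all.
I'd like to understand why.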
Here is the code:
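```c
/* Parallel (OpenMPI) version */
#include <stdio.h>
#include <time.h>
#include <mpi.h>

int main(void) {
    clock_t begin = clock();   /* CPU time of this process, taken before MPI_Init */
    int file_size = 10000;
    FILE *fp;
    int my_size, my_id, size, local_acum = 0, acum = 0, i;
    char buf[file_size], recv_vect[file_size];

    /* Every rank reads the whole file (must be smaller than 10000 characters);
       only rank 0's buffer is actually sent by MPI_Scatter. */
    fp = fopen("pru.txt", "r");
    if (fp == NULL) {
        perror("pru.txt");
        return 1;
    }
    fseek(fp, 0L, SEEK_END);
    size = ftell(fp);
    fseek(fp, 0L, SEEK_SET);
    fread(buf, 1, size, fp);
    fclose(fp);

    // Initialize the MPI environment
    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &my_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    /* Send size / my_size characters to each rank. If size is not a multiple
       of my_size, the last size % my_size characters are not sent. */
    MPI_Scatter(buf, size / my_size, MPI_CHAR, recv_vect,
                size / my_size, MPI_CHAR, 0, MPI_COMM_WORLD);

    /* Count the spaces in this rank's chunk */
    local_acum = 0;
    for (i = 0; i < size / my_size; i++) {
        if (recv_vect[i] == ' ') {
            local_acum++;
        }
    }

    /* Sum the per-rank counts into acum on rank 0 */
    acum = 0;
    MPI_Reduce(&local_acum, &acum, 1, MPI_INT, MPI_SUM,
               0, MPI_COMM_WORLD);
    if (my_id == 0) {
        printf("Counter is %d \n", acum);
    }

    // Finalize the MPI environment.
    MPI_Finalize();
    clock_t end = clock();
    double run_time = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("Final time %f \n", run_time);   /* printed by every rank */
    return 0;
}
```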
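In the code above, each rank receives `size / my_size` characters, so when `size` is not a multiple of the number of processes, the last `size % my_size` characters are never counted. A minimal sketch of one way to split the buffer without losing anything, assuming `MPI_Scatterv` is used instead of `MPI_Scatter`: the hypothetical helper `split_counts` computes per-rank counts that differ by at most one character, plus their starting offsets, in the layout `MPI_Scatterv` takes for its `sendcounts` and `displs` arguments. It is plain C, so it compiles without MPI.

```c
/* Hypothetical helper: divide `total` characters among `nprocs` ranks so
   that no character is dropped. The first total % nprocs ranks receive one
   extra character. counts[r] is how many characters rank r gets and
   displs[r] is where its chunk starts in the send buffer. */
void split_counts(int total, int nprocs, int *counts, int *displs) {
    int base = total / nprocs;
    int extra = total % nprocs;
    int offset = 0;
    for (int r = 0; r < nprocs; r++) {
        counts[r] = base + (r < extra ? 1 : 0);
        displs[r] = offset;
        offset += counts[r];
    }
}
```

Each rank would then call `MPI_Scatterv(buf, counts, displs, MPI_CHAR, recv_vect, counts[my_id], MPI_CHAR, 0, MPI_COMM_WORLD)` and loop over `counts[my_id]` characters. Keeping the chunks within one character of each other is also the simplest form of load balancing for this problem.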
```c
// Serial code
#include <stdio.h>
#include <time.h>

int main(void) {
    clock_t begin = clock();
    FILE *fp;
    int size;
    char buf[10000];
    /* read file "pru.txt" and store it in buf[] */
    /* NOTE: file must be smaller than 10000 characters */
    fp = fopen("pru.txt", "r");
    if (fp == NULL) {
        perror("pru.txt");
        return 1;
    }
    fseek(fp, 0L, SEEK_END);
    size = ftell(fp);
    fseek(fp, 0L, SEEK_SET);
    fread(buf, 1, size, fp);
    fclose(fp);
    /* count the number of spaces in buf[] */
    int i = 0;
    int acum = 0;
    for (i = 0; i < size; i++) {
        if (buf[i] == ' ')
            acum++;
    }
    printf("Counter is %d \n", acum);
    clock_t end = clock();
    double run_time = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("Final time %f \n", run_time);
    return 0;
}
```
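To check the parallel decomposition independently of MPI, the scatter, local count, and `MPI_SUM` reduction can be simulated in a single process and compared with the serial count. This is a sketch: `count_spaces` and `count_spaces_chunked` are hypothetical names, and the simulation splits the buffer the same way the `MPI_Scatter` call in the parallel code does (`n / nprocs` characters per rank).

```c
/* Count spaces in buf[0..n): the same loop both versions run */
int count_spaces(const char *buf, int n) {
    int acum = 0;
    for (int i = 0; i < n; i++)
        if (buf[i] == ' ')
            acum++;
    return acum;
}

/* Serial simulation of scatter + local count + sum reduction over nprocs
   ranks, each receiving n / nprocs characters as with MPI_Scatter.
   Characters past nprocs * (n / nprocs) are skipped, as in the MPI code. */
int count_spaces_chunked(const char *buf, int n, int nprocs) {
    int chunk = n / nprocs;
    int acum = 0;
    for (int r = 0; r < nprocs; r++)
        acum += count_spaces(buf + r * chunk, chunk);   /* "reduce" step */
    return acum;
}
```

With the string `"a b c d e "` (10 characters, 5 spaces), splitting across 2 simulated ranks gives 5, but with 4 ranks each chunk is 2 characters, the final `"e "` is skipped, and the result is 4.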
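My guess is that, for a problem this small, the overhead of splitting the data, sending it to the nodes, computing the partial counts, and reducing them to get the final result outweighs any gain.
In other words, the problem is so simple that parallel execution cannot make up for its overhead. The factors I have in mind are: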
• parallel task granularity;
• communication overhead;
• load balancing between processes.
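The granularity and communication-overhead points can be made concrete with a rough cost model. The sketch below assumes a simple latency/bandwidth (alpha-beta) model, which is an assumption and not something measured here: scanning one character costs `t_c`, each message step costs `alpha` plus `beta` per byte, and `MPI_Scatter` and `MPI_Reduce` are each modeled as ceil(log2 p) message steps on a tree. That gives T_serial = N·t_c and T_parallel = N·t_c/p + 2·ceil(log2 p)·alpha + ((p−1)/p)·N·beta. The model ignores `MPI_Init`/`MPI_Finalize`, which the timings above include, and the type and function names are hypothetical.

```c
/* Hypothetical machine parameters for an alpha-beta cost model */
typedef struct {
    double t_c;    /* seconds to inspect one character */
    double alpha;  /* latency per message step, seconds */
    double beta;   /* transfer time per byte, seconds */
} machine_params;

/* Serial: one pass over n characters */
double serial_time(double n, machine_params m) {
    return n * m.t_c;
}

/* Parallel: local scan of n/p characters, a tree scatter that moves
   (p-1)/p of the data out of the root, and a tree reduce of one int
   (its byte cost is ignored). */
double parallel_time(double n, int p, machine_params m) {
    int steps = 0;                       /* ceil(log2(p)) without libm */
    while ((1 << steps) < p)
        steps++;
    double compute = n / p * m.t_c;
    double scatter = steps * m.alpha + (double)(p - 1) / p * n * m.beta;
    double reduce = steps * m.alpha;
    return compute + scatter + reduce;
}
```

With made-up values `t_c` = 1 ns, `alpha` = 5 µs, and `beta` = 0.1 ns/byte, N = 10,000 and p = 4 give 10 µs serial versus about 23 µs parallel, and p = 8 is slower still; only for much larger inputs (N = 10^8 gives 0.1 s versus about 0.033 s) does the fixed per-message latency become small relative to the computation.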
Thanks for any thoughts.