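The following code counts all primes up to 50,000,000 and works 100% correctly. The problem is that it takes too long. With 32 processors, I get about 42 seconds. My peers get 16 seconds, and I can't seem to find where my code is lagging. Please leave any suggestions :)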
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

const int n=50000000; //number until which primes are counted (inclusive)

int isComposite (int num); //declared up front because main calls it before its definition

int main (int argc, char *argv[]) {
    long local_count=0, global_count=0, //variables used to keep count of the primes
         start, finish, //lowest and highest possible primes for this processor
         i;
    int rank, //processor id
        p, //number of processes
        size, proc0_size; //amount of numbers to check on any processor and proc 0
    double runtime; //variable used to keep track of total elapsed time

    //initialize the MPI execution environment
    MPI_Init (NULL, NULL);
    //determine the rank of the calling processor in the communicator
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    //determine the size of communicator
    MPI_Comm_size (MPI_COMM_WORLD, &p);
    //Ensures that every processor enters code at around the same time
    MPI_Barrier(MPI_COMM_WORLD);
    //Start the timer
    runtime = -MPI_Wtime();

    //compute the range and size to be used for this processor
    //(the cast to long keeps rank*(n-1) from overflowing int for larger p)
    start = 2 + (long)rank*(n-1)/p;
    finish = 1 + (long)(rank+1)*(n-1)/p;
    size = finish - start + 1;
    //determine the size of processor 0
    proc0_size = (n-1)/p;

    //in the case where there are too many processors for the amount of numbers
    //to check...
    if ((2 + proc0_size) < (long) sqrt((double) n)) {
        if (rank == 0)
            printf ("Too many processors to calculate the number of primes up to %d\n", n);
        MPI_Finalize();
        exit(1);
    }

    int j;
    //check every number in the range of this processor
    for (j=start; j<=finish; j++){
        //if the number is not composite, ie prime, increment the local counter
        if (isComposite(j)==0){
            local_count++;
        }
    }

    //MPI_Reduce used to combine all local counts
    //(MPI_LONG matches the C type long of the two counters)
    MPI_Reduce (&local_count, &global_count, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    //adjust the total elapsed runtime
    runtime += MPI_Wtime();
    printf("Process %d finished\n", rank);

    // print the results from process ranked root
    if (rank == 0) {
        printf ("There are %ld primes less than or equal to %d\n", global_count, n);
        printf ("Total elapsed time: %f seconds\n", runtime);
    }

    //terminate the execution environment
    MPI_Finalize ();
    return 0;
}

//function to check if a number is composite
//returns 0 if it is not composite (prime) and returns a 1 if it is composite
int isComposite (int num){
    int retval=0;
    if(num==1)
        retval = 1;
    else if(num%2 == 0 && num!=2)
        retval = 1;
    if(retval != 1) {
        int j;
        for(j=3; j<num; j+=2) {
            if(num%j == 0 ){
                retval = 1;
                break;
            }
            if(j*j>num) break;
        }
    }
    if(retval == 0)
        return 0;
    else
        return 1;
}
Answer 0 (score: 0)
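I compiled your test and ran it on my local machine.
I got
3001134 primes less than or equal to 50000000
with a total elapsed time of about 35-37 seconds. My PC has a 4-core Q6600 at 2.4 GHz running 64-bit Ubuntu. The test was compiled with mpicc t.c -o t -lm -O3.
With only two cores, the time is 66 seconds.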
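What I noticed: the higher-ranked processes have more work to do, and they finish their computation later than the lower-ranked ones. Testing a number near 50,000,000 by trial division needs far more divisions than testing a small number, yet every rank gets a range of the same length. So when you use many processes, the total execution time is determined by the last process, the one with the highest rank.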
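Try to redistribute the work (give shorter segments to the higher-ranked processes) and optimize your isComposite function. Profiling with perf record shows that the line num%j == 0 takes most of the time (about 80%).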
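One possible way to rebalance the work, sketched here under assumptions rather than taken from the original program: instead of one contiguous block per rank, deal the odd candidates out round-robin so that every rank sees the same mix of small and large numbers. The names is_prime_td and count_primes_cyclic are hypothetical, and rank and p are plain arguments so the sketch compiles without MPI; in the real program they would come from MPI_Comm_rank and MPI_Comm_size, and the per-rank results would still be summed with MPI_Reduce.

```c
#include <assert.h>

/* Trial division over odd divisors only, stopping once j*j exceeds num. */
static int is_prime_td(long num) {
    if (num < 2) return 0;
    if (num % 2 == 0) return num == 2;
    for (long j = 3; j * j <= num; j += 2) {
        if (num % j == 0) return 0;
    }
    return 1;
}

/* Count the primes <= n that fall to one rank under a cyclic split.
 * The odd candidates 3, 5, 7, ... are dealt out round-robin: rank r takes
 * 3 + 2r, then steps by 2p. Rank 0 also accounts for the prime 2.
 * (Hypothetical helper, not part of the original code.) */
long count_primes_cyclic(int rank, int p, long n) {
    assert(p > 0 && rank >= 0 && rank < p);
    long count = (rank == 0 && n >= 2) ? 1 : 0;
    for (long c = 3 + 2L * rank; c <= n; c += 2L * p) {
        if (is_prime_td(c)) count++;
    }
    return count;
}
```

Each rank would call count_primes_cyclic(rank, p, n) and pass the result to MPI_Reduce as before. One caveat with this particular stride: if p is a multiple of 3, rank 0 receives only multiples of 3 and finishes almost immediately, so the split is even only for process counts such as 32 that share no small odd factor with the candidates.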
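It is better to prepare a list of small primes and use them for a first sieving pass, and only then switch to the more expensive sieving.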
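Again, sieving means not iterating j over start..finish and testing each j, but creating an array for start..finish, then iterating with i over every number up to sqrt(finish) and marking array[2*i], array[3*i], array[4*i], and so on as composite (you can use addition here instead of multiplication). Then you count the unmarked elements of the array to get the primes in the interval. (Please check the animation at http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes.) Your code uses the slowest method, trial division: http://en.wikipedia.org/wiki/Trial_division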
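The sieving idea above could be sketched roughly as follows. This is a minimal illustration under assumptions, not the original poster's code: the function name count_primes_segment is hypothetical, MPI is left out so that the block compiles on its own, and the array is indexed by m - start so that only the rank's own interval is stored.

```c
#include <assert.h>
#include <stdlib.h>

/* Count the primes in [start, finish] by sieving instead of trial division.
 * Returns -1 if memory allocation fails. (Hypothetical helper.) */
long count_primes_segment(long start, long finish) {
    assert(start >= 2 && start <= finish);

    /* Integer square root of finish: the largest limit with limit*limit <= finish. */
    long limit = 1;
    while ((limit + 1) * (limit + 1) <= finish) limit++;

    char *small = calloc((size_t)limit + 1, 1);           /* marks for 0..limit */
    char *seg = calloc((size_t)(finish - start + 1), 1);  /* marks for start..finish */
    if (!small || !seg) { free(small); free(seg); return -1; }

    for (long i = 2; i <= limit; i++) {
        if (small[i]) continue;  /* i is composite, so its multiples are already marked */

        /* Step 1: keep the list of small primes up to date. */
        for (long m = i * i; m <= limit; m += i) small[m] = 1;

        /* Step 2: mark the multiples of the prime i inside the segment,
         * stepping by addition only. */
        long first = ((start + i - 1) / i) * i;  /* first multiple of i >= start */
        if (first < i * i) first = i * i;        /* never mark i itself; smaller multiples
                                                    are marked by a smaller prime */
        for (long m = first; m <= finish; m += i) seg[m - start] = 1;
    }

    /* Whatever is still unmarked in the segment is prime. */
    long count = 0;
    for (long k = 0; k <= finish - start; k++) {
        if (!seg[k]) count++;
    }
    free(small);
    free(seg);
    return count;
}
```

In the MPI program each rank would call this with its own start and finish and hand the result to MPI_Reduce; over the full range 2..50000000 the per-rank counts should add up to the 3001134 reported above. Because the inner loops only add and store, the cost per number no longer grows with the size of the number the way trial division does, which should also even out the load between ranks.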