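I implemented a BSP version of the Sieve of Eratosthenes in C; see the code below.
When run as ./bspsieve 2 100, it produces the following output:
    It took : 0.000045 seconds for proc 0 out of 2.
    23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
Running ./bspsieve 1 100 gives the same primes:
    It took : 0.000022 seconds for proc 0 out of 1.
    23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
With ./bspsieve 8 100 (using 8 processors) it crashes with a segmentation fault:
    It took : 0.000146 seconds for proc 0 out of 8.
    Segmentation fault (core dumped)
I assume this means my block bounds are off?
Also, the first primes are never found. I can't find my mistake (I am really inexperienced with C). Besides that, are there any other improvements to be made to the code? The algorithm does not need to be fast, but any improvement in understandability and readability is welcome.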
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mcbsp.h>
/*
Note: To compile, this file has to be in the same folder as mcbsp.h and you need the 2 following commands:
gcc -Iinclude/ -pthread -c -o bspsieve.o bspsieve.c
gcc -o bspsieve bspsieve.o lib/libmcbsp1.1.0.a -lpthread -lrt
*/
int procs;
int upperbound;
int *primes;

//SPMD function
void bspSieve(){
    bsp_begin(procs);
    int p = bsp_nprocs(); // p = number of procs obtained
    int s = bsp_pid(); // s = proc number
    float blocksize; // block size to be used, note last proc has a different size!
    if( s != p-1){
        blocksize = ceil(upperbound/p);
    } else {
        blocksize = upperbound - (p-1)*ceil(upperbound/p);
    }
    // Initialize start time and end time, set start time to now.
    double start_time,end_time;
    start_time = bsp_time();
    // Create vector that has block of candidates
    int *blockvector;
    blockvector = (int *)malloc(blocksize*sizeof(int));
    int q;
    for(q = 0; q<blocksize; q++){
        //List contains the integers from s*blocksize till blocksize + s*blocksize
        blockvector[q] = q + s*blocksize;
    }
    //We neglect the first 2 'primes' in processor 0.
    if(s == 0){
        blockvector[0] = 0;
        blockvector[1] = 0;
    }
    // We are using the block distribution. We assume that n is large enough to
    // assure that n/p is larger than sqrt(n). This means that we will always find the
    // sieving prime in the first block, and so have to broadcast from the first
    // processor to the others.
    long sieving_prime;
    int i;
    bsp_push_reg( &sieving_prime,sizeof(long) );
    bsp_sync();
    for(i = 2; i * i < upperbound; i++) {
        //Part 1: if first processor, get the newest sieving prime, broadcast. Search for newest prime starting from i.
        if(s == 0){
            int findPrimeNb;
            for(findPrimeNb = i; findPrimeNb < blocksize; findPrimeNb++) {
                if( blockvector[findPrimeNb] != 0) {
                    sieving_prime = blockvector[findPrimeNb];
                    //broadcast
                    int procNb;
                    for(procNb = 0; procNb < p; ++procNb){
                        bsp_put(procNb, &sieving_prime,&sieving_prime,0,sizeof(long));
                    }
                    break;
                }
            }
        }
        bsp_sync();
        //Part 2: Sieve using the sieving prime
        int sievingNb;
        for(sievingNb = 0; sievingNb < blocksize; sievingNb++){
            //check if element is multiple of sieving prime, if so, cross out (put to zero)
            if( blockvector[sievingNb] % sieving_prime == 0){
                blockvector[sievingNb] = 0;
            }
        }
    }
    //Part 3: get local primes to central area
    int transferNb;
    long transferPrime;
    for(transferNb = 0; transferNb < blocksize; transferNb++){
        transferPrime = blockvector[transferNb];
        primes[transferPrime] = transferPrime;
    }
    // take the end time.
    end_time = bsp_time();
    //Print amount of taken time, only processor 0 has to do this.
    if( s == 0 ){
        printf("It took : %.6lf seconds for proc %d out of %d. \n", end_time-start_time, bsp_pid(), bsp_nprocs());
        fflush(stdout);
    }
    bsp_pop_reg(&sieving_prime);
    bsp_end();
}

int main(int argc, char **argv){
    if(argc != 3) {
        printf( "Usage: %s <proc count> <upper bound>\n", argv[ 0 ] );
        exit(1);
    }
    //retrieve parameters
    procs = atoi( argv[ 1 ] );
    upperbound = atoi( argv[ 2 ] );
    primes = (int *)malloc(upperbound*sizeof(int));
    // init and call parallel part
    bsp_init(bspSieve, argc, argv);
    bspSieve();
    //Print all non zeros of candidates, these are the primes.
    // Primes only go to p*p <= n
    int i;
    for(i = 0; i < upperbound; i++) {
        if(primes[i] > 0) {
            printf("%d, ",primes[i]);
            fflush(stdout);
        }
    }
    return 0;
}
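
Answer 0 (score: 1)

The trouble probably comes from

    blockvector[q] = q + s*blocksize;

As long as blocksize equals ceil(upperbound/p) on every process, there is no problem. Since 1 and 2 are divisors of 100, your program runs fine in those cases.
As you wrote in your own code, that is not always true. When calling ./bspsieve 8 100, it does not hold for the last process: its blocksize differs, so some values in blockvector are larger than 100, and a segmentation fault can occur when they are used as indices to write into the primes array.
A way to correct this behaviour is:

    blockvector[q] = q + s*ceil(upperbound/p);

(Storing ceil(...) in a variable would be faster.)
It is probably also better to zero the primes array before using it.
I have not checked whether this works... give it a try!
Bye,
Francis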
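
To make the block arithmetic concrete, here is a minimal sketch of a layout in which every processor derives its range from the same stride, so the last block can never start past the upper bound. The helper names ceil_div, block_start, block_size and print_blocks are hypothetical and not part of the BSP library. The sketch also assumes an integer ceiling division, which is a choice made here rather than something the answer prescribes.

```c
#include <stdio.h>

/* Ceiling of n / p for positive integers, computed in integer arithmetic. */
int ceil_div(int n, int p) {
    return (n + p - 1) / p;
}

/* First integer owned by processor s. Every processor uses the same stride,
   and the result is clamped so it never exceeds n. */
int block_start(int s, int n, int p) {
    int start = s * ceil_div(n, p);
    return start < n ? start : n;
}

/* Number of integers owned by processor s. The last blocks may be shorter
   than the stride, or even empty. */
int block_size(int s, int n, int p) {
    return block_start(s + 1, n, p) - block_start(s, n, p);
}

/* Print the half-open range [start, end) owned by each processor. */
void print_blocks(int n, int p) {
    for (int s = 0; s < p; s++) {
        int start = block_start(s, n, p);
        printf("proc %d: [%d, %d)\n", s, start, start + block_size(s, n, p));
    }
}
```

For n = 100 and p = 8, print_blocks(100, 8) lists seven blocks of 13 numbers and a final block [91, 100), so no processor holds a value of 100 or more.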
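
Answer 1 (score: 1)

A few possible issues are listed below. Note that if you provide a self-contained, compilable example (for instance with links to the external downloads), it is easier for people who are not familiar with the BSP library to help you. Naming the specific library would also help (I assumed it is MulticoreBSP).

- As far as I can tell, the sieving loop is incorrect. It runs over every number up to sqrt(upperbound) (for (i=2; i*i<100..., so i = 2..9). In those iterations you pick the primes 2, 3, 5, 7, 11, 13, 17, 19 as sieving primes and wrongly eliminate each of them, because a prime is a multiple of itself. That leaves 23 as the first prime in the output.
- upperbound/p: when both variables are integers, the result of the division is an integer, so ceil(upperbound/p) probably does not do what you think. For example, ceil(100/8) == 12, not 13. Cast the numbers to float if you want the division to produce a floating-point result.
- The block assignment is wrong for the last processor (when procs does not divide upperbound evenly). For example, with bspsieve 8 100 your last block starts at 112 (7 × 16) instead of 84 (7 × 12).
- The primes[] array is never initialized.

Fixing these "should" fix both the incorrect output and the crash. If the output is still wrong, I would add liberal printf() calls until you can see where the code deviates from what it should do. I would also start testing with 1 processor and then add one processor at a time to confirm correct operation. Test different upper bounds as well.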
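
The points above can be tried out without the BSP library. The following is a minimal sequential sketch, not the asker's program: it simulates the p processors one after another in a single loop, and all function names (ceil_div, sieve_block, blocked_sieve, count_primes) are hypothetical. It departs from the original code in two labeled ways: it keeps one flag per number instead of storing the numbers themselves, and it starts crossing out at prime*prime, a standard sieve technique, so a sieving prime is never eliminated. A real BSP version would still need the broadcast of the sieving prime from processor 0.

```c
#include <stdio.h>
#include <stdlib.h>

/* Ceiling of n / p in integer arithmetic. In C, ceil(100/8) is 12 because
   100/8 is an integer division that happens before ceil() is called. */
int ceil_div(int n, int p) {
    return (n + p - 1) / p;
}

/* Cross out the multiples of prime inside the block [start, end), beginning
   at prime*prime so that the prime itself is never removed.
   is_composite is indexed by the number itself. */
void sieve_block(char *is_composite, int start, int end, int prime) {
    int first = prime * prime;
    if (first < start) {
        first = ((start + prime - 1) / prime) * prime; /* first multiple >= start */
    }
    for (int m = first; m < end; m += prime) {
        is_composite[m] = 1;
    }
}

/* Sieve the numbers 0..n-1 split into p blocks, handled one after another.
   Returns a zero-initialized-then-filled flag array (caller frees),
   or NULL if the allocation fails. */
char *blocked_sieve(int n, int p) {
    char *is_composite = calloc((size_t)n, 1); /* calloc zeroes the array */
    if (is_composite == NULL) return NULL;
    if (n > 0) is_composite[0] = 1; /* 0 and 1 are not prime */
    if (n > 1) is_composite[1] = 1;
    int stride = ceil_div(n, p);
    for (int prime = 2; prime * prime < n; prime++) {
        if (is_composite[prime]) continue; /* sieve only with actual primes */
        for (int s = 0; s < p; s++) {
            /* Same stride for every block; both ends are clamped to n. */
            int start = s * stride < n ? s * stride : n;
            int end = (s + 1) * stride < n ? (s + 1) * stride : n;
            sieve_block(is_composite, start, end, prime);
        }
    }
    return is_composite;
}

/* Count the primes below n found with p blocks; returns -1 on failure. */
int count_primes(int n, int p) {
    char *is_composite = blocked_sieve(n, p);
    if (is_composite == NULL) return -1;
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (!is_composite[i]) count++;
    }
    free(is_composite);
    return count;
}
```

Both count_primes(100, 8) and count_primes(100, 1) give 25, the number of primes below 100, so the result no longer depends on the number of blocks.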