Question

Evidently the OP already got their answer in the comments, and the problem is now resolved.

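I wrote a prime number program (a sieve of Eratosthenes) that runs using pthreads.

This is my first multithreaded program, and I don't know why it takes about 3 minutes to execute. That is far too long!

Can someone tell me what exactly is going wrong?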

#include<iostream>
#include<cstring>
#include<pthread.h>
#include<time.h>

using namespace std;

//set limits
#define  LIMIT   100000001
#define THREAD_LIMIT   8

//declare buffers
bool num[LIMIT];

unsigned long long num_of_prime = 1; // 2 is counted as prime initially 
unsigned long long sum_prime = 2;    // 2 is counted in sum of primes

void *search(void *);

int main()
{
    clock_t start_time = clock(); // start clock stamp

    pthread_t thread[THREAD_LIMIT];
    int thread_val=-1,j=-1;
    unsigned long long i=3;
    bool *max_prime[10];    // stores max. 10 prime numbers

    memset(num,0,LIMIT);    // initialize buffer with 0 

    while(i<LIMIT)
    {
        if(num[i]==0)
        {
            num_of_prime++;
            sum_prime +=i;
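            j = (j + 1) % 10;   // advance the ring-buffer index without modifying j twice in one expression
            max_prime[j]=num+i;
            thread_val = (thread_val + 1) % THREAD_LIMIT;   // next thread slot, round-robin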
            pthread_join(thread[thread_val],NULL);  // wait till the current thread ends
            pthread_create(&thread[thread_val],NULL,search,(void *)i); // fork thread function to flag composite numbers
        }   
        i+=2;   // only odd numbers
    }

    // end all threads
    for(i=0;i<THREAD_LIMIT;i++)
    {
        pthread_join(thread[i],NULL); 
    }

    cout<<"Execution time: "<<((double)(clock() - start_time))/CLOCKS_PER_SEC<<"\n";
    cout<<"Number of Primes: "<<num_of_prime<<"\n";
    cout<<"Sum of Primes: "<<sum_prime<<"\n";
    cout<<"List of 10 Max. Primes: "<<"\n";
    for(i=0;i<10;i++)
    {
        j = (j + 1) % 10;
        cout<<(max_prime[j]-num)<<"\n";
    }
    return 0;
}

void *search(void *n)
{
    unsigned long long jump = (unsigned long long int)n;
    unsigned long long position = jump*jump; // Jump to the N*N th composite number
    bool *posn = num;

    jump<<=1; 
    while(position<LIMIT)
    {

        (*(posn+position))?(position+=jump):(*(posn+position)=1,position+=jump);

    } 
    return NULL;
}

Constraint: only 8 threads may be forked.

N: 10^8

How can I improve the efficiency of this code (especially in forking and joining threads)?

Answer 1

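My experience is that throwing some threads at this problem can speed it up, but disappointingly little once N gets large.

I tried dividing the sieve into blocks, one per thread. One thread generates the list of primes up to sqrt(N), and then all the threads work through their own part of the sieve, knocking out the multiples of those primes. The idea is to minimize interaction between threads: each one hammers away independently at its own part of the sieve.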
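The block-per-thread scheme described above could be sketched roughly as follows. This is not the answerer's actual code: the names `BlockTask`, `base_primes`, `sieve_block` and `count_primes_threaded` are made up for illustration, and the sketch assumes one byte per number and counts primes rather than listing them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>
#include <pthread.h>

// Hypothetical helper: plain sieve returning all primes <= limit.
static std::vector<uint64_t> base_primes(uint64_t limit) {
    std::vector<char> composite(limit + 1, 0);
    std::vector<uint64_t> primes;
    for (uint64_t p = 2; p <= limit; ++p) {
        if (composite[p]) continue;
        primes.push_back(p);
        for (uint64_t m = p * p; m <= limit; m += p) composite[m] = 1;
    }
    return primes;
}

struct BlockTask {
    uint64_t lo, hi;                      // half-open range [lo, hi)
    const std::vector<uint64_t>* primes;  // shared base primes, read-only
    uint64_t count;                       // result: primes found in this block
};

// Thread body: sieves one block using a buffer private to the thread,
// so no two threads ever write to the same memory.
static void* sieve_block(void* arg) {
    BlockTask* t = static_cast<BlockTask*>(arg);
    std::vector<char> composite(t->hi - t->lo, 0);
    for (uint64_t p : *t->primes) {
        if (p * p >= t->hi) break;                  // no multiple left to mark
        uint64_t start = (t->lo + p - 1) / p * p;   // first multiple of p >= lo
        if (start < p * p) start = p * p;           // smaller ones fall to smaller primes
        for (uint64_t m = start; m < t->hi; m += p) composite[m - t->lo] = 1;
    }
    t->count = 0;
    for (uint64_t v = std::max<uint64_t>(t->lo, 2); v < t->hi; ++v)
        if (!composite[v - t->lo]) ++t->count;
    return nullptr;
}

// Counts the primes in [0, n] using num_threads threads, each created once.
uint64_t count_primes_threaded(uint64_t n, int num_threads) {
    assert(num_threads > 0);
    uint64_t root = 1;
    while ((root + 1) * (root + 1) <= n) ++root;    // floor(sqrt(n)) for n >= 1
    const std::vector<uint64_t> primes = base_primes(root);

    std::vector<pthread_t> threads(num_threads);
    std::vector<BlockTask> tasks(num_threads);
    std::vector<char> started(num_threads, 0);
    const uint64_t span = (n + num_threads) / num_threads;  // ceil((n + 1) / num_threads)
    for (int i = 0; i < num_threads; ++i) {
        uint64_t lo = std::min<uint64_t>(static_cast<uint64_t>(i) * span, n + 1);
        uint64_t hi = std::min<uint64_t>(lo + span, n + 1);
        tasks[i] = BlockTask{lo, hi, &primes, 0};
        started[i] = pthread_create(&threads[i], nullptr, sieve_block, &tasks[i]) == 0;
        if (!started[i]) sieve_block(&tasks[i]);    // fall back to running inline
    }
    uint64_t total = 0;
    for (int i = 0; i < num_threads; ++i) {
        if (started[i]) pthread_join(threads[i], nullptr);
        total += tasks[i].count;
    }
    return total;
}
```

Calling `count_primes_threaded(100000000, 8)` would stay within the question's 8-thread limit. Each thread is created and joined exactly once, so the fork/join cost no longer grows with the number of primes found.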

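Your code appears to start a new thread to knock out the multiples of every prime it finds. The overhead of starting and stopping that many threads makes me wince! And I'm damned if I can see how you stop the threads from tripping over each other, though I assume they don't?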

FWIW, for primes up to 10^8 I managed:

Unthreaded: 0.160 s elapsed, 0.140 s user
5 threads: 0.040 s elapsed, 0.130 s user

on a relatively modest x86_64 machine.

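For 10^10:

Unthreaded: 39.260 s elapsed, 37.910 s user
5 threads: 23.680 s elapsed, 110.120 s user

Deeply disappointing. I think the problem is that the cache is being swamped... the code takes each prime in turn and knocks out all of its multiples, so it sweeps from one end of a block to the other and then goes back to the start. It would probably be better to process, say, 512K of the sieve for all the primes, and then repeat.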
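The "process a cache-sized piece for all the primes, then repeat" idea might look something like the single-threaded sketch below. It is an illustration under assumptions, not code the answerer ran: the function name `count_primes_segmented` is invented here, the sieve uses one byte per number, and the 512K default is taken to mean 512 * 1024 entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts the primes in [0, n], sieving one small segment at a time so the
// working buffer can stay in cache while every base prime is applied to it.
uint64_t count_primes_segmented(uint64_t n, uint64_t segment_size = 512 * 1024) {
    assert(segment_size > 0);
    if (n < 2) return 0;
    uint64_t root = 1;
    while ((root + 1) * (root + 1) <= n) ++root;    // floor(sqrt(n))

    // Small sieve for the base primes <= sqrt(n). next[k] remembers the next
    // multiple of primes[k] that still has to be crossed off.
    std::vector<char> small(root + 1, 0);
    std::vector<uint64_t> primes, next;
    for (uint64_t p = 2; p <= root; ++p) {
        if (small[p]) continue;
        primes.push_back(p);
        next.push_back(p * p);
        for (uint64_t m = p * p; m <= root; m += p) small[m] = 1;
    }

    std::vector<char> segment(segment_size);        // reused for every segment
    uint64_t count = 0;
    for (uint64_t lo = 0; lo <= n; lo += segment_size) {
        const uint64_t hi = std::min<uint64_t>(lo + segment_size, n + 1);
        std::fill(segment.begin(), segment.end(), 0);
        // Apply ALL base primes to this segment before moving on.
        for (std::size_t k = 0; k < primes.size(); ++k) {
            uint64_t m = next[k];
            for (; m < hi; m += primes[k]) segment[m - lo] = 1;
            next[k] = m;                            // resume here in the next segment
        }
        for (uint64_t v = std::max<uint64_t>(lo, 2); v < hi; ++v)
            if (!segment[v - lo]) ++count;
    }
    return count;
}
```

In a threaded version, each thread would presumably run this inner loop over its own block. Whether 512K is the right size depends on the machine's cache, so it is left as a parameter.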
