Eratosthenes的素数比同时更快?

时间:2014-03-17 22:51:23

标签: java multithreading algorithm concurrency sieve-of-eratosthenes

我目前正在编写一个程序,该程序首先依次由Eratosthenes筛选生成素数,然后同时生成素数。该算法的并发版本应该比顺序版本更快,但在我的情况下,并发版本约为。慢10倍。我想知道在顺序解决方案中与主线程相比,我将更多工作放在我的线程上。这是我的程序(准备阅读一下!):

Primes.java

public abstract class Primes {

    byte[] bitArr;
    int maxNum;
    final int[] BITMASK = { 1, 2, 4, 8, 16, 32, 64 };
    final int[] BITMASK2 = { 255 - 1, 255 - 2, 255 - 4, 255 - 8, 
                             255 - 16, 255 - 32, 255 - 64 };

    void setAllPrime() {
        for (int i = 0; i < bitArr.length; i++) {
            bitArr[i] = (byte) 127;
        }
    }

    void crossOut(int i) {
        bitArr[i/14] = (byte) (bitArr[i/14] - BITMASK[((i/2)%7)]);
    }

    boolean isPrime(int i) {
        if(i == 2){
            return true;
        }
        if((i%2) == 0){
            return false;
        }

        return (bitArr[i/14] & BITMASK[(i%14)>>1]) != 0;

    }

    int nextPrime(int i) {
        int k;
        if ((i%2) == 0){
            k =i+1;
        }
        else {
            k = i+2;
        }
        while (!isPrime(k) && k < maxNum){
            k+=2;
        }
        return k;
    }

    void printAllPrimes() {
        for (int i = 2; i <= maxNum; i++){
            if (isPrime(i)){
                System.out.println("Prime: " + i);
            }
        }
    }
}

PrimesSeq.java

import java.util.ArrayList;

public class PrimesSeq extends Primes{

    PrimesSeq(int maxNum) {
        this.maxNum = maxNum;
        bitArr = new byte[(maxNum / 14) + 1];
        setAllPrime();
        generatePrimesByEratosthenes();
    }

    void generatePrimesByEratosthenes() {
        crossOut(1); // 1 is not a prime

        int curr = 3;
        while(curr < Math.sqrt(maxNum)){
            for(int i = curr*curr; i < maxNum; i+=2*curr){ 
                if(isPrime(i)){                // 2*curr because odd*2 = even!
                    crossOut(i);
                }
            }
            curr = nextPrime(curr);
        }
    }
}

PrimesPara.java

import java.util.ArrayList;


public class PrimesPara extends Primes {

    PrimeThread[] threads;
    int processors;
    int currentState = 0;
    //0 = Init
    //1 = Generate primes after thread #0 finish
    //2 = Factorize

    public PrimesPara(int maxNum){
        this.maxNum = maxNum;
        this.processors = Runtime.getRuntime().availableProcessors();
        bitArr = new byte[(maxNum / 14) + 1];
        setAllPrime();
        this.threads = new PrimeThread[processors*2];
        generateErastothenesConcurrently();
        //printAllPrimes();
    }

    public void generateErastothenesConcurrently(){
        int[] starts = generateThreadIndexes();

        for(int i = 0; i < threads.length; i++){
            if(i != threads.length-1){
                threads[i] = new PrimeThread(starts[i], starts[i+1]-1, i);
            } else {
                threads[i] = new PrimeThread(starts[i], maxNum, i);
            }
        }

        //Start generating the first primes
        crossOut(1);
        Thread th = new Thread(threads[0]);
        th.start();
        try {
            th.join();
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        currentState = 1;

        //Start generating the rest of the primes
        Thread[] thrs = new Thread[threads.length];
        for(int i = 0; i < thrs.length; i++){
            thrs[i] = new Thread(threads[i]);
            thrs[i].start();
        }
        for(int i = 0; i < thrs.length; i++){
            try {
                thrs[i].join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        currentState = 2;
    }

    private int[] generateThreadIndexes(){
        int[] indexes = new int[processors*2];

        for(int i = 0; i < indexes.length; i++){
            indexes[i] = (i*((maxNum/(processors*2))));
        }

        indexes[indexes.length-1]++;

        return indexes;
    }

    public class PrimeThread implements Runnable {

        int start;
        int end;
        int thridx;

        public PrimeThread(int start, int end, int thridx){
            this.start = start;
            this.end = end;
            this.thridx = thridx;
        }

        public void run() {
            switch(currentState){
            case 0:
                generateSqrtPrimes();
                break;
            case 1:
                generateMyPrimes();
                break;
            case 2:
                break;
            }           
        }

        private void generateSqrtPrimes(){
            int curr = 3;           
            while(curr < Math.sqrt(maxNum)+1){
                for(int i = curr*curr; i < Math.sqrt(maxNum)+1; i+=2*curr){ 
                    if(isPrime(i)){                  // 2*curr because odd*2 = even!
                        crossOut(i);
                    }
                }
                curr = nextPrime(curr);
            }
        }

        private void generateMyPrimes(){
            int curr = start>(int)Math.sqrt(maxNum)?start:(int)Math.sqrt(maxNum);

            while(curr < end){
                for(int i = 3; i < Math.sqrt(maxNum)+1; i = nextPrime(i)){
                    if((curr%i) == 0){
                        if(isPrime(curr)){
                            crossOut(curr);
                        }
                    }
                }
                curr = nextPrime(curr);
            }
        }
    }
}

如果有人能告诉我并发计划的瓶颈在哪里,我会非常高兴。提前谢谢!

1 个答案:

答案 0 :(得分:3)

我不是JAVA编码器所以我坚持使用C ++。这也不是你问题的直接答案(对不起,但是我无法调试JAVA)把它作为一些指针去哪个方向去检查......

  1. Eratosthenes的筛子

    可以实现并行化,但速度增益不够大。相反,我使用更多的筛选标签,每个筛选器都有自己的子分区,每个表大小是所有子除数的共同乘法。这样您只需启动一次表,然后只需在O(1)

  2. 中查看它们
  3. <强>并行化

    检查完所有的筛子后,我会使用线程对所有未使用的除数进行明显的除法测试

  4. <强> memoize的

    如果你有所有找到的素数的有效表,那么除以素数并添加所有找到的新素数

  5. 我正在使用非并行素数搜索,这对我来说足够快......

    • 您可以根据并行代码进行调整......

    [Edit1]更新代码

    //---------------------------------------------------------------------------
    int bits(DWORD p)
        {
        DWORD m=0x80000000; int b=32;
        for (;m;m>>=1,b--)
         if (p>=m) break;
        return b;
        }
    //---------------------------------------------------------------------------
    DWORD sqrt(const DWORD &x)
        {
        DWORD m,a;
        m=(bits(x)>>1);
        if (m) m=1<<m; else m=1;
        for (a=0;m;m>>=1) { a|=m; if (a*a>x) a^=m; }
        return a;
        }
    //---------------------------------------------------------------------------
    List<int> primes_i32;                   // list of precomputed primes
    const int primes_map_sz=4106301;        // max size of map for speedup search for primes max(LCM(used primes per bit)) (not >>1 because SOE is periodic at double LCM size and only odd values are stored 2/2=1)
    const int primes_map_N[8]={ 4106301,3905765,3585337,4026077,3386981,3460469,3340219,3974653 };
    const int primes_map_i0=33;             // first index of prime not used in mask
    const int primes_map_p0=137;            // biggest prime used in mask
    BYTE primes_map[primes_map_sz];         // factors map for first i0-1 primes
    bool primes_i32_alloc=false;
    int isprime(int p)
        {
        int i,j,a,b,an,im[8]; BYTE u;
        an=0;
        if (!primes_i32.num)                // init primes vars
            {
            primes_i32.allocate(1024*1024);
            primes_i32.add(  2); for (i=1;i<primes_map_sz;i++) primes_map[i]=255; primes_map[0]=254;
            primes_i32.add(  3); for (u=255-  1,j=  3,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add(  5); for (u=255-  2,j=  5,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add(  7); for (u=255-  4,j=  7,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 11); for (u=255-  8,j= 11,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 13); for (u=255- 16,j= 13,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 17); for (u=255- 32,j= 17,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 19); for (u=255- 64,j= 19,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 23); for (u=255-128,j= 23,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 29); for (u=255-  1,j=137,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 31); for (u=255-  2,j=131,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 37); for (u=255-  4,j=127,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 41); for (u=255-  8,j=113,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 43); for (u=255- 16,j= 83,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 47); for (u=255- 32,j= 61,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 53); for (u=255- 64,j=107,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 59); for (u=255-128,j=101,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 61); for (u=255-  1,j=103,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 67); for (u=255-  2,j= 67,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 71); for (u=255-  4,j= 37,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 73); for (u=255-  8,j= 41,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 79); for (u=255- 16,j= 43,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 83); for (u=255- 32,j= 47,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 89); for (u=255- 64,j= 53,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add( 97); for (u=255-128,j= 59,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add(101); for (u=255-  1,j= 97,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add(103); for (u=255-  2,j= 89,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add(107); for (u=255-  4,j=109,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add(109); for (u=255-  8,j= 79,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add(113); for (u=255- 16,j= 73,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add(127); for (u=255- 32,j= 71,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add(131); for (u=255- 64,j= 31,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            primes_i32.add(137); for (u=255-128,j= 29,i=j>>1;i<primes_map_sz;i+=j) primes_map[i]&=u;
            }
    
        if (!primes_i32_alloc)
            {
            if (p  <=1) return 0;               // ignore too small walues
            if (p&1==0) return 0;               // not prime if even
            if (p>primes_map_p0)
                {
                j=p>>1;
                i=j; if (i>=primes_map_N[0]) i%=primes_map_N[0]; if (!(primes_map[i]&  1)) return 0;
                i=j; if (i>=primes_map_N[1]) i%=primes_map_N[1]; if (!(primes_map[i]&  2)) return 0;
                i=j; if (i>=primes_map_N[2]) i%=primes_map_N[2]; if (!(primes_map[i]&  4)) return 0;
                i=j; if (i>=primes_map_N[3]) i%=primes_map_N[3]; if (!(primes_map[i]&  8)) return 0;
                i=j; if (i>=primes_map_N[4]) i%=primes_map_N[4]; if (!(primes_map[i]& 16)) return 0;
                i=j; if (i>=primes_map_N[5]) i%=primes_map_N[5]; if (!(primes_map[i]& 32)) return 0;
                i=j; if (i>=primes_map_N[6]) i%=primes_map_N[6]; if (!(primes_map[i]& 64)) return 0;
                i=j; if (i>=primes_map_N[7]) i%=primes_map_N[7]; if (!(primes_map[i]&128)) return 0;
                }
            }
    
        an=primes_i32[primes_i32.num-1];
        if (an>=p)
            {
            // linear table search
            if (p<127)  // 31st prime
                {
                if (an>=p) for (i=0;i<primes_i32.num;i++)
                    {
                    a=primes_i32[i];
                    if (a==p) return 1;
                    if (a> p) return 0;
                    }
                }
            // approximation table search
            else{
                for (j=1,i=primes_i32.num-1;j<i;j<<=1); j>>=1; if (!j) j=1;
                for (i=0;j;j>>=1)
                    {
                    i|=j;
                    if (i>=primes_i32.num) { i-=j; continue; }
                    a=primes_i32[i];
                    if (a==p) return 1;
                    if (a> p) i-=j;
                    }
                return 0;
                }
            }
        a=an; a+=2;
        for (j=a>>1,i=0;i<8;i++) im[i]=j%primes_map_N[i];
        an=(1<<((bits(p)>>1)+1))-1; if (an<=0) an=1;
        an=an+an;
        for (;a<=p;a+=2)
            {
            for (j=1,i=0;i<8;i++,j<<=1)                     // check if map is set
             if (!(primes_map[im[i]]&j)) { j=-1; break; }   // if not dont bother with division
            for (i=0;i<8;i++) { im[i]++; if (im[i]>=primes_map_N[i]) im[i]-=primes_map_N[i]; }
            if (j<0) continue;
            for (i=primes_map_i0;i<primes_i32.num;i++)
                {
                b=primes_i32[i];
                if (b>an) break;
                if ((a%b)==0) { i=-1; break; }
                }
            if (i<0) continue;
            primes_i32.add(a);
            if (a==p) return 1;
            if (a> p) return 0;
            }
        return 0;
        }
    //---------------------------------------------------------------------------
    void getprimes(int p)                       // compute and allocate primes up to p
        {
        if (!primes_i32.num) isprime(3);
        int p0=primes_i32[primes_i32.num-1];    // biggest prime computed yet
        if (p>p0+10000)                         // if too big difference use sieves to fast precompute
            {
            // T((0.3516+0.5756*log10(n))*n) -> O(n.log(n))
            // sieves N/16 bytes p=100 000 000 t=1903.031 ms
            //  ------------------------------
            //   0  1  2  3  4  5  6  7 bit
            //  ------------------------------
            //   1  3  5  7  9 11 13 15 +-> +2
            //  17 19 21 23 25 27 29 31 |
            //  33 35 37 39 41 43 45 47 V +16
            //  ------------------------------
            int N=(p|15),M=(N>>4);              // store only odd values 1,3,5,7,... each bit ...
            char *m=new char[M+1];              // m[i] ->  is 1+i+i prime? (factors map)
            int i,j,k,n;
            // init sieves
            m[0]=254; for (i=1;i<=M;i++) m[i]=255;
            for(n=sqrt(p),i=1;i<=n;)
                {
                int a=m[i>>4];
                if (int(a&  1)!=0) for(k=i+i,j=i+k;j<=N;j+=k) m[j>>4]&=255-(1<<((j>>1)&7)); i+=2;
                if (int(a&  2)!=0) for(k=i+i,j=i+k;j<=N;j+=k) m[j>>4]&=255-(1<<((j>>1)&7)); i+=2;
                if (int(a&  4)!=0) for(k=i+i,j=i+k;j<=N;j+=k) m[j>>4]&=255-(1<<((j>>1)&7)); i+=2;
                if (int(a&  8)!=0) for(k=i+i,j=i+k;j<=N;j+=k) m[j>>4]&=255-(1<<((j>>1)&7)); i+=2;
                if (int(a& 16)!=0) for(k=i+i,j=i+k;j<=N;j+=k) m[j>>4]&=255-(1<<((j>>1)&7)); i+=2;
                if (int(a& 32)!=0) for(k=i+i,j=i+k;j<=N;j+=k) m[j>>4]&=255-(1<<((j>>1)&7)); i+=2;
                if (int(a& 64)!=0) for(k=i+i,j=i+k;j<=N;j+=k) m[j>>4]&=255-(1<<((j>>1)&7)); i+=2;
                if (int(a&128)!=0) for(k=i+i,j=i+k;j<=N;j+=k) m[j>>4]&=255-(1<<((j>>1)&7)); i+=2;
                }
            // compute primes
            i=p0&0xFFFFFFF1; k=m[i>>4]; // start after last found prime in list
            if ((int(k&  1)!=0)&&(i>p0)) primes_i32.add(i); i+=2;
            if ((int(k&  2)!=0)&&(i>p0)) primes_i32.add(i); i+=2;
            if ((int(k&  4)!=0)&&(i>p0)) primes_i32.add(i); i+=2;
            if ((int(k&  8)!=0)&&(i>p0)) primes_i32.add(i); i+=2;
            if ((int(k& 16)!=0)&&(i>p0)) primes_i32.add(i); i+=2;
            if ((int(k& 32)!=0)&&(i>p0)) primes_i32.add(i); i+=2;
            if ((int(k& 64)!=0)&&(i>p0)) primes_i32.add(i); i+=2;
            if ((int(k&128)!=0)&&(i>p0)) primes_i32.add(i); i+=2;
            for(j=i>>4;j<M;i+=16,j++)   // continue with 16-blocks
                {
                k=m[j];
                if (!k) continue;
                if (int(k&  1)!=0) primes_i32.add(i   );
                if (int(k&  2)!=0) primes_i32.add(i+ 2);
                if (int(k&  4)!=0) primes_i32.add(i+ 4);
                if (int(k&  8)!=0) primes_i32.add(i+ 6);
                if (int(k& 16)!=0) primes_i32.add(i+ 8);
                if (int(k& 32)!=0) primes_i32.add(i+10);
                if (int(k& 64)!=0) primes_i32.add(i+12);
                if (int(k&128)!=0) primes_i32.add(i+14);
                }
            k=m[j]; // do the last primes
            if ((int(k&  1)!=0)&&(i<=p)) primes_i32.add(i); i+=2;
            if ((int(k&  2)!=0)&&(i<=p)) primes_i32.add(i); i+=2;
            if ((int(k&  4)!=0)&&(i<=p)) primes_i32.add(i); i+=2;
            if ((int(k&  8)!=0)&&(i<=p)) primes_i32.add(i); i+=2;
            if ((int(k& 16)!=0)&&(i<=p)) primes_i32.add(i); i+=2;
            if ((int(k& 32)!=0)&&(i<=p)) primes_i32.add(i); i+=2;
            if ((int(k& 64)!=0)&&(i<=p)) primes_i32.add(i); i+=2;
            if ((int(k&128)!=0)&&(i<=p)) primes_i32.add(i); i+=2;
            delete[] m;
            }
        else{
            bool b0=primes_i32_alloc;
            primes_i32_alloc=true;
            isprime(p);
            primes_i32_alloc=false;
            primes_i32_alloc=b0;
            }
        }
    //---------------------------------------------------------------------------
    
    • 解决了矿区代码中的一些溢出错误(筛子的周期性......)
    • 还有一些进一步的优化
    • 添加了getprimes(p)功能,可以快速将所有primes<=p添加到列表中,如果它们还没有那么
    • 在最初的1 000 000个质数(最多15 485 863)上测试了正确性
    • getprimes(15 485 863)在我设置的175.563毫秒内解决了这个问题
    • 对于粗略的

    • isprime的速度要慢一些

    • primes_i32int s的动态列表

    • primes_i32.num是列表中int的数量
    • primes_i32[i]i - int i = <0,primes_i32.num-1>
    • primes_i32.add(x)x添加到列表末尾
    • primes_i32.allocate(N)为列表中的N项预分配空间,以避免重新分配速度减慢

    <强> [注释]

    我使用这个非并行版本的欧拉问题10(所有素数的总和低于2000000)

        ----------------------------------------------------------------------------------
               Time         ID      Reference    | My solution   | Note                     
        ----------------------------------------------------------------------------------
        [  35.639 ms] Problem010. 142913828922 | 142913828922  | 64_bit
    
    • 每个筛选标签都是primes_map[]数组中的一个位片,只使用奇数值(甚至不需要记住筛子)。
    • 如果您希望找到所有素数的最大速度,请拨打isprime(max value)并阅读primes_i32[]的内容
    • 我使用一半的比特大小而不是sqrt作为速度

    希望我没有忘记在这里复制一些东西