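I know there are already some posts about how to implement a wheel, but I'm really struggling to see how to implement one with my current sieve approach.
I created my own bit array in C, with the following implementation: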
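// macro arguments are parenthesized so expressions such as setBit1(a, x + 1) expand correctly;
// 1u avoids the undefined behavior of shifting a signed 1 into the sign bit
#define setBit1(Array, i) ((Array)[(i)/INT_BITS] |= (1u << ((i) % INT_BITS)))
#define getBit(Array, i) (((Array)[(i)/INT_BITS] & (1u << ((i) % INT_BITS))) ? 1 : 0)
#define setBit0(Array, i) ((Array)[(i)/INT_BITS] &= ~(1u << ((i) % INT_BITS)))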
int * createBitArray(unsigned long long size) {
    // number of ints needed to hold `size` bits, rounded up to a whole int
    unsigned long long intsRequired = (size + INT_BITS - 1) / INT_BITS;
    // calloc zero-initializes the array, so every bit starts out unmarked
    return (int *)calloc(intsRequired, sizeof(int));
}
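I ran some tests in C using clock(), and found that for large arrays of more than 1,000,000 elements the bit array, even with the bit manipulation, is at least 200% faster than an int array. It also uses 1/32 of the memory.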
#define indexToNum(n) (2*(n) + 1)
#define numToIndex(n) (((n) - 1) / 2)
typedef unsigned long long LONG;
// populates prime array through Sieve of Eratosthenes, taking custom
// odd keyed bit array, and the raw array length, as arguments
void findPrimes(int * primes, LONG arrLength) {
    long sqrtArrLength = (long)((sqrt((2 * arrLength) + 1) - 1) / 2);
    long maxMult = 0;
    long integerFromIndex = 0;
    // long counters: an int index can overflow for very large arrays
    for (long i = 1; i <= sqrtArrLength; i++) {
        if (!getBit(primes, i)) {
            integerFromIndex = indexToNum(i);
            maxMult = (indexToNum(arrLength)) / integerFromIndex;
            // odd multipliers only, starting at p itself so marking begins at p*p
            for (long j = integerFromIndex; j <= maxMult; j += 2) {
                setBit1(primes, numToIndex((integerFromIndex*j)));
            }
        }
    }
}
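I populate the bit array so that index i represents the number obtained from (2i + 1). This avoids spending any time iterating over even numbers, and again halves the memory the array needs. 2 is added to the primes by hand afterwards. The cost is the time spent converting between index and number, but in my tests this approach is faster for anything over 1,000 primes.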
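To make the (2i + 1) mapping concrete, here is a minimal sketch of an odd-only sieve that works entirely in index space. It is not the code from the post: `sieve_odd_only` and `count_primes_odd_only` are hypothetical names, and it uses a byte array rather than the int-based macros. The point it illustrates is that for p = 2i + 1 the square p*p sits at index 2i(i + 1), and stepping the index by p moves the number by 2p, so the inner loop needs no multiplication or number-to-index conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Odd-only sieve: bit i stands for the number 2*i + 1.
 * Returns a calloc'd byte array in which bit i is set when 2*i + 1 is
 * not prime. nbits is the number of odd candidates (1, 3, ..., 2*nbits - 1).
 * Hypothetical helper for illustration only. */
uint8_t *sieve_odd_only(uint64_t nbits) {
    assert(nbits > 0);
    uint8_t *bits = calloc((nbits + 7) / 8, 1);
    if (!bits) return NULL;
    bits[0] |= 1;  /* index 0 is the number 1, which is not prime */
    /* 2*i*(i+1) is the index of p*p; stop once p*p is past the array */
    for (uint64_t i = 1; 2 * i * (i + 1) < nbits; i++) {
        if (bits[i >> 3] & (1u << (i & 7))) continue;  /* composite */
        uint64_t p = 2 * i + 1;
        /* p*p = 2*(2i(i+1)) + 1, so its index is 2i(i+1);
         * moving p slots forward adds 2p, skipping the even multiples */
        for (uint64_t j = 2 * i * (i + 1); j < nbits; j += p)
            bits[j >> 3] |= (uint8_t)(1u << (j & 7));
    }
    return bits;
}

/* Counts the prime 2 plus the primes among 1, 3, ..., 2*nbits - 1. */
uint64_t count_primes_odd_only(uint64_t nbits) {
    uint8_t *bits = sieve_odd_only(nbits);
    if (!bits) return 0;
    uint64_t count = 1;  /* the prime 2, added by hand */
    for (uint64_t i = 0; i < nbits; i++)
        if (!(bits[i >> 3] & (1u << (i & 7)))) count++;
    free(bits);
    return count;
}
```

Calling `count_primes_odd_only(50)` covers the odd numbers up to 99 and so counts the primes below 100.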
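I'm having a hard time seeing how to optimize further. I've reduced the array size, I only test up to sqrt(n), I start "sieving" each prime from p * p upward, I've eliminated all the even numbers, and it still takes me about 60 seconds for the first 100,000,000 primes in C.
As far as I understand it, the "wheel" method requires the actual integer value of each number to be stored at its index. I'm really stuck on implementing it with my current bit array.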
Answer 0 (score: 2)
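When I fixed your implementation and ran it on my Macbook Pro, it took 17 seconds to mark all composites <= 2^31, which is pretty fast.
There are a few other things you could try, though. Using a wheel might cut your time in half.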
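A wheel does not need the integers themselves stored anywhere; only the index-to-number formula changes. As a rough illustration, here is a minimal sketch of a 2-3 wheel, where only numbers of the form 6k-1 and 6k+1 get a bit. This is an assumption-laden example rather than the code I timed: `wheel_num`, `wheel_idx` and `count_primes_wheel6` are hypothetical names, and `limit` is assumed to be small enough (well below 2^62) that the products cannot overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* index -> number: 0, 1, 2, 3, ... -> 5, 7, 11, 13, ... */
static uint64_t wheel_num(uint64_t i) { return (3 * i + 4) | 1; }

/* number -> index; only valid for n >= 5 not divisible by 2 or 3 */
static uint64_t wheel_idx(uint64_t n) { return n / 3 - 1; }

/* Counts the primes <= limit using one bit per number coprime to 6. */
uint64_t count_primes_wheel6(uint64_t limit) {
    if (limit < 2) return 0;
    if (limit < 3) return 1;   /* just 2 */
    if (limit < 5) return 2;   /* 2 and 3 */

    /* number of wheel candidates <= limit */
    uint64_t nbits = limit / 3;
    if (wheel_num(nbits - 1) > limit) nbits--;

    uint8_t *bits = calloc((nbits + 7) / 8, 1);
    assert(bits != NULL);

    for (uint64_t i = 0; i < nbits; i++) {
        uint64_t p = wheel_num(i);
        if (p * p > limit) break;
        if (bits[i >> 3] & (1u << (i & 7))) continue;  /* composite */
        /* The cofactor q walks the wheel upward from p, so multiples of
         * 2 and 3 are never touched. Even i means p = 6k-1 (next
         * candidate is p+2); odd i means p = 6k+1 (next is p+4). */
        uint64_t step = (i & 1) ? 4 : 2;
        for (uint64_t q = p; p * q <= limit; q += step, step = 6 - step) {
            uint64_t j = wheel_idx(p * q);
            bits[j >> 3] |= (uint8_t)(1u << (j & 7));
        }
    }

    uint64_t count = 2;  /* 2 and 3 are handled by hand */
    for (uint64_t i = 0; i < nbits; i++)
        if (!(bits[i >> 3] & (1u << (i & 7)))) count++;
    free(bits);
    return count;
}
```

Compared with the odd-only layout this keeps one bit per three integers instead of one per two. The division in `wheel_idx` is kept for clarity; a tuned version would precompute index increments instead.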
Euler's sieve is linear time if implemented carefully, but it needs an int array rather than a bit array.
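For reference, here is a minimal sketch of one common formulation, often called the linear sieve or Euler's sieve. It is an illustration under my own assumptions, not a tuned implementation: `linear_sieve` is a hypothetical name, the caller supplies the output buffer, and this version keeps the smallest prime factor of every number in a `uint32_t` array, which is where the extra memory goes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Fills primes[] (capacity cap) with the primes <= limit and returns how
 * many were found. spf[n] holds the smallest prime factor of n, or 0 if
 * n has not been crossed off yet. */
size_t linear_sieve(uint32_t limit, uint32_t *primes, size_t cap) {
    assert(limit < UINT32_MAX);  /* keeps the loop counter from wrapping */
    uint32_t *spf = calloc((size_t)limit + 1, sizeof *spf);
    if (!spf) return 0;
    size_t n = 0;
    for (uint32_t i = 2; i <= limit; i++) {
        if (spf[i] == 0) {       /* never crossed off, so i is prime */
            spf[i] = i;
            assert(n < cap);
            primes[n++] = i;
        }
        /* Cross off i*p for every prime p <= spf[i]. Each composite is
         * reached exactly once, through its smallest prime factor. */
        for (size_t k = 0; k < n; k++) {
            uint32_t p = primes[k];
            if (p > spf[i] || (uint64_t)i * p > limit) break;
            spf[i * p] = p;
        }
    }
    free(spf);
    return n;
}
```

With a buffer of 32 entries, `linear_sieve(100, buf, 32)` fills in the primes from 2 through 97.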
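The Sieve of Atkin takes linear time and is quite practical: https://en.wikipedia.org/wiki/Sieve_of_Atkin
Finally, my own (meaning I haven't seen it anywhere else, but I haven't really looked either) super-fun sieve also takes linear time and finds all primes <= 2^31 in 6.5 seconds. Thanks for giving me an excuse to post it: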
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <math.h>
#define SETBIT(mem,num) ( ((uint8_t *)mem)[(num)>>4] |= ((uint8_t)1)<<(((num)>>1)&7) )
int main(int argc, char *argv[])
{
//we'll find all primes <= this
uint32_t MAXTEST = (1UL<<31)-1;
//which will be less than this many
size_t MAXPRIMES = 110000000;
//We'll find this many primes with the sieve of Eratosthenes.
//After that, we'll switch to a linear time algorithm
size_t NUMPREMARK = 48;
//Allocate a bit array for all odd numbers <= MAXTEST
size_t NBYTES = (MAXTEST>>4)+1;
uint8_t *mem = malloc(NBYTES);
memset(mem, 0, NBYTES);
//We'll store the primes in here
unsigned *primes = (unsigned *)malloc(MAXPRIMES*sizeof(unsigned));
size_t nprimes = 0;
clock_t start_t = clock();
//first premark with sieve of Eratosthenes
primes[nprimes++]=2;
for (uint32_t test=3; nprimes<NUMPREMARK; test+=2)
{
if ( mem[test>>4] & ((uint8_t)1<<((test>>1)&7)) )
{
continue;
}
primes[nprimes++]=test;
uint32_t inc=test<<1;
for(uint32_t prod=test*test; prod<=MAXTEST; prod+=inc)
{
SETBIT(mem,prod);
}
}
//Iterate through all products of the remaining primes and mark them off,
//in linear time. Products are visited in lexical order of their
//prime factorizations, with factors in descending order.
//stacks containing the current prime indexes and partial products for
//prefixes of the current product
size_t stksize=0;
size_t indexes[64];
uint32_t products[64];
for (uint32_t test=primes[NUMPREMARK-1]+2; test<=MAXTEST; test+=2)
{
if ( mem[test>>4] & ((uint8_t)1<<((test>>1)&7)) )
{
continue;
}
//found a prime! iterate through all products that start with this one
//They can only contain the same or smaller primes
//current product
uint32_t curprod = (uint32_t)test;
indexes[0] = nprimes;
products[0] = curprod;
stksize = 1;
//remember the found prime (first time through, nprimes == NUMPREMARK)
primes[nprimes++] = curprod;
//when we extend the product, we add the first non-premarked prime
uint32_t extensionPrime = primes[NUMPREMARK];
//which we can only do if the current product is at most this big
uint32_t extensionMax = MAXTEST/primes[NUMPREMARK];
while (curprod <= extensionMax)
{
//extend the product with the smallest non-premarked prime
indexes[stksize]=NUMPREMARK;
products[stksize++]=(curprod*=extensionPrime);
SETBIT(mem, curprod);
}
for (;;)
{
//Can't extend current product.
//Pop the stacks until we get to a factor we can increase while keeping
//the factors in descending order and keeping the product small enough
if (--stksize <= 0)
{
//didn't find one
break;
}
if (indexes[stksize]>=indexes[stksize-1])
{
//can't increase this factor without breaking descending order
continue;
}
uint64_t testprod=products[stksize-1];
testprod*=primes[++(indexes[stksize])];
if (testprod>MAXTEST)
{
//can't increase this factor without exceeding our array size
continue;
}
//yay! - we can increment here to the next composite product
curprod=(uint32_t)testprod;
products[stksize++] = curprod;
SETBIT(mem, curprod);
while (curprod <= extensionMax)
{
//extend the product with the smallest non-premarked prime
indexes[stksize]=NUMPREMARK;
products[stksize++]=(curprod*=extensionPrime);
SETBIT(mem, curprod);
}
}
}
clock_t end_t = clock();
printf("Found %zu primes\n", nprimes);
free(mem);
free(primes);
printf("Time: %f\n", (double)(end_t - start_t) / CLOCKS_PER_SEC);
}
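Note that my sieve starts with a sieve of Eratosthenes that is a bit more optimized than yours. The main difference is that we only allocate bits for odd numbers in the bitmask array. The speed difference in that part is not significant.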
Answer 1 (score: 0)
It will always be slower because of the bit-manipulation overhead.
But you can try to optimize it.
setBit1(Array, i)
can be improved by using a constant array of all the pre-shifted bits (I call it ONE_BIT).
NEW:
#define setBit1(Array, i) (Array[i/INT_BITS] |= ONE_BIT[ i % INT_BITS])
The same goes for setBit0(Array, i).
NEW:
#define setBit0(Array, i) (Array[i/INT_BITS] &= ALL_BUT_ONE_BIT[ i % INT_BITS])
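INT_BITS is also most likely a power of 2, so you can replace
i % INT_BITS
with
i & (INT_BITS-1)
// for which you should store INT_BITS-1 in a constant and use that
Whether each change actually speeds up the code has to be checked by profiling.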
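Put together, the suggestions could look roughly like the sketch below. Several things here are assumptions on my part: the words are taken to be 32-bit unsigned values (so INT_BITS is 32), the tables are filled once at startup by a hypothetical `initBitTables` helper rather than being written out as constant initializers, and `INT_BITS_MASK` is just a name I picked for the stored INT_BITS-1.

```c
#include <assert.h>
#include <stdint.h>

#define INT_BITS 32u                     /* assumed word size in bits */
#define INT_BITS_MASK (INT_BITS - 1u)    /* works because 32 is a power of 2 */

/* Pre-shifted masks: ONE_BIT[b] has only bit b set,
 * ALL_BUT_ONE_BIT[b] has every bit except b set. */
static uint32_t ONE_BIT[INT_BITS];
static uint32_t ALL_BUT_ONE_BIT[INT_BITS];

/* Hypothetical helper: must run once before the macros are used. */
static void initBitTables(void) {
    for (unsigned b = 0; b < INT_BITS; b++) {
        ONE_BIT[b] = UINT32_C(1) << b;
        ALL_BUT_ONE_BIT[b] = ~ONE_BIT[b];
    }
}

/* i & INT_BITS_MASK replaces i % INT_BITS; a table lookup replaces the shift */
#define setBit1(Array, i) ((Array)[(i) / INT_BITS] |= ONE_BIT[(i) & INT_BITS_MASK])
#define setBit0(Array, i) ((Array)[(i) / INT_BITS] &= ALL_BUT_ONE_BIT[(i) & INT_BITS_MASK])
#define getBit(Array, i)  (((Array)[(i) / INT_BITS] & ONE_BIT[(i) & INT_BITS_MASK]) ? 1 : 0)
```

Modern compilers usually turn `i % 32` on an unsigned value into a mask by themselves, and a shift is often as cheap as a table load, so these variants may well make no measurable difference. That is exactly why each one should be profiled.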