Fast integer square root approximation

Question

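I am currently looking for a very fast integer square root approximation, where floor(sqrt(x)) <= veryFastIntegerSquareRoot(x) <= x

The square root routine is used for computing prime numbers, which gets considerably faster if only values less than or equal to sqrt(x) are checked as divisors of x.

What I currently have is this function from Wikipedia, adjusted slightly to work with 64-bit integers.

I have no other function to compare it against. More precisely, this function is more exact than I need, and computing the exact result probably takes more time than computing a value that is allowed to be higher than the actual result.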

Answer 1

Loop-free / jump-free (well, almost ;-) Newton-Raphson:

/* static will allow inlining */
static unsigned usqrt4(unsigned val) {
    unsigned a, b;

    if (val < 2) return val; /* avoid div/0 */

    a = 1255;       /* starting point is relatively unimportant */

    b = val / a; a = (a+b) /2;
    b = val / a; a = (a+b) /2;
    b = val / a; a = (a+b) /2;
    b = val / a; a = (a+b) /2;

    return a;
}

For 64-bit integers you will need more steps (my guess: 6).
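A minimal sketch of what such a 64-bit variant might look like, assuming the same fixed starting point and the guessed step count of 6. The name usqrt6_u64 is hypothetical and the step count has not been tuned; with a fixed start the result stays at or above floor(sqrt(val)) but can still overshoot noticeably for large inputs.

```c
#include <stdint.h>

/* Hypothetical 64-bit variant of usqrt4: six unrolled Newton-Raphson steps.
 * The step count follows the guess in the answer and is an assumption. */
static uint64_t usqrt6_u64(uint64_t val) {
    uint64_t a, b;

    if (val < 2) return val;   /* 0 and 1 are their own square roots */

    a = 1255;                  /* same fixed starting point as usqrt4 */

    /* Each step averages the current guess with val / guess. */
    b = val / a; a = (a + b) / 2;
    b = val / a; a = (a + b) / 2;
    b = val / a; a = (a + b) / 2;
    b = val / a; a = (a + b) / 2;
    b = val / a; a = (a + b) / 2;
    b = val / a; a = (a + b) / 2;

    return a;
}
```

For inputs whose root is close to the starting point the iteration converges quickly; for example, 1000000 settles on 1000 after two steps and stays there.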

Answer 2

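Computing `floor(sqrt(x))` exactly

This is my solution, based on the bit-guessing approach proposed on Wikipedia. Unfortunately, the pseudocode provided on Wikipedia contains some errors, so I had to make a few adjustments: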

unsigned char bit_width(unsigned long long x) {
    return x == 0 ? 1 : 64 - __builtin_clzll(x);
}

// implementation for all unsigned integer types
unsigned sqrt(const unsigned n) {
    unsigned char shift = bit_width(n);
    shift += shift & 1; // round up to next multiple of 2

    unsigned result = 0;

    do {
        shift -= 2;
        result <<= 1; // leftshift the result to make the next guess
        result |= 1;  // guess that the next bit is 1
        result ^= result * result > (n >> shift); // revert if guess too high
    } while (shift != 0);

    return result;
}

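bit_width can be evaluated in constant time, and the loop iterates at most ceil(bit_width / 2) times. So even for a 64-bit integer, this is at worst 32 iterations of basic arithmetic and bitwise operations.

Unlike all the other answers proposed so far, this actually gives you the best possible approximation, namely floor(sqrt(x)). For any x², it returns exactly x.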
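Since the question asks about 64-bit integers, here is a sketch of how the same bit-guessing loop might be written for uint64_t. The names bit_width_u64 and isqrt_u64 are hypothetical, and __builtin_clzll assumes GCC or Clang. The intermediate result never exceeds 2^32 - 1, so result * result fits in 64 bits.

```c
#include <stdint.h>

/* Number of significant bits in x; returns 1 for x == 0, as in the answer. */
static unsigned bit_width_u64(uint64_t x) {
    return x == 0 ? 1u : 64u - (unsigned)__builtin_clzll(x);
}

/* Hypothetical 64-bit variant of the bit-guessing loop: floor(sqrt(n)). */
static uint64_t isqrt_u64(uint64_t n) {
    unsigned shift = bit_width_u64(n);
    shift += shift & 1u;                      /* round up to a multiple of 2 */

    uint64_t result = 0;
    do {
        shift -= 2;
        result = (result << 1) | 1u;          /* guess that the next bit is 1 */
        if (result * result > (n >> shift))   /* guess too high? */
            result ^= 1u;                     /* clear the bit again */
    } while (shift != 0);

    return result;
}
```

The explicit if replaces the XOR-with-comparison trick from the answer; the behavior is intended to be the same, only easier to read.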

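Making a guess using log₂(x)

If this is still too slow for you, you can make a guess based solely on the binary logarithm. The basic idea is that we can compute the sqrt of any number 2^x as 2^(x/2). x/2 may have a remainder, so we can't always compute this exactly, but we can compute an upper and a lower bound.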
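Written out as inequalities, the idea might be stated as follows for any integer n >= 1. The symbols a and b are introduced here only for illustration.

```latex
\begin{aligned}
a &= \lfloor \log_2 n \rfloor, \qquad b = \lceil \log_2 n \rceil
  &&\Longrightarrow\quad 2^{a} \le n \le 2^{b} \\
2^{\lfloor a/2 \rfloor} &\le 2^{a/2} \le \sqrt{n} \le 2^{b/2} \le 2^{\lceil b/2 \rceil}
\end{aligned}
```

The outer terms are powers of two with integer exponents, so both bounds can be produced with a single shift.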

For example:

We are given 25
Compute floor(log_2(25)) = 4
Compute ceil(log_2(25)) = 5
Lower bound: pow(2, floor(4 / 2)) = 4
Upper bound: pow(2, ceil(5 / 2)) = 8

And indeed, the actual sqrt(25) = 5. We found sqrt(16) >= 4 and sqrt(32) <= 8. This means:

4 <= sqrt(16) <= sqrt(25) <= sqrt(32) <= 8
            4 <= sqrt(25) <= 8

This is how we can implement these guesses, which we will call sqrt_lo and sqrt_hi.

// this function computes a lower bound
unsigned sqrt_lo(const unsigned n) noexcept
{
    unsigned log2floor = bit_width(n) - 1;
    return (unsigned) (n != 0) << (log2floor >> 1);
}

// this function computes an upper bound
unsigned sqrt_hi(const unsigned n)
{
    bool isnt_pow2 = ((n - 1) & n) != 0; // true if n is not a power of 2
    unsigned log2ceil = bit_width(n) - 1 + isnt_pow2;
    log2ceil += log2ceil & 1; // round up to multiple of 2
    return (unsigned) (n != 0) << (log2ceil >> 1);
}

For these two functions, the following statement is always true:

sqrt_lo(x) <= floor(sqrt(x)) <= sqrt(x) <= sqrt_hi(x) <= x

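Note that if we assume the input is never zero, (unsigned) (n != 0) can be simplified to 1 and the statement still holds.

These functions can be evaluated in O(1) when __builtin_clzll maps to a hardware instruction. They give exact results only for numbers of the form 2^(2x), such as 256, 64, 16, etc.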
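To connect this back to the question's use case, here is a sketch of trial division that uses such an upper bound as its loop limit. The names bit_width_u32, sqrt_hi_u32 and is_prime_bounded are hypothetical, and __builtin_clzll assumes GCC or Clang. The bound is always below twice the true root, so the loop does less than twice the strictly necessary work.

```c
#include <stdint.h>

/* Number of significant bits in x; returns 1 for x == 0. */
static unsigned bit_width_u32(uint32_t x) {
    return x == 0 ? 1u : 64u - (unsigned)__builtin_clzll(x);
}

/* Power-of-two upper bound on sqrt(n), mirroring sqrt_hi from the answer. */
static uint32_t sqrt_hi_u32(uint32_t n) {
    if (n == 0) return 0;
    unsigned log2ceil = bit_width_u32(n) - 1u + ((n & (n - 1u)) != 0);
    log2ceil += log2ceil & 1u;          /* round up to a multiple of 2 */
    return (uint32_t)1 << (log2ceil / 2);
}

/* Hypothetical trial-division test that stops at the cheap upper bound. */
static int is_prime_bounded(uint32_t n) {
    if (n < 2) return 0;
    if (n < 4) return 1;                /* 2 and 3 are prime */
    if ((n & 1u) == 0) return 0;        /* other even numbers are not */

    uint32_t limit = sqrt_hi_u32(n);    /* sqrt(n) <= limit < n for n >= 5 */
    for (uint32_t p = 3; p <= limit; p += 2) {
        if (n % p == 0) return 0;
    }
    return 1;
}
```

Because the limit is computed once with a shift, the loop body contains no multiplication or square root, only the modulo that trial division needs anyway.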

Answer 3

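This version can be faster, since DIV is slow, and for small numbers (Val < 20k) it reduces the error to less than 5%. Tested on an ARM M0 (no hardware acceleration for DIV).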

input

Answer 4

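On modern PC hardware, computing the square root of n using floating-point arithmetic may well be faster than any kind of fast integer math.

Note, however, that it may not be needed at all: you can instead square the candidates and stop when the square exceeds the value of n. The dominant operation is the division anyway: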

#define PBITS32  ((1<<2) | (1<<3) | (1<<5) | (1<<7) | (1<<11) | (1<<13) | \
                  (1UL<<17) | (1UL<<19) | (1UL<<23) | (1UL<<29) | (1UL<<31))

int isprime(unsigned int n) {
    if (n < 32)
        return (PBITS32 >> n) & 1;
    if ((n & 1) == 0)
        return 0;
    /* widen before squaring so p * p cannot wrap around for n near UINT_MAX */
    for (unsigned int p = 3; (unsigned long long)p * p <= n; p += 2) {
        if (n % p == 0)
            return 0;
    }
    return 1;
}

