Question

This is a function that I call many times per second:

static inline double calculate_scale(double n) { //n may be int or double
    return sqrt(n) - sqrt(n-1);
}

It is called in a loop, like this:

for(double i = 0; i < x; i++) {
    double scale = calculate_scale(i);
    ...
}

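It is too slow. What is the best way to optimize this function while keeping the output as accurate as possible?

Parameter n: starts at 1 and is practically unbounded, but it is mostly used with small numbers in the range 1-10. It is a whole number (an integer), but it may be passed as either int or double, depending on which performs better.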

Answer 1

You can try rewriting it as follows, which leads to a cheaper approximation:

sqrt(n) - sqrt(n-1) == 
(sqrt(n) - sqrt(n-1)) * (sqrt(n) + sqrt(n-1)) / (sqrt(n) + sqrt(n-1)) ==
(n - (n - 1)) / (sqrt(n) + sqrt(n-1)) ==
1 / (sqrt(n) + sqrt(n-1))

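For sufficiently large n, the last expression is very close to 1 / (2 * sqrt(n)), so you only need to call sqrt once. It is also worth noting that, even without the approximation, the last expression is numerically more stable in terms of relative error for larger n.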

Answer 2

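First of all, thanks for all the suggestions. I did some research and found some interesting implementations and facts.

1. In a loop or using a precomputed table

(Thanks to @Ulysse BN) You can optimize the loop simply by saving the previous sqrt(n) value. The following example demonstrates this optimization, used here to fill a precomputed table.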

    /**
     * Init variables
     *      i       counter
     *      x       number of cycles (size of table)
     *      sqrtI1  previous square root = sqrt(i-1)
     *      ptr     Pointer for next value
     */
    double i, x = sizeof(precomputed_table) / sizeof(double);
    double sqrtI1 = 0;

    double* ptr = (double*) precomputed_table;

    /**
     * Optimized calculation
     * In short:
     *      scale = sqrt(i) - sqrt(i-1)
     */
    for(i = 1; i <= x; i++) {
        double sqrtI = sqrt(i);
        double scale = sqrtI - sqrtI1; 
        *ptr++ = scale;
        sqrtI1 = sqrtI;
    }

Using a precomputed table is probably the fastest method, but its drawback may be that its size is limited.

static inline double calculate_scale(int n) {
    return precomputed_table[n-1];
}

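2. Using the reciprocal square root

Approximation for BIG numbers

Requires a reciprocal (inverse) square root function, rsqrt

This method gives its most accurate results for large numbers. For small numbers the error is noticeable: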

n:     1     2      3       10        100      1000
error: 0.29  0.006  0.0016  0.000056  1.58e-7  4.95e-10

Here is the JS code I used to calculate the results above:

function sqrt(x) { return Math.sqrt(x); }
function d(x) { return (sqrt(x) - sqrt(x-1)) - (0.5 / sqrt(x-0.5)); }
console.log(d(1), d(2), d(3), d(10), d(100), d(1000));

You can also see the accuracy compared to the two-sqrt version in a single graph: https://www.google.com/search?q=(sqrt(x)-sqrt(x-1))-(0.5%2Fsqrt(x-0.5))

Usage:

static inline double calculate_scale(double n) {
    //Same as: 0.5 / sqrt(n-0.5)
    //but lot faster
    return 0.5 * rsqrt(n-0.5);
}

On some older CPUs (with a slow hardware square root, or none at all), it may be even faster to use float and the fast inverse square root from Quake:

static inline float calculate_scale(float n) {
    return 0.5 * Q_rsqrt(n-0.5);
}

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck? 
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}

For more information about the implementation, see https://en.wikipedia.org/wiki/Fast_inverse_square_root and http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf. This is not recommended on modern CPUs with a hardware reciprocal square root.
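One caveat about the listing above: it reads the float through a long pointer, and long is 64 bits wide on many 64-bit platforms, so the cast can read the wrong number of bytes (it also breaks strict aliasing rules). A minimal sketch of the same algorithm using memcpy and a fixed-width integer, assuming float is a 32-bit IEEE 754 type; the name `q_rsqrt_portable` is made up here:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same steps as Q_rsqrt, but the bit reinterpretation goes through
 * memcpy and uint32_t. Assumes float is 32-bit IEEE 754. */
static inline float q_rsqrt_portable(float number) {
    assert(number > 0.0f);
    const float x2 = number * 0.5f;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);        /* copy the float's bits into an integer */
    i = 0x5f3759dfu - (i >> 1);      /* magic-constant initial guess */
    memcpy(&y, &i, sizeof y);        /* copy the bits back into a float */
    y = y * (1.5f - (x2 * y * y));   /* one Newton-Raphson iteration */
    return y;
}
```

With a single iteration the result is only accurate to a fraction of a percent, so q_rsqrt_portable(4.0f) comes out close to, but not exactly, 0.5.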

Not always a solution: 0.5 / sqrt(n-0.5)

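Please note that on some processors (e.g. ARM Cortex A9, Intel Core2) a division takes nearly as long as a hardware square root, so it is best to use either the original function with two square roots, sqrt(n) - sqrt(n-1), or the reciprocal square root with a multiply instead, 0.5 * rsqrt(n-0.5), if it exists.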

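3. Using a precomputed table with a fallback

This approach is a good compromise between the first two solutions. It has good accuracy and performance.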

static inline double calculate_scale(double n) {
    if(n <= sizeof_precomputed_table) {
        int nIndex = (int) n;
        return precomputed_table[nIndex-1];
    }
    //Multiply + Inverse Square root
    return 0.5 * rsqrt(n-0.5);
    //OR
    //return sqrt(n) - sqrt(n-1);
}

In my case I need really accurate numbers, so my precomputed table size is 2048.

Any feedback is welcome.

Answer 3

You stated that n is primarily a number smaller than 10. You could use a precomputed table for numbers smaller than 10, or even more since it is cheap, and fall back to the real calculation for larger numbers.

The code would look something like:
