Question

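Which is the more accurate way to compute the average of a set of numbers: ARR[0]/N+ARR[1]/N...+ARR[N-1]/N or (ARR[0]+ARR[1]...+ARR[N-1])/N? (ARR is the set of numbers and N is the count of numbers in that set.)

Assume I have a set of numbers, each in the range 0.0 to 1.0 (they are double/floating-point numbers), and there are thousands or even millions of them.

I am open to other methods, such as a recursive average (average pairs of elements into a new array, then average again until only a single-element array is left).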
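
The recursive idea described above is commonly known as pairwise summation. A minimal sketch follows, using the hypothetical names pairwise_sum and pairwise_mean. It sums the two halves of the range recursively and divides once at the end, rather than averaging pairs directly, since averaging pairs would weight elements unevenly when the count is not a power of two. The base-case size of 8 is an arbitrary assumption.

```cpp
#include <cstddef>

// Sum arr[0..n) by recursively splitting the range in half.
// The worst-case rounding error bound grows with log2(n) rather than n.
double pairwise_sum(const double* arr, std::size_t n)
{
    if (n == 0) return 0.0;
    if (n <= 8) {                       // small base case: plain loop
        double s = 0.0;
        for (std::size_t i = 0; i < n; i++) s += arr[i];
        return s;
    }
    std::size_t half = n / 2;
    return pairwise_sum(arr, half) + pairwise_sum(arr + half, n - half);
}

// Hypothetical helper: mean via the pairwise sum and a single division.
// Returns 0.0 for an empty range (an assumption, not part of the question).
double pairwise_mean(const double* arr, std::size_t n)
{
    return n ? pairwise_sum(arr, n) / static_cast<double>(n) : 0.0;
}
```

Called as pairwise_mean(data, count), this needs no extra array and leaves the input unchanged.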

Answer 1

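If the values near zero are very close to zero, summation will suffer from rounding (the error may go up or down); the same happens for any range of values when summing a large set of numbers. One way around this is a summation function that only adds numbers with the same exponent (until you call getsum() to get the total, it keeps the exponents as close as possible). Here is an example C++ class that does this (note: the code was compiled with Visual Studio and written before uint64_t was available).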

//  SUM contains an array of 2048 IEEE 754 doubles, indexed by exponent,
//  used to minimize rounding / truncation issues when doing
//  a large number of summations

class SUM{
    double asum[2048];
public:
    SUM(){for(int i = 0; i < 2048; i++)asum[i] = 0.;}
    void clear(){for(int i = 0; i < 2048; i++)asum[i] = 0.;}
//  getsum returns the current sum of the array
    double getsum(){double d = 0.; for(int i = 0; i < 2048; i++)d += asum[i];
                    return(d);}
    void addnum(double);
};

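#include <cstring>              // std::memcpy (belongs at the top of the file)

void SUM::addnum(double d)      // add a number into the array
{
size_t i;
unsigned long long bits;

    while(1){
//      i = exponent of d (copy the bits out instead of casting the pointer,
//      which would violate the aliasing rules)
        std::memcpy(&bits, &d, sizeof bits);
        i = ((size_t)(bits>>52))&0x7ff;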
        if(i == 0x7ff){         // max exponent, could be overflow
            asum[i] += d;
            return;
        }
        if(asum[i] == 0.){      // if empty slot store d
            asum[i] = d;
            return;
        }
        d += asum[i];           // else add slot to d, clear slot
        asum[i] = 0.;           // and continue until empty slot
    }
}

Example program using the SUM class:

#include <iostream>
#include <iomanip>
using namespace std;

static SUM sum;

int main()
{
double dsum = 0.;
double d = 1./5.;
unsigned long i;

    for(i = 0; i < 0xffffffffUL; i++){
        sum.addnum(d);
        dsum += d;
    }
    cout << "dsum             = " << setprecision(16) << dsum << endl;
    cout << "sum.getsum()     = " << setprecision(16) << sum.getsum() << endl;
    cout << "0xffffffff * 1/5 = " << setprecision(16) << d * (double)0xffffffffUL << endl;

    return(0);
}

Answer 2

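(ARR[0]+ARR[1]...+ARR[N-1])/N is faster and more accurate, because you omit the unnecessary divisions by N, which both slow the process down and add error to the calculation.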
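
To make the comparison concrete, here is a minimal sketch of the two forms from the question. The function names mean_divide_each and mean_divide_once are hypothetical, and both assume n is nonzero.

```cpp
#include <cstddef>

// Form 1: divide every element by n, then add.
// Each of the n divisions can introduce its own rounding error.
double mean_divide_each(const double* arr, std::size_t n)
{
    double r = 0.0;
    for (std::size_t i = 0; i < n; i++)
        r += arr[i] / static_cast<double>(n);
    return r;
}

// Form 2: add everything, then divide once.
// Only the additions and one final division round.
double mean_divide_once(const double* arr, std::size_t n)
{
    double r = 0.0;
    for (std::size_t i = 0; i < n; i++)
        r += arr[i];
    return r / static_cast<double>(n);
}
```

Both forms perform the same n additions, so the second form differs only in replacing n divisions with one.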

Answer 3

If you have a bunch of floating-point numbers, the most accurate way to get the mean is like this:

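#include <algorithm>   // std::sort
#include <cmath>       // std::abs
#include <cstddef>     // size_t

template<class T> T mean(T* arr, size_t N) {
    // sort in place by increasing magnitude (this reorders the caller's array)
    std::sort(arr, arr+N, [](T a, T b){return std::abs(a) < std::abs(b);});
    T r = 0;
    for(size_t n = 0; n < N; n++)
        r += arr[n];    // smallest magnitudes are added first
    return r / N;       // single division at the end
}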

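Important points:

Add the numbers of smallest magnitude first, to preserve significant digits.
Use only one division, to reduce rounding error there.

However, the intermediate sum might become too large.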
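
If the size of the intermediate sum is a concern, one alternative (a different technique from the code above, not part of this answer's method) is an incremental mean, which never holds more than the current average. A minimal sketch, with the hypothetical name running_mean:

```cpp
#include <cstddef>

// Incremental mean: after processing k elements, m holds the mean of arr[0..k).
// The accumulator stays within the range of the data instead of growing with n.
double running_mean(const double* arr, std::size_t n)
{
    double m = 0.0;
    for (std::size_t i = 0; i < n; i++)
        m += (arr[i] - m) / static_cast<double>(i + 1);
    return m;
}
```

The trade-off is one division per element, so this gives up the single-division advantage listed above. For values between 0.0 and 1.0 a plain sum of millions of elements is nowhere near overflowing a double, so this matters only for much larger magnitudes.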
