Question

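I am writing a function to find the average of an array in which most of the numbers would overflow if they were all added at once.

It works by creating a subarray (b in my code) that is half the size of the input array (a in my code; its size is ar_size), and then storing the average of two values from the input array, a[i+0] and a[i+1], with no overlap, into b[j].

Once it has iterated over the entire input array, it reruns the function, passing in the subarray and its size, until the size equals 2, and then ends the recursion by returning the average of the two remaining values.

Please pardon the reuse of j.

Also, the size of the array is a power of two.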

uint64_t* array_average(uint64_t* a, const int ar_size)
{
    uint64_t* b = new uint64_t[ar_size / 2];

    uint64_t* j = new uint64_t;

    if (ar_size == 2)
    {
     *j = (a[0] / 2) + (a[1] / 2) + ((a[0] % 2 + a[1] % 2) / 2);

     return j;
    }

    for (int i = 0; i < ar_size; i += 2)
    {
        b[*j] = (a[i + 0] / 2) + (a[i + 1] / 2) + ((a[i + 0] % 2 + a[i + 1] % 2) / 2);

        ++*j;
    }
    delete j;
    return array_average(b, ar_size / 2);
}

Also, does anyone have a better way to average when working with numbers that would cause an overflow?

Here is the revised version:

uint64_t* tools::array_average(uint64_t* a, const int ar_size)
{
    uint64_t* b = new uint64_t[ar_size];
    uint64_t* c = new uint64_t[ar_size / 2];

    int j;
    j = 0;

    for (int i = 0; i < ar_size; ++i)
    {
        b[i] = a[i];
    }

    if (runs > 0) //This is so i do not delete the original input array I.E not done with it
    {
        delete[] a;
    }

    if (ar_size == 2)
    {
        uint64_t* y = new uint64_t;

        runs = 0;

        *y = (b[0] / 2) + (b[1] / 2) + ((b[0] % 2 + b[1] % 2) / 2); 

        delete[] b;

        return y;
    }

    for (int i = 0; i < ar_size; i += 2)
    {
        c[j] = (b[i + 0] / 2) + (b[i + 1] / 2) + ((b[i + 0] % 2 + b[i + 1] % 2) / 2);

        ++j;
    }

    delete[] b;

    ++runs;

    return array_average(c, ar_size / 2);
}

Answer 1

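First of all, be aware that your average is not the actual average, because you throw away halves. On an array alternating between 0 and 1, the result of your algorithm would be 0, since 0/2 + 1/2 + (0%2 + 1%2)/2 = 0. I wanted to start with that because it is a serious weakness of your algorithm.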
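To make the lost halves concrete, here is a minimal sketch that applies the question's pairwise formula round by round. The names pair_average and halving_average are hypothetical and chosen only for this illustration, and the sketch assumes the size is a power of two.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The pairwise formula from the question (the function name is hypothetical).
std::uint64_t pair_average(std::uint64_t a, std::uint64_t b) {
    return a / 2 + b / 2 + (a % 2 + b % 2) / 2;
}

// Repeatedly replaces neighbouring pairs by their (truncated) average.
// Assumes data.size() is a power of two and at least 1.
std::uint64_t halving_average(std::vector<std::uint64_t> data) {
    while (data.size() > 1) {
        for (std::size_t i = 0; i < data.size() / 2; ++i) {
            data[i] = pair_average(data[2 * i], data[2 * i + 1]);
        }
        data.resize(data.size() / 2);
    }
    return data[0];
}
```

For the input {0, 1, 1, 2} the first round yields {0, 1} and the second yields 0, although the exact mean is 1. The truncation in each round can therefore cost more than a fraction.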

Also note that if the original size is not a power of 2, some data will get a higher weight.
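The weighting effect can be seen in a small sketch. It assumes a variant in which an unpaired last element is carried unchanged into the next round; the function name halving_average_any_size is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pairwise halving for any non-empty size. An unpaired last element is
// carried over unchanged to the next round (an assumption of this sketch).
std::uint64_t halving_average_any_size(std::vector<std::uint64_t> data) {
    while (data.size() > 1) {
        const std::size_t pairs = data.size() / 2;
        const bool has_leftover = data.size() % 2 != 0;
        for (std::size_t i = 0; i < pairs; ++i) {
            const std::uint64_t a = data[2 * i];
            const std::uint64_t b = data[2 * i + 1];
            data[i] = a / 2 + b / 2 + (a % 2 + b % 2) / 2;
        }
        if (has_leftover) {
            data[pairs] = data.back();  // move the unpaired element forward
        }
        data.resize(pairs + (has_leftover ? 1 : 0));
    }
    return data[0];
}
```

With the input {0, 0, 12}, the first round produces {0, 12} and the second produces 6, while the true mean is 4. The last element ends up with weight 1/2 instead of 1/3.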

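Apart from that, consider this algorithm: copy the data. Until only one entry is left, put the average of cells 0 and 1 in cell 0, that of cells 2 and 3 in cell 1, that of cells 4 and 5 in cell 2, and so on. Shrink the data after each such step.

As code: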

uint64_t average(std::vector<uint64_t> data)
{
    while(data.size() != 1)
    {
        for(size_t i=0; i<data.size()/2; i++)
        {
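            data[i] = data[2*i]/2 + data[2*i+1]/2 + (data[2*i]%2 + data[2*i+1]%2)/2;
        }
        if(data.size()%2 == 1) data[data.size()/2] = data.back(); //carry the unpaired last element forward
        data.resize(data.size()/2 + data.size()%2); //last part is required if the size is not an even number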
    }
    return data[0];
}

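As an aside, using a proper container here also gets rid of your memory leaks.

Note that this code still has the weakness I talked about. You can extend it by collecting the halves: if the modulo part is 1, increment a variable, and when the variable reaches 2, add one to some cell.

Edit: If the input HAS to be a raw array (for example because you receive it from some external source), use this: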

uint64_t average(uint64_t* array, const int array_size)
{
    std::vector<uint64_t> data(array, array + array_size);

    return average(data); // the rest is identical, so reuse the std::vector version
}

Edit: code for the halves-collecting approach described above:

inline uint64_t average(const uint64_t& a, const uint64_t& b, uint8_t& left_halves)
{
    uint64_t value = a/2 + b/2 + (a%2 + b%2)/2;
    if((a%2 + b%2)%2 == 1)
    {
        left_halves += 1;
    }
    if(left_halves == 2)
    {
        value += 1;
        left_halves = 0;
    }
    return value;
}

uint64_t average(std::vector<uint64_t> data)
{
    if(data.size() == 0) return 0;

    uint8_t left_halves = 0;
    while(data.size() != 1)
    {
        for(size_t i=0; i<data.size()/2; i++)
        {
            data[i] = average(data[2*i], data[2*i+1], left_halves);
        }
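        if(data.size()%2 == 1) data[data.size()/2] = data.back(); //carry the unpaired last element forward
        data.resize(data.size()/2 + data.size()%2); //last part is required if the size is not an even number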
    }
    return data[0];
}

This still has the drawback of increased weight for some cells if the size is not a power of 2.

Answer 2

You could use:

constexpr bool is_power_of_2(uint64_t n)
{
    return n && !(n & (n - 1));
}

uint64_t array_average(std::vector<uint64_t> v)
{
    if (!is_power_of_2(v.size())) {
        throw std::runtime_error("invalid size");
    }
    uint64_t remainder = 0;
    while (v.size() != 1) {
        for (std::size_t i = 0; i != v.size(); i += 2) {
            remainder += (v[i] % 2 + v[i + 1] % 2);
            v[i / 2] = v[i] / 2 + v[i + 1] / 2;
            // add the collected carry only if v[i / 2] + remainder / 2 cannot overflow
            if (remainder >= 2 && v[i / 2] < -(remainder / 2)) {
                v[i / 2] += remainder / 2;
                remainder %= 2;
            }
        }
        v.resize(v.size() / 2);
    }
    return v[0] + remainder / 2;
}

Answer 3

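There really shouldn't be much to convert, since the STL already has containers, functions, and algorithms that will do this for you. Without writing any function, take a look at this short program: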

#include <vector>
#include <numeric>
#include <iostream>
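#include <exception>
#include <stdexcept> // std::runtime_error
#include <cstdint>   // uint64_t
#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE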

int main() {
    try {

        std::vector<uint64_t> values{ 1,2,3,4,5,6,7,8,9,10,11,12 };
        int total = std::accumulate( values.begin(), values.end(), 0 );
        uint64_t average = static_cast<uint64_t>( total ) / values.size();
        std::cout << average << '\n';

    } catch( const std::runtime_error& e ) {
        std::cerr << e.what() << '\n';
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

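On my machine (Windows 7 Ultimate 64-bit, Visual Studio 2017 CE, language version set to the latest, C++17 or later), this does give me a compiler warning: C4244, due to a conversion with possible loss of data. There are no compiler errors, however, and it runs and gives the expected result. The output here is 6, because integer division truncates. If I change the lines above to this: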

double total = std::accumulate( values.begin(), values.end(),
                            static_cast<double>( 0 ) );
double average = total / values.size();

It fixes the compiler warning above by adding the static_cast, and sure enough it prints 6.5, which is the actual value.

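This is fine as long as the vector is already initialized with values; that may not always be the case, however, so let's move this into a function that takes an arbitrary array. It would look like this: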

uint64_t array_average( std::vector<uint64_t>& values ) {
    // Prevent Division by 0 and early return 
    // as to not call `std::accumulate`
    if ( !values.empty() ) {
        // check if only 1 entry if so just return it
        if ( values.size() == 1 ) {
            return values[0];
        } else { // otherwise do the calculation.
            return std::accumulate( values.begin(), values.end(),
                                    static_cast<uint64_t>( 0 ) ) / values.size();
        } 
    } 
    // Empty Container 
    throw std::runtime_error( "Can not take average of an empty container" );
}

This function is nice and all, but we can improve it by making it more generic so that it works with any arithmetic type!

template<typename T>
T array_average( std::vector<T>& values ) {
    if( std::is_arithmetic<T>::value ) {
        if( !values.empty() ) {
            if( values.size() == 1 ) {
                return values[0];
            } else { 
                return std::accumulate( values.begin(), values.end(), static_cast<T>( 0 ) ) / values.size();
            }
        } else {
            throw std::runtime_error( "Can not take average of an empty container" ); 
        }
    } else {
        throw std::runtime_error( "T is not of an arithmetic type" );
    }
}

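At first glance this looks okay. It compiles and runs if you use it with arithmetic types. However, if we use it with a non-arithmetic type, it fails to compile. For example: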

#include <vector>
#include <numeric>
#include <iostream>
#include <exception>
#include <stdexcept>
#include <type_traits>
#include <string>
#include <cstdint>
#include <cstdlib>

class Fruit {
protected:
     std::string name_;
public:
    std::string operator()() const {
        return name_;
    }
    std::string name() const { return name_; }

    Fruit operator+( const Fruit& other ) {
        this->name_ += " " + other.name();
        return *this;
    }
};

class Apple : public Fruit {
public:
    Apple() { this->name_ = "Apple"; }

};

class Banana : public Fruit {
public:
    Banana() { this->name_ = "Banana"; }
};

class Pear : public Fruit {
public:
    Pear() { this->name_ = "Pear"; }
};

std::ostream& operator<<( std::ostream& os, const Fruit& fruit ) {
    os << fruit.name() << " ";
    return os;
}

template<typename T>
T array_average( std::vector<T>& values ); // Using the definition above

int main() {
    try {
        std::vector<uint64_t> values { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
        std::vector<double> values2 { 2.0, 3.5, 4.5, 6.7, 8.9 };
        std::vector<Fruit> fruits { Apple(), Banana(), Pear() };

        std::cout << array_average( values ) << '\n';  // compiles runs and prints 6
        std::cout << array_average( values2 ) << '\n'; // compiles runs and prints 5.12
        std::cout << array_average( fruits ) << '\n'; // fails to compile.

    } catch( const std::runtime_error& e ) {
        std::cerr << e.what() << '\n';
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

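This fails to compile because static_cast cannot convert int to T with T = Fruit (MSVC compiler error C2440).

If your compiler supports it, we can fix this by changing a single line of code in the function template. We can change

if( std::is_arithmetic<T>::value )

to

if constexpr( std::is_arithmetic<T>::value )

and our function will now look like this:

template<typename T>
T array_average( const std::vector<T>& values ) {
    if constexpr( std::is_arithmetic<T>::value ) {
        if( !values.empty() ) {
            if( values.size() == 1 ) {
                return values[0];
            } else {
                return std::accumulate( values.begin(), values.end(), static_cast<T>( 0 ) ) / values.size();
            }
        } else {
            throw std::runtime_error( "Can not take average of an empty container" );
        }
    } else {
        throw std::runtime_error( "T is not of an arithmetic type" );
    }
}

You can run the same program as above, and it will now fully compile even when you use non-arithmetic types.

int main() {
    //....
    std::cout << array_average( fruits ) << '\n'; // Now compiles
    //...
}

However, when you run this code it will generate a runtime error, and depending on how your IDE and debugger are set up, you may need to put a breakpoint in the catch statement, where the return EXIT_FAILURE is, in order to see the message printed to the screen. Otherwise the application may just exit without any notification at all.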

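If you don't want runtime errors, you can produce compile-time errors instead by using static_assert rather than throwing a runtime error. This can be a handy little function, but it is not 100% free of minor limitations and gotchas. To find out more, you can check the question I asked while writing this function's implementation; the comments there give more insight into some of its limitations.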
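A minimal sketch of the static_assert variant follows, assuming the same array_average signature as above. Dividing by static_cast<T>( values.size() ) is a choice made for this sketch so that the division happens in T; it is not taken from the code above.

```cpp
#include <numeric>
#include <stdexcept>
#include <type_traits>
#include <vector>

template<typename T>
T array_average( const std::vector<T>& values ) {
    // Rejects non-arithmetic T at compile time instead of throwing at run time.
    static_assert( std::is_arithmetic<T>::value, "T is not of an arithmetic type" );

    // An empty container can only be detected at run time, so this still throws.
    if( values.empty() ) {
        throw std::runtime_error( "Can not take average of an empty container" );
    }
    // Sketch-specific choice: convert the element count to T before dividing.
    return std::accumulate( values.begin(), values.end(), static_cast<T>( 0 ) )
           / static_cast<T>( values.size() );
}
```

With this version, a call such as array_average( fruits ) is rejected by the compiler with the message given to static_assert, so no breakpoint is needed to notice the problem.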

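One of the current limitations of this function is the following: say we have a container holding a bunch of complex numbers, (3i + 2), (4i - 6), (7i + 3). Taking their average is a perfectly valid thing to do, but the function above will not consider them arithmetic in its current state.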

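To resolve this, instead of using std::is_arithmetic<T> you could write your own policy and traits describing what this function should accept. I'll leave that part as an exercise for you.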
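One possible shape for such a trait is sketched below. Everything here is hypothetical and only one of many designs: the trait average_traits, its members is_averageable and scalar_type, and the function generic_average are names invented for this illustration.

```cpp
#include <complex>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical trait: says whether T can be averaged and which scalar type
// the element count should be converted to before dividing.
template<typename T, typename Enable = void>
struct average_traits {
    static constexpr bool is_averageable = false;
};

// All built-in arithmetic types divide by a count of their own type.
template<typename T>
struct average_traits<T, typename std::enable_if<std::is_arithmetic<T>::value>::type> {
    static constexpr bool is_averageable = true;
    using scalar_type = T;
};

// std::complex<U> divides by a count converted to its component type U.
template<typename U>
struct average_traits<std::complex<U>> {
    static constexpr bool is_averageable = true;
    using scalar_type = U;
};

template<typename T>
T generic_average( const std::vector<T>& values ) {
    static_assert( average_traits<T>::is_averageable, "T cannot be averaged" );
    if( values.empty() ) {
        throw std::runtime_error( "Can not take average of an empty container" );
    }
    using scalar = typename average_traits<T>::scalar_type;
    // T{} is the zero value for both arithmetic types and std::complex.
    return std::accumulate( values.begin(), values.end(), T{} )
           / static_cast<scalar>( values.size() );
}
```

For the three complex numbers mentioned above, stored as std::complex<double>, this returns roughly -0.333 + 4.667i, that is, the sum -1 + 14i divided by 3. Supporting another type then only requires a further specialization of the trait.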

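As you can see, most of the work is already done for us by the standard library. We used accumulate and divided by the container's size, and we were done; the rest of the time went into making sure it accepts proper types (and, if needed, whether it has to be thread safe and/or exception safe, and so on).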

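Finally, we did not have to worry about cumbersome for loops over arrays or about making sure the loops don't exceed the size of the array. We didn't have to call new and worry about when and where to call delete in order to avoid memory leaks. AFAIK, I don't think std::accumulate will overflow on supporting containers, but don't quote me on this. It may depend on the types in the container and on the fact that a static_cast is involved. Even with some of these caveats, in many cases it is still better to use containers than to manage your own raw memory, and to use the algorithms and functions designed to work on them. They make things a lot simpler and easier to manage, and even to debug.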
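One caveat is worth checking here: std::accumulate performs its additions in the type of the initial value, so a std::uint64_t sum wraps around modulo 2^64 when the true total does not fit. The sketch below is one possible way around this and is not part of the answer above. It splits each element into a quotient and a remainder with respect to the element count, so no intermediate value exceeds the largest element. The name overflow_safe_average is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Returns floor(mean) of the values without ever forming the full sum.
// Each value v is split as v = q * n + r with 0 <= r < n.
std::uint64_t overflow_safe_average( const std::vector<std::uint64_t>& values ) {
    if( values.empty() ) {
        throw std::runtime_error( "Can not take average of an empty container" );
    }
    const std::uint64_t n = values.size();
    std::uint64_t quotient_sum = 0;   // sum of v / n, plus carries from the remainders
    std::uint64_t remainder_sum = 0;  // always kept below n
    for( std::uint64_t v : values ) {
        quotient_sum += v / n;
        const std::uint64_t r = v % n;
        // Add r to remainder_sum modulo n, carrying 1 into the quotient sum.
        if( remainder_sum >= n - r ) {
            remainder_sum -= n - r;
            ++quotient_sum;
        } else {
            remainder_sum += r;
        }
    }
    return quotient_sum;
}
```

For two elements that both hold the maximum std::uint64_t value, this returns that maximum, whereas a plain std::uint64_t sum would wrap before the division. Unlike the pairwise halving discussed earlier, the result is the exact truncated mean for any container size.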
