Arithmetic solution

Question

Something in a tutorial confused me, specifically integer division.

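The apparently preferred method is to cast the divisor to a float, round the resulting floating-point quotient to the nearest integer, and then cast that back to int: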

#include <cmath>
int round_divide_by_float_casting(int a, int b){
    return  (int) std::roundf( a / (float) b); 
}

However, this feels like reaching for your left ear with your right hand. I use this instead:

int round_divide (int a, int b){
    return a / b + a % b * 2 / b;
}

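It is not a breakthrough, but the fact that it is not the standard approach makes me wonder whether I am missing something. Despite my (admittedly limited) testing, I could not find any case where the two methods give different results. Has anyone come across a case where the int->float->int conversion produced a more accurate result?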

Answer 1

Arithmetic solution

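If one had to define what your function should return, one would describe it as something like: "f(a, b) returns the integer closest to the result of dividing a by b over the real numbers."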

So the question boils down to: can we define this closest integer using only integer division? I think we can.

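There are exactly two candidates for the closest integer: a / b and (a / b) + 1 (1). The choice is simple: if a % b is closer to 0 than it is to b, then a / b is our result. If not, (a / b) + 1 is.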
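Written out for the non-negative case only (a >= 0, b > 0, where truncating integer division coincides with the floor), the selection rule above reads as follows. This is a restatement, not part of the original answer:

```latex
q = \left\lfloor \frac{a}{b} \right\rfloor, \qquad
r = a - qb, \qquad 0 \le r < b, \qquad
\frac{a}{b} = q + \frac{r}{b}

\operatorname{round}\!\left(\frac{a}{b}\right) =
\begin{cases}
q,     & r < b - r \quad (\text{i.e. } 2r < b) \\
q + 1, & r \ge b - r \quad (\text{i.e. } 2r \ge b)
\end{cases}
```

A tie (2r = b) goes to q + 1, which for positive quotients is the same "half away from zero" choice that roundf makes.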

One could write something like the following, ignoring optimization and good practice:

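int divide(int a, int b)
{
    const int quot = a / b;
    const int rem = a % b;
    int result;

    if (rem < b - rem)
    {
        result = quot;
    }
    else
    {
        result = quot + 1;
    }
    return result;
}

Although this definition meets the requirement, it can be optimized by not computing the division of a by b twice, using std::div():

#include <cstdlib>

int divide(int a, int b)
{
    const std::div_t dv = std::div(a, b);
    int result = dv.quot;

    if (dv.rem >= b - dv.rem)
    {
        ++result;
    }
    return result;
}

The analysis of the problem we did earlier guarantees that our implementation is well-defined.

(1) One last thing to check: how does it behave when a or b is negative? This is left to the reader ;).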
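One possible answer to that exercise, as a sketch: the divide() above assumes a >= 0 and b > 0 (for example, divide(5, -3) returns 0 instead of -2). Comparing magnitudes and then stepping away from zero gives "round half away from zero" for every sign combination, matching roundf. The name round_divide_signed is hypothetical, and INT_MIN / -1 remains excluded because that quotient itself overflows.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

// Rounds a / b to the nearest integer, halves away from zero (like roundf).
// Precondition: b != 0 and not (a == INT_MIN && b == -1).
int round_divide_signed(int a, int b) {
    assert(b != 0 && !(a == INT_MIN && b == -1));
    // quot is truncated toward zero; rem has the sign of a.
    const std::div_t dv = std::div(a, b);

    // Magnitudes computed in long long so that b == INT_MIN cannot overflow.
    const long long abs_rem = std::llabs(static_cast<long long>(dv.rem));
    const long long abs_b = std::llabs(static_cast<long long>(b));

    int result = dv.quot;
    if (abs_rem >= abs_b - abs_rem) {
        // The exact quotient is negative exactly when the operand signs differ.
        result += ((a < 0) != (b < 0)) ? -1 : 1;
    }
    return result;
}
```

When rem is nonzero, |b| is at least 2, so |quot| is at most 2^30 and the final +/-1 cannot overflow.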

Benchmark


#include <iostream>
#include <iomanip>
#include <string>

// solutions
#include <cmath>
#include <cstdlib>

// benchmark
#include <algorithm>
#include <array>
#include <chrono>
#include <ctime>
#include <functional>
#include <limits>
#include <random>

//
// Solutions
//
namespace
{
    int round_divide_by_float_casting(int a, int b) {
        return  (int)roundf(a / (float)b);
    }

    int round_divide_by_modulo(int a, int b) {
        return a / b + a % b * 2 / b;
    }

    int divide_by_quotient_comparison(int a, int b)
    {
        const std::div_t dv = std::div(a, b);
        int result = dv.quot;

        if (dv.rem >= b - dv.rem)
        {
            ++result;
        }
        return result;
    }
}

//
// benchmark
//
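class Randomizer
{
    std::mt19937 _rng_engine;
    std::uniform_int_distribution<int> _distri;

public:
    // Range limited to +/- INT_MAX / 2 so that a % b * 2 cannot overflow
    // and INT_MIN / -1 can never be generated.
    Randomizer() : _rng_engine(std::time(0)),
                   _distri(-std::numeric_limits<int>::max() / 2, std::numeric_limits<int>::max() / 2)
    {
    }

    template<class ForwardIt>
    void operator()(ForwardIt begin, ForwardIt end)
    {
        // Use the member engine directly: std::bind(_distri, _rng_engine) copies it,
        // so every call would restart from the same state and produce the same sequence.
        std::generate(begin, end, [this] { return _distri(_rng_engine); });
    }
};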

class Clock
{
    std::chrono::time_point<std::chrono::steady_clock> _start;

public:
    static inline std::chrono::time_point<std::chrono::steady_clock> now() { return std::chrono::steady_clock::now(); }

    Clock() : _start(now())
    {
    }

    template<class DurationUnit>
    std::size_t end()
    {
        return std::chrono::duration_cast<DurationUnit>(now() - _start).count();
    }
};

//
// Entry point
//
int main()
{
    Randomizer randomizer;
    std::array<int, 1000> dividends; // SCALE THIS UP (1'000'000 would be great)
    std::array<int, dividends.size()> divisors;
    std::array<int, dividends.size()> results;
    randomizer(std::begin(dividends), std::end(dividends));
    randomizer(std::begin(divisors), std::end(divisors));
    std::replace(std::begin(divisors), std::end(divisors), 0, 1); // avoid division by zero

    {
        Clock clock;
        auto dividend = std::begin(dividends);
        auto divisor = std::begin(divisors);
        auto result = std::begin(results);
        for ( ; dividend != std::end(dividends) ; ++dividend, ++divisor, ++result)
        {
            *result = round_divide_by_float_casting(*dividend, *divisor);
        }
        const float unit_time = clock.end<std::chrono::nanoseconds>() / static_cast<float>(results.size());
        std::cout << std::setw(40) << "round_divide_by_float_casting(): " << std::setprecision(3) << unit_time << " ns\n";
    }
    {
        Clock clock;
        auto dividend = std::begin(dividends);
        auto divisor = std::begin(divisors);
        auto result = std::begin(results);
        for ( ; dividend != std::end(dividends) ; ++dividend, ++divisor, ++result)
        {
            *result = round_divide_by_modulo(*dividend, *divisor);
        }
        const float unit_time = clock.end<std::chrono::nanoseconds>() / static_cast<float>(results.size());
        std::cout << std::setw(40) << "round_divide_by_modulo(): " << std::setprecision(3) << unit_time << " ns\n";
    }
    {
        Clock clock;
        auto dividend = std::begin(dividends);
        auto divisor = std::begin(divisors);
        auto result = std::begin(results);
        for ( ; dividend != std::end(dividends) ; ++dividend, ++divisor, ++result)
        {
            *result = divide_by_quotient_comparison(*dividend, *divisor);
        }
        const float unit_time = clock.end<std::chrono::nanoseconds>() / static_cast<float>(results.size());
        std::cout << std::setw(40) << "divide_by_quotient_comparison(): " << std::setprecision(3) << unit_time << " ns\n";
    }
}


The performance of the two arithmetic solutions is indistinguishable (their timings converge as you scale up the benchmark size).

Answer 2

Which one is better really depends on the processor and on the range of the integers (using double would solve most of the range issues).

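For modern "big" CPUs such as x86-64 and ARM, integer division and floating-point division cost roughly the same, and converting an integer to a float or vice versa is not a "hard" task (and at least the correct rounding is done directly in that conversion), so the most likely resulting operations are: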

atmp = (float) a;
btmp = (float) b;
resfloat = divide atmp/btmp;
return = to_int_with_rounding(resfloat)

That is about four machine instructions.

Your code, on the other hand, uses two divisions, a modulo and a multiplication, which is quite likely to be slower on such a processor:

tmp = a/b;
tmp1 = a % b;
tmp2 = tmp1 * 2;
tmp3 = tmp2 / b;
tmp4 = tmp + tmp3;

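So five instructions, three of which are "divides" (unless the compiler is clever enough to reuse a / b for a % b, but even then there are still two distinct divisions).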

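Of course, if you go beyond the number of bits a float or double can hold without losing digits (24 significant bits for float, 53 for double), your method may well be better (assuming there is no overflow in the integer math).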
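A sketch of that range effect, assuming a 32-bit int and IEEE-754 float and double: 16777217 (2^24 + 1) cannot be stored exactly in a float, so the float version rounds the dividend before it even divides, while the integer-only version and a double-based variant (hypothetically named round_divide_by_double_casting here) keep it exact. The first two functions mirror the question's; std::round(float) stands in for roundf and rounds halves the same way (away from zero).

```cpp
#include <cassert>
#include <cmath>

// Mirrors the question's float version.
int round_divide_by_float_casting(int a, int b) {
    assert(b != 0);
    return static_cast<int>(std::round(a / static_cast<float>(b)));
}

// The question's integer-only version; a % b * 2 overflows if |a % b| > INT_MAX / 2.
int round_divide_by_modulo(int a, int b) {
    assert(b != 0);
    return a / b + a % b * 2 / b;
}

// Hypothetical double variant: every 32-bit int is exactly representable in a double.
// INT_MIN / -1 must still be avoided, since that result does not fit in an int.
int round_divide_by_double_casting(int a, int b) {
    assert(b != 0);
    return static_cast<int>(std::round(static_cast<double>(a) / b));
}
```

With these definitions, round_divide_by_float_casting(16777217, 1) does not return 16777217 (with the usual round-to-nearest conversion it returns 16777216), whereas the other two do.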

Finally, since the first form is the one "everyone uses", it is the form the compiler recognizes and can optimize.

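Obviously the results depend on the compiler used and the processor it runs on, but these are the results I get running the benchmark code posted above, compiled with clang++ (v3.9 pre-release, fairly close to release 3.8):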

   round_divide_by_float_casting(): 32.5 ns
          round_divide_by_modulo(): 113 ns
   divide_by_quotient_comparison(): 80.4 ns

However, when I looked at the generated code, I found something interesting:

xorps   %xmm0, %xmm0
cvtsi2ssl   8016(%rsp,%rbp), %xmm0
xorps   %xmm1, %xmm1
cvtsi2ssl   4016(%rsp,%rbp), %xmm1
divss   %xmm1, %xmm0
callq   roundf
cvttss2si   %xmm0, %eax
movl    %eax, 16(%rsp,%rbp)
addq    $4, %rbp
cmpq    $4000, %rbp             # imm = 0xFA0
jne .LBB0_7

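roundf is actually a function call (callq roundf). That really surprised me, but it explains why it is faster on some machines (particularly recent x86 processors).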

g++ with -ffast-math gives even better results:

  round_divide_by_float_casting(): 17.6 ns
          round_divide_by_modulo(): 43.1 ns
   divide_by_quotient_comparison(): 18.5 ns

(This is with the number of values increased to 100k.)

Answer 3

Prefer the standard solution: use the std::div family of functions declared in <cstdlib>.

See: http://en.cppreference.com/w/cpp/numeric/math/div

Edit: on some architectures (e.g. microcontrollers), converting to float and then back to int can be very inefficient.
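A small illustration of the std::div behaviour that any rounding built on top of it has to account for: since C++11 the quotient is truncated toward zero and the remainder has the sign of the dividend, so quot * b + rem == a always holds. The helper name div_identity_holds is hypothetical; the long long arithmetic only keeps the check itself from overflowing.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: checks what std::div guarantees for one pair.
// Precondition: b != 0 and the quotient fits in an int (no INT_MIN / -1).
bool div_identity_holds(int a, int b) {
    assert(b != 0);
    const std::div_t dv = std::div(a, b);  // one call yields both quot and rem
    const long long q = dv.quot;
    const long long r = dv.rem;
    const bool reconstructs = q * b + r == a;              // a == quot * b + rem
    const bool smaller = std::llabs(r) < std::llabs(b);    // |rem| < |b|
    const bool sign_ok = r == 0 || (r < 0) == (a < 0);     // rem has the sign of a
    return reconstructs && smaller && sign_ok;
}
```

For example, std::div(-7, 2) yields quot == -3 and rem == -1, so using the quotient alone gives -3 where roundf(-3.5f) gives -4; the remainder is what tells a rounding function to adjust.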

Answer 4

Thanks for the suggestions. To shed some light on this, I made a test setup to compare performance.

#include <iostream>
#include <string>
#include <cmath>
#include <cstdlib>
#include <chrono>

using namespace std;

int round_divide_by_float_casting(int a, int b) {
    return  (int)roundf(a / (float)b);
}

int round_divide_by_modulo(int a, int b) {
    return a / b + a % b * 2 / b;
}

int divide_by_quotient_comparison(int a, int b)
{
    const std::div_t dv = std::div(a, b);
    int result = dv.quot;

    if (dv.rem >= b - dv.rem) {
        ++result;
    }
    return result;
}

int main()
{
    int itr = 1000;

    //while (true) {
        auto begin = chrono::steady_clock::now();
        for (int i = 0; i < itr; i++) {
            for (int j = 10; j < itr + 1; j++) {
                divide_by_quotient_comparison(i, j);
            }
        }
        auto end = std::chrono::steady_clock::now();
        cout << "divide_by_quotient_comparison(,) function took : " << chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << endl;

        begin = chrono::steady_clock::now();
        for (int i = 0; i < itr; i++) {
            for (int j = 10; j < itr + 1; j++) {
                round_divide_by_float_casting(i, j);
            }
        }
        end = std::chrono::steady_clock::now();
        cout << "round_divide_by_float_casting(,) function took : " << chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << endl;

        begin = chrono::steady_clock::now();
        for (int i = 0; i < itr; i++) {
            for (int j = 10; j < itr + 1; j++) {
                round_divide_by_modulo(i, j);
            }
        }
        end = std::chrono::steady_clock::now();
        cout << "round_divide_by_modulo(,) function took : " << chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << endl;

    //}

    return 0;
}

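The results I got on my machine (an i7 with VS2015) were as follows: the modulo method was about twice as fast as the int->float->int casting method. The method relying on std::div_t (suggested by @YSC and @teroi) was faster than int->float->int, but slower than the modulo method.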

Edit: I ran a second test to avoid some of the compiler optimizations pointed out by @YSC:

#include <iostream>
#include <string>
#include <cmath>
#include <cstdlib>
#include <chrono>
#include <vector>

using namespace std;

int round_divide_by_float_casting(int a, int b) {
    return (int)roundf(a / (float)b);
}

int round_divide_by_modulo(int a, int b) {
    return a / b + a % b * 2 / b;
}

int divide_by_quotient_comparison(int a, int b)
{
    const std::div_t dv = std::div(a, b);
    int result = dv.quot;

    if (dv.rem >= b - dv.rem) {
        ++result;
    }
    return result;
}

int main()
{
    int itr = 100;
    vector<int> randi, randj;
    for (int i = 0; i < itr; i++) {
        randi.push_back(rand());
        int rj = rand();
        if (rj == 0) rj++;
        randj.push_back(rj);
    }

    vector<int> f, m, q;
    while (true) {
        auto begin = chrono::steady_clock::now();
        for (int i = 0; i < itr; i++) {
            for (int j = 0; j < itr; j++) {
                q.push_back(divide_by_quotient_comparison(randi[i], randj[j]));
            }
        }
        auto end = std::chrono::steady_clock::now();
        cout << "divide_by_quotient_comparison(,) function took : " << chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << endl;

        begin = chrono::steady_clock::now();
        for (int i = 0; i < itr; i++) {
            for (int j = 0; j < itr; j++) {
                f.push_back(round_divide_by_float_casting(randi[i], randj[j]));
            }
        }
        end = std::chrono::steady_clock::now();
        cout << "round_divide_by_float_casting(,) function took : " << chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << endl;

        begin = chrono::steady_clock::now();
        for (int i = 0; i < itr; i++) {
            for (int j = 0; j < itr; j++) {
                m.push_back(round_divide_by_modulo(randi[i], randj[j]));
            }
        }
        end = std::chrono::steady_clock::now();
        cout << "round_divide_by_modulo(,) function took : " << chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << endl;

        cout << endl;
        f.clear();
        m.clear();
        q.clear();
    }
    return 0;
}

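In the second test, the slowest was divide_by_quotient_comparison(), which relies on std::div_t, followed by round_divide_by_float_casting(); the fastest was round_divide_by_modulo(). This time, however, the performance differences were much smaller, under 20%.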
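As a sketch of another way to keep the optimizer from discarding the calls (an alternative to pushing every result into a vector, not what was measured above), each result can be folded into a checksum that the caller prints at the end. TimingResult and time_divisions are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical result type: elapsed time plus a checksum of every result.
struct TimingResult {
    long long nanoseconds;
    long long checksum;
};

// Times fn over all pairs (a[i], b[i]). As long as the caller prints the
// checksum, the calls cannot simply be removed as dead code.
template <class Fn>
TimingResult time_divisions(Fn fn, const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    const auto start = std::chrono::steady_clock::now();
    long long checksum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        checksum += fn(a[i], b[i]);  // every result feeds the checksum
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return {static_cast<long long>(elapsed.count()), checksum};
}

// One of the functions under test, as in the answer.
int round_divide_by_modulo(int a, int b) {
    return a / b + a % b * 2 / b;
}
```

A caller would print both fields, e.g. the time together with the checksum, so that each run's results stay observable.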
