Multiplication algorithm slower than expected

Date: 2018-02-17 14:16:57

Tags: performance rust benchmarking

I'm reading From Mathematics to Generic Programming by Stepanov & Rose.

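In the first chapter of the book, the authors optimize an integer multiplication algorithm, starting from a naive version and ending with a highly optimized one (mul0 through mul4).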
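The optimization the book walks through replaces repeated addition with halving and doubling, so the number of additions drops from about a to about 2·log2(a). A minimal sketch of that comparison follows; naive and egyptian are hypothetical helpers written for illustration (not the book's code and not the question's functions), and u64 is used only to keep the arithmetic overflow-free.

```rust
// Hypothetical helpers, not the book's code: each returns (a * b, additions)
// so the two strategies can be compared by how many additions they perform.

/// Repeated addition, the same strategy as the question's mul0 but written
/// as a loop instead of recursion. Assumes a >= 1.
fn naive(a: u64, b: u64) -> (u64, u64) {
    let mut r = b;
    let mut adds = 0;
    for _ in 1..a {
        r += b; // one addition of b per remaining unit of a
        adds += 1;
    }
    (r, adds)
}

/// Halving/doubling ("Egyptian") multiplication. Assumes a >= 1.
/// Counts both the accumulating additions (including the first one, which
/// adds to zero) and the doublings (b + b).
fn egyptian(mut a: u64, mut b: u64) -> (u64, u64) {
    let mut r = 0;
    let mut adds = 0;
    loop {
        if a & 1 == 1 {
            r += b; // low bit of a is set: add the current multiple of b
            adds += 1;
            if a == 1 {
                return (r, adds);
            }
        }
        a >>= 1; // halve a
        b += b; // double b
        adds += 1;
    }
}

fn main() {
    let (p0, n0) = naive(42000, 7);
    let (p1, n1) = egyptian(42000, 7);
    assert_eq!(p0, p1);
    println!("42000 * 7 = {p1}: naive uses {n0} additions, halving/doubling uses {n1}");
}
```

For a = 42000 the naive loop performs 41999 additions, while halving/doubling performs 19 (4 set bits plus 15 doublings).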

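I implemented the algorithms in Rust and benchmarked them with cargo bench. To my surprise, the algorithm that is fastest according to the book is the slowest in the benchmark, and I can't figure out why. The usual suspect is compiler optimization, but I don't have enough assembly skills to inspect the binary by hand.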
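One general way to keep the optimizer from precomputing a benchmark loop is to hide its inputs behind an opaque identity function. The sketch below assumes Rust 1.66 or later, where std::hint::black_box is stable, and uses a crude std::time::Instant timer instead of cargo bench. The names checksum, time_it and builtin_mul are hypothetical; builtin_mul (the hardware multiply via wrapping_mul) merely stands in for whichever multiplication function is being measured.

```rust
use std::hint::black_box; // stable since Rust 1.66
use std::time::Instant;

/// Sums f(arg, i) for i in 1..1000. black_box hides the inputs from the
/// optimizer, so the loop cannot be folded into a compile-time constant.
/// wrapping_add avoids the i32 overflow panic that `+=` causes in debug builds.
fn checksum(f: fn(i32, i32) -> i32, arg: i32) -> i32 {
    let mut acc: i32 = 0;
    for i in 1..1000 {
        acc = acc.wrapping_add(f(black_box(arg), black_box(i)));
    }
    acc
}

/// Runs checksum `rounds` times (rounds must be > 0) and returns
/// (checksum, average nanoseconds per round).
fn time_it(f: fn(i32, i32) -> i32, arg: i32, rounds: u32) -> (i32, u128) {
    let start = Instant::now();
    let mut last = 0;
    for _ in 0..rounds {
        last = black_box(checksum(f, arg)); // keep the result "used"
    }
    (last, start.elapsed().as_nanos() / rounds as u128)
}

/// Hypothetical stand-in for the function under test: the built-in multiply.
fn builtin_mul(a: i32, b: i32) -> i32 {
    a.wrapping_mul(b)
}

fn main() {
    let (sum, ns) = time_it(builtin_mul, 42000, 1_000);
    println!("checksum = {sum}, ~{ns} ns per round");
}
```

Any of the question's functions with the signature fn(i32, i32) -> i32 could be passed in place of builtin_mul; a hand-rolled timer like this lacks cargo bench's statistics and is only meant to show where black_box goes.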

Why is this happening?

#![feature(test)]

extern crate test;

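// Multiply-accumulate by halving and doubling: returns r + a * b for a >= 1.
// a is the loop counter (halved each pass); b is doubled each pass and added
// to r whenever a is odd.
fn mul_acc4(a: i32, b: i32, r: i32) -> i32 {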
    let mut r = r;
    let mut a = a;
    let mut b = b;

    loop {
        if odd(a) {
            r = r + b;

            if a == 1 {
                return r;
            }
        }

        a = half(a);
        b = b + b;
    }
}

// Naive recursive multiplication: computes a * b with a - 1 additions of b
// (assumes a >= 1).
fn mul0(a: i32, b: i32) -> i32 {
    if a == 1 {
        return b;
    }

    return mul0(a - 1, b) + b;
}

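// Computes a * b as b + (a - 1) * b via mul_acc4. Note that b is passed as
// mul_acc4's loop counter and a - 1 as the value that gets doubled.
fn mul2(a: i32, b: i32) -> i32 {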
    if a == 1 {
        return b;
    }

    return mul_acc4(b, a - 1, b);
}

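// Strips the factors of two from a (doubling b for each one), then finishes
// with mul_acc4. Here b + b is passed as mul_acc4's loop counter and
// half(a - 1) as the value that gets doubled.
fn mul4(a: i32, b: i32) -> i32 {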
    let mut a = a;
    let mut b = b;

    while !odd(a) {
        b = b + b;
        a = half(a);
    }

    if a == 1 {
        return b;
    }

    return mul_acc4(b + b, half(a - 1), b);
}

fn half(a: i32) -> i32 {
    a >> 1
}

fn odd(a: i32) -> bool {
    a & 0x1 == 1
}

#[cfg(test)]
mod tests {

    use super::*;
    use test::{black_box, Bencher};

    #[bench]
    fn bench_mul0(b: &mut Bencher) {
        b.iter(|| {
            let arg = 42000;
            let mut acc = 0;

            for i in 1..1000 {
                acc += mul0(arg, i);
            }

            return acc;
        });
    }

    #[bench]
    fn bench_mul2(b: &mut Bencher) {
        b.iter(|| {
            let arg = 42000;
            let mut acc = 0;
            for i in 1..1000 {
                acc += mul2(arg, i);
            }

            return acc;
        });
    }

    #[bench]
    fn bench_mul4(b: &mut Bencher) {
        b.iter(|| {
            let arg = 42000;
            let mut acc = 0;
            for i in 1..1000 {
                acc += mul4(arg, i);
            }

            return acc;
        });
    }
}

Benchmark results

test tests::bench_mul0 ... bench:           0 ns/iter (+/- 0)
test tests::bench_mul2 ... bench:      15,535 ns/iter (+/- 6,887)
test tests::bench_mul4 ... bench:      29,015 ns/iter (+/- 9,168)
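Without reading assembly, one rough way to compare the work done by mul2 and mul4 is to count loop passes for the exact inputs the benchmark uses (arg = 42000, i in 1..1000). The sketch below uses instrumented copies of mul_acc4, mul2 and mul4 that keep the logic and argument order of the code above; the _counted names and the switch to i64 are illustrative assumptions. Pass counts are only a proxy: they say nothing about what the optimizer does with each function.

```rust
// Instrumented copies of the question's mul_acc4, mul2 and mul4: same logic
// and argument order, plus a counter of loop passes. i64 is used instead of
// the question's i32 so the checks cannot overflow.

fn mul_acc4_counted(mut a: i64, mut b: i64, mut r: i64, passes: &mut u64) -> i64 {
    loop {
        *passes += 1;
        if a & 1 == 1 {
            r += b;
            if a == 1 {
                return r;
            }
        }
        a >>= 1; // half(a)
        b += b;
    }
}

fn mul2_counted(a: i64, b: i64, passes: &mut u64) -> i64 {
    if a == 1 {
        return b;
    }
    // Same call as mul2: b is mul_acc4's loop counter.
    mul_acc4_counted(b, a - 1, b, passes)
}

fn mul4_counted(mut a: i64, mut b: i64, passes: &mut u64) -> i64 {
    while a & 1 == 0 {
        *passes += 1;
        b += b;
        a >>= 1; // half(a)
    }
    if a == 1 {
        return b;
    }
    // Same call as mul4: b + b is mul_acc4's loop counter.
    mul_acc4_counted(b + b, (a - 1) >> 1, b, passes)
}

fn main() {
    let (mut p2, mut p4) = (0u64, 0u64);
    for i in 1..1000 {
        assert_eq!(mul2_counted(42000, i, &mut p2), 42000 * i);
        assert_eq!(mul4_counted(42000, i, &mut p4), 42000 * i);
    }
    println!("loop passes for i in 1..1000: mul2 = {p2}, mul4 = {p4}");
}
```

Run as written, it reports 8977 passes for mul2 and 17968 for mul4.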
