I'm reading From Mathematics to Generic Programming by Stepanov & Rose.
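In the first chapter of the book, the authors optimize an integer multiplication algorithm, starting from a simple approach and ending with a highly optimized one (`mul0` through `mul4`).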
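The reason the later versions are expected to be faster is that they perform far fewer additions. The following is a minimal sketch, not code from the book; the functions `naive` and `halving` are hypothetical names. It counts the additions each scheme needs: the recursion behind `mul0` uses a − 1 additions, while halving `a` and doubling `b` needs one doubling per halving plus one addition per set bit of `a`.

```rust
/// Naive scheme behind `mul0`: a * b = (a - 1) * b + b.
/// Returns (product, number of additions). Assumes a >= 1.
fn naive(a: u64, b: u64) -> (u64, u32) {
    let mut r = b;
    let mut adds = 0;
    for _ in 1..a {
        r += b;
        adds += 1;
    }
    (r, adds)
}

/// Halving/doubling scheme: add `b` to the accumulator whenever `a` is odd,
/// then halve `a` and double `b`. Returns (product, number of additions).
/// Assumes a >= 1.
fn halving(mut a: u64, mut b: u64) -> (u64, u32) {
    let mut r = 0;
    let mut adds = 0;
    loop {
        if a & 1 == 1 {
            r += b;
            adds += 1;
            if a == 1 {
                return (r, adds);
            }
        }
        a >>= 1;
        b += b; // doubling is an addition too
        adds += 1;
    }
}

fn main() {
    let (p0, n0) = naive(42000, 999);
    let (p1, n1) = halving(42000, 999);
    assert_eq!(p0, p1);
    println!("naive: {n0} additions, halving: {n1} additions");
}
```

With the question's `arg = 42000`, this prints `naive: 41999 additions, halving: 19 additions`. That is the gap the book's optimizations are meant to exploit.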
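I implemented the algorithms in Rust (`mul0`, `mul2` and `mul4` below) and benchmarked them with `cargo bench`. To my surprise, the book's fastest algorithm (`mul4`) was the slowest in my benchmark, while the naive `mul0` was the fastest, and I can't figure out why. The usual suspect is compiler optimization, but my assembly skills aren't good enough to inspect the binary by hand.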
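One way to probe the compiler-optimization suspicion without reading assembly is to hide a benchmark's inputs and outputs from the optimizer with `std::hint::black_box` (stable since Rust 1.66; the nightly `test::black_box` imported in the code below serves the same purpose). The following is a minimal sketch, assuming a plain `Instant` timer instead of `cargo bench`. `checksum` and `mul_builtin` are hypothetical names. Per its documentation, `black_box` is best-effort, but if a variant stops reporting about 0 ns once its inputs pass through it, the original measurement was probably being computed at compile time.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical helper: sums f(arg, i) for i in 1..1000, routing the inputs
/// and each result through `black_box` so the optimizer cannot treat them as
/// known constants and precompute the whole loop.
fn checksum(f: fn(i32, i32) -> i32, arg: i32) -> i64 {
    let mut acc: i64 = 0;
    for i in 1..1000 {
        // Accumulate in i64: the total exceeds i32::MAX for arg = 42000.
        acc += black_box(f(black_box(arg), black_box(i))) as i64;
    }
    acc
}

/// Hypothetical stand-in; swap in `mul0`, `mul2` or `mul4` from the question.
fn mul_builtin(a: i32, b: i32) -> i32 {
    a * b
}

fn main() {
    let start = Instant::now();
    let acc = checksum(mul_builtin, 42000);
    println!("acc = {acc}, elapsed = {:?}", start.elapsed());
}
```

Passing the multiplication routine as a `fn` pointer keeps the harness identical for every variant. Only the function under test changes between runs.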
Why does this happen?
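```rust
#![feature(test)]
extern crate test;

/// Returns r + a * b by repeatedly halving `a` (the loop counter) and
/// doubling `b`. Assumes a >= 1.
fn mul_acc4(a: i32, b: i32, r: i32) -> i32 {
    let mut r = r;
    let mut a = a;
    let mut b = b;
    loop {
        if odd(a) {
            r = r + b;
            if a == 1 {
                return r;
            }
        }
        a = half(a);
        b = b + b;
    }
}

/// Naive recursive version: a * b = (a - 1) * b + b.
fn mul0(a: i32, b: i32) -> i32 {
    if a == 1 {
        return b;
    }
    mul0(a - 1, b) + b
}

/// Computes b + b * (a - 1) via mul_acc4, with `b` as the halving counter.
fn mul2(a: i32, b: i32) -> i32 {
    if a == 1 {
        return b;
    }
    mul_acc4(b, a - 1, b)
}

/// Strips factors of two from `a` first, then accumulates the odd remainder.
fn mul4(a: i32, b: i32) -> i32 {
    let mut a = a;
    let mut b = b;
    // While a is even: a * b == (a / 2) * (2 * b).
    while !odd(a) {
        b = b + b;
        a = half(a);
    }
    if a == 1 {
        return b;
    }
    // a is odd here: a * b == b + (2 * b) * ((a - 1) / 2),
    // with `b + b` as the halving counter.
    mul_acc4(b + b, half(a - 1), b)
}

fn half(a: i32) -> i32 {
    a >> 1
}

fn odd(a: i32) -> bool {
    a & 0x1 == 1
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::{black_box, Bencher};

    #[bench]
    fn bench_mul0(b: &mut Bencher) {
        b.iter(|| {
            let arg = 42000;
            let mut acc: i32 = 0;
            for i in 1..1000 {
                // The sum exceeds i32::MAX; wrap explicitly so the arithmetic
                // does not panic in debug builds (same result as release).
                acc = acc.wrapping_add(mul0(arg, i));
            }
            acc
        });
    }

    #[bench]
    fn bench_mul2(b: &mut Bencher) {
        b.iter(|| {
            let arg = 42000;
            let mut acc: i32 = 0;
            for i in 1..1000 {
                acc = acc.wrapping_add(mul2(arg, i));
            }
            acc
        });
    }

    #[bench]
    fn bench_mul4(b: &mut Bencher) {
        b.iter(|| {
            let arg = 42000;
            let mut acc: i32 = 0;
            for i in 1..1000 {
                acc = acc.wrapping_add(mul4(arg, i));
            }
            acc
        });
    }
}
```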
Benchmark results:

```
test tests::bench_mul0 ... bench:           0 ns/iter (+/- 0)
test tests::bench_mul2 ... bench:      15,535 ns/iter (+/- 6,887)
test tests::bench_mul4 ... bench:      29,015 ns/iter (+/- 9,168)
```