Question

对于无法解决数独难题感到沮丧，我很快就一起破解了一个简单的递归回溯求解器：

fn is_allowed(board: &[[u8; 9]; 9], row: usize, col: usize, x: u8) -> bool {
    for i in 0..9 {
        if board[row][i] == x {
            return false;
        }
        if board[i][col] == x {
            return false;
        }
    }

    let r = row - (row % 3);
    let c = col - (col % 3);
    for i in r..(r + 3) {
        for j in c..(c + 3) {
            if board[i][j] == x {
                return false;
            }
        }
    }

    true
}

fn solve(board: &mut [[u8; 9]; 9]) -> bool {
    for i in 0..9 {
        for j in 0..9 {
            if board[i][j] == 0 {
                for x in 1..=9 {
                    if is_allowed(board, i, j, x) {
                        board[i][j] = x;
                        if solve(board) {
                            return true
                        }
                        board[i][j] = 0;
                    }
                }
                return false;
            }
        }
    }
    true
}

fn main() {
    let mut board = [
        [ 0, 0, 8, 0, 0, 4, 0, 0, 0 ],
        [ 0, 0, 0, 0, 0, 0, 0, 0, 7 ],
        [ 0, 0, 6, 0, 0, 0, 0, 1, 0 ],
        [ 0, 0, 0, 0, 0, 0, 5, 0, 9 ],
        [ 0, 0, 0, 6, 0, 0, 0, 0, 0 ],
        [ 0, 2, 0, 8, 1, 0, 0, 0, 0 ],
        [ 9, 4, 0, 0, 0, 0, 0, 0, 0 ],
        [ 0, 0, 0, 0, 0, 0, 1, 8, 0 ],
        [ 0, 0, 7, 0, 0, 5, 0, 0, 0 ],
    ];

    if solve(&mut board) {
        for i in 0..9 {
            println!("{:?}", board[i]);
        }
    } else {
        println!("no solution");
    }
}

在没有优化的情况下运行（cargo run），则需要6分钟以上的时间才能运行。

使用优化（cargo run --release）运行时，大约需要7秒钟才能运行。

什么优化导致了这种差异？

Answer 1

不分析生成的程序集就很难确定，但是我认为这与这些优化的组合有关：

循环展开：编译器知道每个循环运行8次，因此，它以循环作为索引，而不是循环，将循环体编译8次。这就是为什么@MatthieuM在上面的评论中的 godbold 链接很长的原因。
范围检查：由于post_install do |installer| installer.pods_project.targets.each do |target| target.build_configurations.each do |config| config.build_settings['BITCODE_GENERATION_MODE'] = 'bitcode' config.build_settings['ENABLE_BITCODE'] = 'YES' end end end，i和j现在是常量（在展开的循环中）并且数组的大小已知，因此编译器将省略所有范围检查。 / li>
函数内联。

实际上，每次您写k时：

在调试模式下，您要调用两个执行检查和计算的库函数。
在释放模式下，此类代码的每个实例只是对堆栈中固定偏移量的读取或写入。

什么优化可以使我的rust程序更快？

1 个答案: