Eigen3: How to access matrix coefficients in performance-critical operations?

Question

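I'm trying to optimize a critical operation in C++ that relies on Eigen3. I'm not clear on which kinds of coefficient access incur a runtime performance penalty, or when the compiler will do a good job on its own. To pin down the source of my confusion, I've posted an example below, implemented in several different ways, along with my assumptions for each.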

More details:

The matrix M stays constant throughout the program
critical_function is called a great many times, which is why it is inline

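Can someone explain which approach would be best in terms of performance? I may be confused about the cost of references, dereferencing, and so on.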

Option 1: Access the matrix coefficient directly

#include <Eigen/Dense>
class A {
public:
    A(){
        // Assume M has the right numbers
    }

    // This function will be called many many times, inside loops
    inline void critical_function()
    {
        // Do many operations using M(1, 1), for example:
        double y = 1 / M(1, 1);
        // ... some more code using M(1, 1)
    }
private:
    Eigen::Matrix3d M;
};

Assumptions:

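M(1, 1) causes a dereference every time it is used, which costs cycles because the offset has to be computed (this isn't a plain array, and it's unclear how the compiler handles it).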

Option 2: Make a copy of the coefficient we care about

#include <Eigen/Dense>
class A {
public:
    A(){
        // Assume M has the right numbers
        x = M(1, 1);
    }

    // This function will be called many many times, inside loops
    inline void critical_function()
    {
        // Do many operations using x, for example:
        double y = 1 / x;
        // ... some more code using x
    }
private:
    double x;
    Eigen::Matrix3d M;
};

Assumptions:

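Accessing x takes fewer cycles than accessing M(1, 1), so this is preferable to Option 1.
x holds the same value as M(1, 1), but duplicating the data carries a significant risk (the copy must be kept in sync with M), so it should be avoided for the sake of code maintenance.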

Option 3: Use a reference

#include <Eigen/Dense>
class A {
public:
    A(){
        // Assume M has the right numbers
    }

    // This function will be called many many times, inside loops
    inline void critical_function()
    {
        auto & x = M(1, 1);
        // Do many operations using x, for example:
        double y = 1 / x;
        // ... some more code using x
    }
private:
    Eigen::Matrix3d M;
};

Assumptions:

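Holding a single reference x takes fewer cycles than repeatedly referring to M(1, 1) throughout the function body.
This potential optimization only has an effect inside critical_function; it does not carry over to the enclosing scope, such as a loop that calls the function many times.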

Edit

Corrected the types to double (from int or float) for consistency with Matrix3d.

Answer 1

In short, don't bother; just write M(1,1).

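If you are dealing with fixed-size matrices such as Matrix3d and indices known at compile time, the index computation involved in M(1,1) will be optimized away entirely by any compiler. In other words, the following two snippets will generate the same assembly: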

// Snippet 1
struct A {
  Eigen::Matrix3d M;
  void foo() { double x = M(1,1); }
};

// Snippet 2: in column-major storage, M(1,1) is the 5th stored value, e
struct A {
  double a, b, c, d, e, f, g, h, i;
  void foo() { double x = e; }
};

So Option 2 would be worse, and Option 3 might even lower performance because you introduce a pointer.
