Question

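I am writing a general-purpose library for computational mechanics using Eigen, working mostly with 6x6 matrices and 6x1 vectors. I am considering the Eigen::Ref<> template so that the functions also accept segments and blocks, as described in http://eigen.tuxfamily.org/dox/TopicFunctionTakingEigenTypes.html and in "Correct usage of the Eigen::Ref<> class".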

However, a small performance comparison shows that Eigen::Ref has considerable overhead compared with a standard C++ reference for such small functions:

#include <ctime>
#include <iostream>
#include "Eigen/Core"


Eigen::Matrix<double, 6, 6> testRef(const Eigen::Ref<const Eigen::Matrix<double, 6, 6>>& A)
{
    Eigen::Matrix<double, 6, 6> temp = (A * A) * A;
    temp.diagonal().setOnes();
    return temp;
}

Eigen::Matrix<double, 6, 6> testNoRef(const Eigen::Matrix<double, 6, 6>& A)
{
    Eigen::Matrix<double, 6, 6> temp = (A * A) * A; 
    temp.diagonal().setOnes();
    return temp;
}


int main(){

  using namespace std;

  int cycles = 10000000;
  Eigen::Matrix<double, 6, 6> testMat;
  testMat = Eigen::Matrix<double, 6, 6>::Ones();

  clock_t begin = clock();

  for(int i = 0; i < cycles; i++)
      testMat = testRef(testMat);

  clock_t end = clock();


  double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;

  std::cout << "Ref: " << elapsed_secs << std::endl;

  begin = clock();

  for(int i = 0; i < cycles; i++)
      testMat = testNoRef(testMat);
  end = clock();

  elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;

  std::cout << "noRef : " << elapsed_secs << std::endl;


    return 0;
}

Output with gcc -O3:

Ref: 1.64066
noRef : 1.1281

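So Eigen::Ref seems to have considerable overhead, at least when the actual computational work is small. On the other hand, taking const Eigen::Matrix<double, 6, 6>& A leads to an unnecessary copy when a block or segment is passed: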

#include <Eigen/Core>
#include <iostream>


void test( const Eigen::Vector3d& a)
{
    std::cout << "addr in function " << &a << std::endl;
}

int main () {

    Eigen::Vector3d aa;
    aa << 1,2,3;
    std::cout << "addr outside function " << &aa << std::endl;

    test ( aa ) ;
    test ( aa.head(3) ) ;


    return 0;
}

Output:

addr outside function 0x7fff85d75960
addr in function 0x7fff85d75960
addr in function 0x7fff85d75980

Therefore, this approach is ruled out for the general case.

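Alternatively, one can write function templates taking Eigen::MatrixBase, as described in the documentation. However, this seems inefficient for a large library, and it cannot be adapted to fixed-size matrices (6x6, 6x1) as in my case.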

Are there any other options? What is the general recommendation for large, general-purpose libraries?

Thanks in advance!

Edit: modified the first benchmark example as suggested in the comments.

Answer 1

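With Ref<>, you pay the price of losing two pieces of information (compared with Matrix):

1. You lose the knowledge that the input is memory aligned.
2. You lose the compile-time knowledge that columns are stored sequentially (so that two consecutive columns are separated by 6 doubles).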

This is a classic trade-off between generality and maximum performance.
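The second point can be pictured with a small sketch that does not use Eigen at all. The types Fixed6x6 and StridedView6x6 below are hypothetical stand-ins, not Eigen's implementation: the first hard-codes the column stride 6, while the second carries it as a run-time member, much as a view over a block of a larger matrix has to.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a fixed-size, column-major 6x6 matrix:
// the distance between two columns is the compile-time constant 6.
struct Fixed6x6 {
    std::array<double, 36> data{};
    double operator()(std::size_t row, std::size_t col) const {
        assert(row < 6 && col < 6);
        return data[row + 6 * col];            // stride known to the compiler
    }
};

// Hypothetical stand-in for a Ref-like view: the column stride is a
// run-time value, so the same code must work for any stride.
struct StridedView6x6 {
    const double* data;
    std::size_t outerStride;
    double operator()(std::size_t row, std::size_t col) const {
        assert(row < 6 && col < 6);
        return data[row + outerStride * col];  // stride loaded at run time
    }
};

// Works with either type; only the address computation differs.
template <typename M>
double sumAll(const M& m) {
    double s = 0.0;
    for (std::size_t c = 0; c < 6; ++c)
        for (std::size_t r = 0; r < 6; ++r)
            s += m(r, c);
    return s;
}
```

With the fixed type the compiler can fold the stride into the address arithmetic and fully unroll the loops, while the view stays usable on a 6x6 window of, say, an 8x8 buffer (outerStride = 8). That is the generality being paid for.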
