Eigen: LDLT slower than LLT?

Asked: 2014-04-26 10:29:10

Tags: c++ eigen

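I am using the Cholesky module of Eigen 3 to solve a system of linear equations. The Eigen documentation states that using LDLT instead of LLT is faster for this purpose, but my benchmark shows a different result.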

I use the following code for the benchmark:

#include <iostream>
#include <chrono>
#include <Eigen/Core>
#include <Eigen/Cholesky>
using namespace std;
using namespace std::chrono;
using namespace Eigen;

int main()
{
    MatrixXf cov = MatrixXf::Random(4200, 4200);
    cov = (cov + cov.transpose()) + 1000 * MatrixXf::Identity(4200, 4200);
    VectorXf b = VectorXf::Random(4200), r1, r2;

    r1 = b;
    LLT<MatrixXf> llt;
    auto start = high_resolution_clock::now();
    llt.compute(cov);
    if (llt.info() != Success)
    {
        cout << "Error on LLT!" << endl;
        return 1;
    }
    auto middle = high_resolution_clock::now();
    llt.solveInPlace(r1);
    auto stop = high_resolution_clock::now();
    cout << "LLT decomposition & solving in  " << duration_cast<milliseconds>(middle - start).count()
         << " + " << duration_cast<milliseconds>(stop - middle).count() << " ms." << endl;

    r2 = b;
    LDLT<MatrixXf> ldlt;
    start = high_resolution_clock::now();
    ldlt.compute(cov);
    if (ldlt.info() != Success)
    {
        cout << "Error on LDLT!" << endl;
        return 1;
    }
    middle = high_resolution_clock::now();
    ldlt.solveInPlace(r2);
    stop = high_resolution_clock::now();
    cout << "LDLT decomposition & solving in " << duration_cast<milliseconds>(middle - start).count()
         << " + " << duration_cast<milliseconds>(stop - middle).count() << " ms." << endl;

    cout << "Total result difference: " << (r2 - r1).cwiseAbs().sum() << endl;
    return 0;
}

I compiled it on Windows with g++ -std=c++11 -O2 -o llt.exe llt.cc, and this is what I got:

LLT decomposition & solving in  6515 + 15 ms.
LDLT decomposition & solving in 8562 + 15 ms.
Total result difference: 1.27354e-006

So why is LDLT slower than LLT? Am I doing something wrong, or am I misunderstanding the documentation?

1 Answer:

Answer 0: (score: 4)

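That sentence of the documentation is outdated. For fairly large matrices, LLT should be significantly faster than LDLT, because the LLT implementation leverages cache-friendly matrix-matrix operations, whereas the LDLT implementation only involves pivoting and matrix-vector operations. With the devel branch, your example gives me: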

LLT decomposition & solving in  380 + 4 ms.
LDLT decomposition & solving in 2746 + 4 ms.
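
To illustrate what "matrix-matrix operations" means here, below is a minimal sketch of a textbook right-looking blocked Cholesky factorization. It is not Eigen's actual code. All names (Mat, cholUnblocked, cholBlocked, makeSpd, maxLowerDiff, reconstructionError) are hypothetical and chosen for this example only. The point is step 3: the trailing update is a matrix-matrix product, which an optimized library can hand to a cache-friendly kernel. The plain loops used here only show the structure and will not reproduce the speedup by themselves.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense row-major square matrix; only the lower triangle is read or written.
struct Mat {
    std::size_t n;
    std::vector<double> a;
    explicit Mat(std::size_t n_) : n(n_), a(n_ * n_, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return a[i * n + j]; }
    double operator()(std::size_t i, std::size_t j) const { return a[i * n + j]; }
};

// Unblocked Cholesky of the diagonal block [k0, k1), in place.
// Returns false if a non-positive pivot shows the matrix is not positive definite.
bool cholUnblocked(Mat& A, std::size_t k0, std::size_t k1) {
    for (std::size_t j = k0; j < k1; ++j) {
        double d = A(j, j);
        for (std::size_t p = k0; p < j; ++p) d -= A(j, p) * A(j, p);
        if (d <= 0.0) return false;
        A(j, j) = std::sqrt(d);
        for (std::size_t i = j + 1; i < k1; ++i) {
            double s = A(i, j);
            for (std::size_t p = k0; p < j; ++p) s -= A(i, p) * A(j, p);
            A(i, j) = s / A(j, j);
        }
    }
    return true;
}

// Right-looking blocked Cholesky with block size nb; the lower triangle becomes L.
bool cholBlocked(Mat& A, std::size_t nb) {
    assert(nb > 0);
    const std::size_t n = A.n;
    for (std::size_t k0 = 0; k0 < n; k0 += nb) {
        const std::size_t k1 = std::min(k0 + nb, n);
        // Step 1: factor the small diagonal block.
        if (!cholUnblocked(A, k0, k1)) return false;
        // Step 2: panel solve, L21 = A21 * L11^{-T}.
        for (std::size_t i = k1; i < n; ++i)
            for (std::size_t j = k0; j < k1; ++j) {
                double s = A(i, j);
                for (std::size_t p = k0; p < j; ++p) s -= A(i, p) * A(j, p);
                A(i, j) = s / A(j, j);
            }
        // Step 3: trailing update A22 -= L21 * L21^T (a matrix-matrix product).
        for (std::size_t i = k1; i < n; ++i)
            for (std::size_t j = k1; j <= i; ++j) {
                double s = 0.0;
                for (std::size_t p = k0; p < k1; ++p) s += A(i, p) * A(j, p);
                A(i, j) -= s;
            }
    }
    return true;
}

// Deterministic, diagonally dominant (hence positive definite) test matrix.
Mat makeSpd(std::size_t n) {
    Mat A(n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j <= i; ++j) {
            const double v = 1.0 / (1.0 + static_cast<double>(i - j));
            A(i, j) = (i == j) ? v + static_cast<double>(n) : v;
        }
    return A;
}

// Largest absolute difference between the lower triangles of X and Y.
double maxLowerDiff(const Mat& X, const Mat& Y) {
    double m = 0.0;
    for (std::size_t i = 0; i < X.n; ++i)
        for (std::size_t j = 0; j <= i; ++j)
            m = std::max(m, std::fabs(X(i, j) - Y(i, j)));
    return m;
}

// Largest absolute entry of L * L^T - A0 over the lower triangle.
double reconstructionError(const Mat& L, const Mat& A0) {
    double m = 0.0;
    for (std::size_t i = 0; i < L.n; ++i)
        for (std::size_t j = 0; j <= i; ++j) {
            double s = 0.0;
            for (std::size_t p = 0; p <= j; ++p) s += L(i, p) * L(j, p);
            m = std::max(m, std::fabs(s - A0(i, j)));
        }
    return m;
}
```

Calling cholBlocked(A, A.n) degenerates to the unblocked algorithm, so the two variants can be compared on the same input with maxLowerDiff. A pivoted LDLT must choose a pivot before each column is eliminated, which is why it is harder to restructure into large matrix-matrix updates of this kind.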