Question

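I have a sparse linear system Ax = b. In my application, A is a symmetric sparse matrix with a typical size of around 2,500,000 x 2,500,000. It has nonzero values on the main diagonal and on one other diagonal (plus the diagonal symmetric to it), which gives 2-3 nonzeros per row/column.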
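
To make this structure concrete, here is a minimal sketch of how the nonzeros of such a matrix could be enumerated as (row, column, value) triplets. The `Triplet` struct, the function names, the `offset` parameter, and the constant values are all hypothetical stand-ins. In the real code the entries would presumably go into a `std::vector<Eigen::Triplet<float>>`, and the values would come from the application.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Eigen::Triplet<float>: one nonzero entry of A.
struct Triplet {
    int row;
    int col;
    float value;
};

// Enumerate the nonzeros of a symmetric n x n matrix that has entries on the
// main diagonal and on one off-diagonal at distance `offset` (plus its mirror).
// `diagValue` and `offValue` are placeholder values for illustration only.
std::vector<Triplet> buildTwoDiagonalTriplets(int n, int offset,
                                              float diagValue, float offValue) {
    assert(n > 0 && offset > 0 && offset < n);
    std::vector<Triplet> triplets;
    triplets.reserve(static_cast<std::size_t>(n) +
                     2 * static_cast<std::size_t>(n - offset));
    for (int i = 0; i < n; ++i) {
        triplets.push_back({i, i, diagValue});              // main diagonal
        if (i + offset < n) {
            triplets.push_back({i, i + offset, offValue});  // upper off-diagonal
            triplets.push_back({i + offset, i, offValue});  // symmetric counterpart
        }
    }
    return triplets;
}

// Count the nonzeros in each row of the n x n matrix described by `triplets`.
std::vector<int> nonzerosPerRow(const std::vector<Triplet>& triplets, int n) {
    std::vector<int> counts(n, 0);
    for (const Triplet& t : triplets) {
        ++counts[t.row];
    }
    return counts;
}
```

For `n = 10` and `offset = 3` this produces 24 nonzeros: the first three and last three rows have 2 each, and the rows in between have 3 each. As long as `offset` is at most `n / 2`, every row has 2 or 3 nonzeros, which matches the description above.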

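To test my code, I am comparing MATLAB and Eigen. I created a 1,000,000 x 1,000,000 sparse matrix A. In MATLAB, I simply use x = A\b, and it takes about 8 seconds. In Eigen, I have tried several solvers. SuperLU takes about 150 s. SimplicialCholesky takes about 300 s. UmfPackLU takes about 490 s. These times are too long for me; on real data, the computation would take too long to be useful. Other solvers give completely different results compared to MATLAB, and the iterative solvers took too long. SimplicialCholesky, SuperLU and UmfPackLU give similar results (they differ in the decimal places), so I hope this counts as the same. Eigen code: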
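
One way to put a number on "they differ in the decimal places" is the relative difference between two solution vectors. The sketch below is an illustration only: the function name `relativeDifference` is made up here, and it assumes both solutions have been copied into `std::vector<float>` (for example the MATLAB result as the reference and an Eigen result as the candidate).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Relative difference ||x - xRef||_2 / ||xRef||_2 between two solution vectors.
// The sums are accumulated in double so that the measurement itself adds
// as little rounding error as possible.
double relativeDifference(const std::vector<float>& xRef,
                          const std::vector<float>& x) {
    assert(xRef.size() == x.size());
    double diffSq = 0.0;
    double refSq = 0.0;
    for (std::size_t i = 0; i < xRef.size(); ++i) {
        const double d = static_cast<double>(x[i]) - static_cast<double>(xRef[i]);
        diffSq += d * d;
        refSq += static_cast<double>(xRef[i]) * static_cast<double>(xRef[i]);
    }
    assert(refSq > 0.0);  // the reference solution must not be all zeros
    return std::sqrt(diffSq / refSq);
}
```

How small the result has to be for two solvers to count as agreeing depends on the conditioning of A. Since `float` carries only about 7 significant decimal digits, agreement to a few decimal places may be all that single precision can deliver.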

// prepare sparse matrix A
    typedef Eigen::Triplet<float> T; // triplet type (assumed to be Eigen::Triplet<float>)
    std::vector<T> tripletList; // filling the triplet list is omitted here
    Eigen::SparseMatrix<float> A(k, k); // k is usually around 2500000; in the test case described here it is 1000000
    A.setFromTriplets(tripletList.begin(), tripletList.end());
    A.makeCompressed();

// prepare vector b
    Eigen::Map<Eigen::VectorXf> b; // vector b is filled with values

// calculate A x = b and measure time - for SimplicialCholesky
    t1 = std::chrono::steady_clock::now();
    Eigen::SimplicialCholesky<Eigen::SparseMatrix<float>> solver_chol(A);
    x = solver_chol.solve(b);
    t2 = std::chrono::steady_clock::now();
    log_file << "SimplicialCholesky time: t2 - t1 = " << std::chrono::duration_cast<std::chrono::seconds>(t2 - t1).count() << " s \n";

// calculate A x = b and measure time - for SparseLU
    t1 = std::chrono::steady_clock::now();
    Eigen::SparseLU<Eigen::SparseMatrix<float>> solver_slu(A);
    x = solver_slu.solve(b);
    t2 = std::chrono::steady_clock::now();
    log_file << "SparseLU time: t2 - t1 = " << std::chrono::duration_cast<std::chrono::seconds>(t2 - t1).count() << " s \n";

// calculate A x = b and measure time - for UmfPackLU - here I had to convert to double.
    Eigen::SparseMatrix<double> Ad = A.cast <double>();
    Ad.makeCompressed();
    Eigen::VectorXd bd = b.cast <double>();
    t1 = std::chrono::steady_clock::now();
    Eigen::UmfPackLU<Eigen::SparseMatrix<double>> solver(Ad);
    Eigen::VectorXd xd = solver.solve(bd);
    t2 = std::chrono::steady_clock::now();
    log_file << "UmfPackLU time: t2 - t1 = " << std::chrono::duration_cast<std::chrono::seconds>(t2 - t1).count() << " s \n";

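Maybe I should mention that the calculation runs on all 8 cores, so when I look at the time I get 8 values, which I sum up. Also, the calculation is (so far) wrapped in a .cu .dll library; it will be parallelized via CUDA in the next step. I measured the time for each method separately to avoid any overlap in the counting.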
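
The timing code above casts to whole seconds and times the solver constructor together with `solve`. In Eigen, constructing a solver from a matrix already performs the factorization, so each measurement covers both phases. A small helper like the following sketch could time each phase separately and at finer resolution. The name `elapsedMilliseconds` and the `repetitions` parameter are assumptions made for this illustration, not part of Eigen.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Run `work` `repetitions` times and return the shortest wall-clock time
// of a single run, in milliseconds (fractional values are kept).
template <typename Work>
double elapsedMilliseconds(Work&& work, int repetitions = 1) {
    assert(repetitions > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repetitions; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        best = std::min(best, ms);
    }
    return best;
}

// Usage sketch with the names from the code above (solver default-constructed):
//   double tFactor = elapsedMilliseconds([&] { solver_chol.compute(A); });
//   double tSolve  = elapsedMilliseconds([&] { x = solver_chol.solve(b); });
```

Separating the two phases shows whether the time goes into the factorization or into the triangular solves, which may help when comparing against the single figure reported for MATLAB.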

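I found the following possible solutions to speed up the computation:

Use normal lu: does not work for sparse systems;
Linking to BLAS/LAPACK library: I think I have done this;
try different solvers, or wrappers: other solvers give results that differ from MATLAB's, and the answers there are case-specific;
multithreading, use compiler with enabled optimizations: done (compiler set to maximum optimization, favoring speed), but it is still very slow;
use UmfPack, same as MATLAB does, to get similar performance: it is even slower than SimplicialCholesky;
list of other possible libraries working with matrices: but I don't know how they would handle my case.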

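What can I do to speed up the computation with Eigen so that it takes about as long as in MATLAB? Am I using the right solver for the size and sparsity of the matrix? Am I using the current solvers correctly? Do I need some additional setup, such as including other libraries? If that is not possible, are there other libraries I could use?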

I am working on a Windows 10 64-bit machine. I have Visual Studio 2019.

Answer 1

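Recently I tried a number of linear solvers for a spectral collocation solver, and I found that "armadillo", which is based on the OpenBLAS library, is a fast way to solve a dense Ax = b. Eigen 3.3 was very slow even with "setNbThreads", and I still have not found the reason. If you want to solve it with CUDA or OpenMP, I strongly recommend the Paralution library. It works well for my problem. Regards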

http://www.paralution.com/

Solving a sparse system: Eigen vs. MATLAB
