Is it possible to use a precomputed factorization to speed up backslash (\, mldivide) with sparse matrices?

Posted: 2015-01-03 20:03:40

Tags: matlab math linear-algebra umfpack

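I perform many iterations that solve a linear system Mx=b with a large, sparse M. M does not change between iterations, but b does. I have tried several approaches, and so far backslash (\, mldivide) has been the most efficient and accurate.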

The following code is very similar to what I am doing:

for ii=1:num_iter
  x = M\x;
  x = x+dx;
end

Now I would like to speed up the computation further by exploiting the fact that M is fixed.

Setting the flag spparms('spumoni',2) prints detailed information about the solver algorithm.

I ran the following code:

spparms('spumoni',2);
x = M\B;

Output (monitor):

sp\: bandwidth = 2452+1+2452.
sp\: is A diagonal? no.
sp\: is band density (0.01) > bandden (0.50) to try banded solver? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKage with Control parameters:
UMFPACK V5.4.0 (May 20, 2009), Control:
    Matrix entry defined as: double
    Int (generic integer) defined as: UF_long

    0: print level: 2
    1: dense row parameter:    0.2
        "dense" rows have    > max (16, (0.2)*16*sqrt(n_col) entries)
    2: dense column parameter: 0.2
        "dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
    3: pivot tolerance: 0.1
    4: block size for dense matrix kernels: 32
    5: strategy: 0 (auto)
    6: initial allocation ratio: 0.7
    7: max iterative refinement steps: 2
    12: 2-by-2 pivot tolerance: 0.01
    13: Q fixed during numerical factorization: 0 (auto)
    14: AMD dense row/col parameter:    10
       "dense" rows/columns have > max (16, (10)*sqrt(n)) entries
        Only used if the AMD ordering is used.
    15: diagonal pivot tolerance: 0.001
        Only used if diagonal pivoting is attempted.
    16: scaling: 1 (divide each row by sum of abs. values in each row)
    17: frontal matrix allocation ratio: 0.5
    18: drop tolerance: 0
    19: AMD and COLAMD aggressive absorption: 1 (yes)

    The following options can only be changed at compile-time:
    8: BLAS library used:  Fortran BLAS.  size of BLAS integer: 8
    9: compiled for MATLAB
    10: CPU timer is ANSI C clock (may wrap around).
    11: compiled for normal operation (debugging disabled)
    computer/operating system: Microsoft Windows
    size of int: 4 UF_long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)

sp\: is UMFPACK's symbolic LU factorization (with automatic reordering) successful? yes.
sp\: is UMFPACK's numeric LU factorization successful? yes.
sp\: is UMFPACK's triangular solve successful? yes.
sp\: UMFPACK Statistics:
UMFPACK V5.4.0 (May 20, 2009), Info:
    matrix entry defined as:          double
    Int (generic integer) defined as: UF_long
    BLAS library used: Fortran BLAS.  size of BLAS integer: 8
    MATLAB:                           yes.
    CPU timer:                        ANSI clock ( ) routine.
    number of rows in matrix A:       3468
    number of columns in matrix A:    3468
    entries in matrix A:              60252
    memory usage reported in:         16-byte Units
    size of int:                      4 bytes
    size of UF_long:                  8 bytes
    size of pointer:                  8 bytes
    size of numerical entry:          8 bytes

    strategy used:                    symmetric
    ordering used:                    amd on A+A'
    modify Q during factorization:    no
    prefer diagonal pivoting:         yes
    pivots with zero Markowitz cost:               1284
    submatrix S after removing zero-cost pivots:
        number of "dense" rows:                    0
        number of "dense" columns:                 0
        number of empty rows:                      0
        number of empty columns                    0
        submatrix S square and diagonal preserved
    pattern of square submatrix S:
        number rows and columns                    2184
        symmetry of nonzero pattern:               0.904903
        nz in S+S' (excl. diagonal):               62184
        nz on diagonal of matrix S:                2184
        fraction of nz on diagonal:                1.000000
    AMD statistics, for strict diagonal pivoting:
        est. flops for LU factorization:           2.76434e+007
        est. nz in L+U (incl. diagonal):           306216
        est. largest front (# entries):            31329
        est. max nz in any column of L:            177
        number of "dense" rows/columns in S+S':    0
    symbolic factorization defragmentations:       0
    symbolic memory usage (Units):                 174698
    symbolic memory usage (MBytes):                2.7
    Symbolic size (Units):                         9196
    Symbolic size (MBytes):                        0
    symbolic factorization CPU time (sec):         0.00
    symbolic factorization wallclock time(sec):    0.00

    matrix scaled: yes (divided each row by sum of abs values in each row)
    minimum sum (abs (rows of A)):              1.00000e+000
    maximum sum (abs (rows of A)):              9.75375e+003

    symbolic/numeric factorization:      upper bound               actual      %
    variable-sized part of Numeric object:
        initial size (Units)                  149803               146332    98%
        peak size (Units)                    1037500               202715    20%
        final size (Units)                    787803               154127    20%
    Numeric final size (Units)                806913               171503    21%
    Numeric final size (MBytes)                 12.3                  2.6    21%
    peak memory usage (Units)                1083860               249075    23%
    peak memory usage (MBytes)                  16.5                  3.8    23%
    numeric factorization flops         5.22115e+008         2.59546e+007     5%
    nz in L (incl diagonal)                   593172               145107    24%
    nz in U (incl diagonal)                   835128               154044    18%
    nz in L+U (incl diagonal)                1424832               295683    21%
    largest front (# entries)                 348768                30798     9%
    largest # rows in front                      519                  175    34%
    largest # columns in front                   672                  177    26%

    initial allocation ratio used:                 0.309
    # of forced updates due to frontal growth:     1
    number of off-diagonal pivots:                 0
    nz in L (incl diagonal), if none dropped       145107
    nz in U (incl diagonal), if none dropped       154044
    number of small entries dropped                0
    nonzeros on diagonal of U:                     3468
    min abs. value on diagonal of U:               4.80e-002
    max abs. value on diagonal of U:               1.00e+000
    estimate of reciprocal of condition number:    4.80e-002
    indices in compressed pattern:                 13651
    numerical values stored in Numeric object:     295806
    numeric factorization defragmentations:        0
    numeric factorization reallocations:           0
    costly numeric factorization reallocations:    0
    numeric factorization CPU time (sec):          0.05
    numeric factorization wallclock time (sec):    0.00
    numeric factorization mflops (CPU time):       552.22

    solve flops:                                   1.78396e+006
    iterative refinement steps taken:              1
    iterative refinement steps attempted:          1
    sparse backward error omega1:                  1.80e-016
    sparse backward error omega2:                  0.00e+000
    solve CPU time (sec):                          0.00
    solve wall clock time (sec):                   0.00

    total symbolic + numeric + solve flops:        2.77385e+007

Note the following lines:

numeric factorization flops         5.22115e+008         2.59546e+007     5%
solve flops:                                   1.78396e+006
total symbolic + numeric + solve flops:        2.77385e+007

They show that factorizing M accounts for 2.59546e+007 / 2.77385e+007 = 93.6% of the total flops needed to solve the system.
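As a quick check of that arithmetic, using the flop counts from the monitor output (flop counts are only a rough proxy for run time):

```latex
\frac{2.59546\times10^{7}}{2.77385\times10^{7}} \approx 0.936,
\qquad
\frac{1.78396\times10^{6}}{2.77385\times10^{7}} \approx 0.064,
\qquad
2.59546\times10^{7} + 1.78396\times10^{6} = 2.773856\times10^{7}
```

The numeric factorization and the solve together make up essentially the whole reported total, so the symbolic phase contributes a negligible number of flops here.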

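I would like to precompute the factorization outside the loop and then run only the final solve stage, which accounts for the remaining roughly 6.4% of the work.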

I know how to compute the factorization ([L,U,P,Q,R] = lu(M);), but I don't know how to use its outputs as input to a solver.

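I would like to run something along these lines:

[L,U,P,Q,R] = lu(M);
for ii=1:num_iter
  % solve_pre_factored is the hypothetical solver I am looking for: it reuses L,U,P,Q,R
  x = solve_pre_factored(L,U,P,Q,R,x);
  x = x+dx;
end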

Is there a way to do this in MATLAB?

1 answer:

Answer 0 (score: 3)

You have to ask yourself what all those matrices returned by the LU factorization are.

As the documentation states:

  

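[L,U,P,Q,R] = lu(A) returns unit lower triangular matrix L, upper triangular matrix U, permutation matrices P and Q, and a diagonal scaling matrix R so that P*(R\A)*Q = L*U for sparse non-empty A. Typically, but not always, the row scaling leads to a sparser and more stable factorization. The statement lu(A,'matrix') returns identical output values.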
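A minimal sketch that checks this identity numerically. The test matrix is hypothetical (a small sparse, unsymmetric, diagonally dominant banded matrix standing in for M), and the tolerance used to judge the result is an assumption suited to a well-conditioned matrix:

```matlab
% Hypothetical stand-in for M: sparse, unsymmetric, diagonally dominant
n = 500;
e = ones(n, 1);
A = spdiags([-e, -2*e, 6*e, -e, 0.5*e], [-50, -1, 0, 1, 7], n, n);

% Sparse LU with row scaling R, row permutation P and column permutation Q
[L, U, P, Q, R] = lu(A);

% Documented relation: P*(R\A)*Q = L*U. Measure how closely it holds.
scaled  = R \ A;
rel_err = norm(P * scaled * Q - L * U, 1) / norm(scaled, 1);
fprintf('relative error of the LU identity: %g\n', rel_err);

% Structural properties that the solve steps rely on
fprintf('L lower triangular: %d, U upper triangular: %d, R diagonal: %d\n', ...
        istril(L), istriu(U), isdiag(R));
```

The relative error should be on the order of machine precision, and all three structural checks should print 1.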

So, in more mathematical terms, we have $P R^{-1} A Q = L U$, and therefore $A = R P^{-1} L U Q^{-1}$.
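Carrying the algebra one step further gives the inverse that the solve actually applies. Because permutation matrices satisfy $P^{-1} = P^{T}$ and $R$ is diagonal, none of these inverses has to be formed explicitly:

```latex
P R^{-1} A Q = L U
\;\Longrightarrow\;
A = R P^{-1} L U Q^{-1}
\;\Longrightarrow\;
A^{-1} = Q\,U^{-1} L^{-1} P\,R^{-1},
\qquad
x = A^{-1} b = Q\,U^{-1}\bigl(L^{-1}\bigl(P\,(R^{-1} b)\bigr)\bigr).
```

Read from right to left: undo the row scaling, permute the rows, do forward substitution with $L$, do back substitution with $U$, and finally permute with $Q$.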

Then x = M\x can be rewritten as the following sequence of steps:

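  1. $y = R^{-1} x$
  2. $z = P y$
  3. $u = L^{-1} z$
  4. $v = U^{-1} u$
  5. $w = Q v$
  6. $x = w$

To apply the inverses of U, L and R, you can use \, which recognizes that L and U are triangular and that R is diagonal (the spumoni monitor output should confirm this), and uses the appropriate trivial solver for each.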
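A sketch of steps 1 to 6 written out as one operation per line. The matrix M and the right-hand side b are hypothetical test data; in the question's loop, the right-hand side would be the current x:

```matlab
% Hypothetical test matrix: sparse, unsymmetric, diagonally dominant
n = 500;
e = ones(n, 1);
M = spdiags([-e, -2*e, 6*e, -e, 0.5*e], [-50, -1, 0, 1, 7], n, n);
b = (1:n)' / n;                 % arbitrary right-hand side

% Factor once: P*(R\M)*Q = L*U
[L, U, P, Q, R] = lu(M);

% Steps 1-6, one operation each
y = R \ b;      % 1. undo the diagonal row scaling
z = P * y;      % 2. apply the row permutation
u = L \ z;      % 3. forward substitution (L is unit lower triangular)
v = U \ u;      % 4. back substitution (U is upper triangular)
w = Q * v;      % 5. apply the column permutation
x = w;          % 6. x now solves M*x = b

fprintf('relative residual: %g\n', norm(M * x - b) / norm(b));
```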

So, written more compactly in MATLAB: x = Q*(U\(L\(P*(R\x))));

This is exactly what happens inside the \ solver, except that the factorization is computed only once, as you asked.
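To show the "factor once, solve many times" pattern in the shape of the question's loop, the sketch below runs the original loop (where every call to \ refactorizes M) next to a version that factors M once before the loop. Here solve_pre_factored is defined as an anonymous function, and M, dx, x0 and num_iter are hypothetical test values:

```matlab
% Hypothetical test data
n = 500;
e = ones(n, 1);
M = spdiags([-e, -2*e, 6*e, -e, 0.5*e], [-50, -1, 0, 1, 7], n, n);
dx = 0.01 * ones(n, 1);          % fixed increment, as in the question's loop
x0 = ones(n, 1);
num_iter = 50;

% Reference: the original loop, which refactorizes M inside every \ call
x_ref = x0;
for ii = 1:num_iter
  x_ref = M \ x_ref;
  x_ref = x_ref + dx;
end

% Factor once outside the loop; only permutations and triangular solves inside
[L, U, P, Q, R] = lu(M);
solve_pre_factored = @(rhs) Q * (U \ (L \ (P * (R \ rhs))));
x = x0;
for ii = 1:num_iter
  x = solve_pre_factored(x);
  x = x + dx;
end

fprintf('relative difference vs. backslash loop: %g\n', norm(x - x_ref) / norm(x_ref));
```

Both loops should agree to roughly machine precision. Any speed-up depends on the matrix and the machine, so it is worth measuring with tic/toc or timeit on the real M.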

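However, as mentioned in the comments, for a large number of solves it can become faster to compute $N = M^{-1}$ once and then perform only a simple matrix-vector multiplication, which is much simpler than the procedure explained above. The initial computation, inv(M), takes longer and has some limitations (for a sparse M the inverse is generally dense, so it can need far more memory than L and U), so this trade-off depends on the properties of your matrix.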
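A sketch of that trade-off on a hypothetical test matrix: the explicit inverse reproduces the LU-based solution with a single matrix-vector product, but it is much denser than L and U combined. Note that MATLAB's documentation for inv recommends x = A\b over x = inv(A)*b for solving linear systems, on grounds of both speed and accuracy:

```matlab
% Hypothetical test data
n = 500;
e = ones(n, 1);
M = spdiags([-e, -2*e, 6*e, -e, 0.5*e], [-50, -1, 0, 1, 7], n, n);
b = (1:n)' / n;

[L, U, P, Q, R] = lu(M);
N = inv(M);                      % explicit inverse: sparse storage, but nearly full

x_lu  = Q * (U \ (L \ (P * (R \ b))));   % factored solve
x_inv = N * b;                           % one matrix-vector product per solve

fprintf('nnz(M) = %d, nnz(L)+nnz(U) = %d, nnz(inv(M)) = %d of %d\n', ...
        nnz(M), nnz(L) + nnz(U), nnz(N), n^2);
fprintf('relative difference between the two solutions: %g\n', ...
        norm(x_lu - x_inv) / norm(x_lu));
```

If the inverse turns out to be nearly full, storing it with full(N) uses less memory than the sparse format.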