I run many iterations of solving a linear system of equations Mx = b with a large, sparse M.
M does not change between iterations, but b does. I have tried several approaches and so far found the backslash \ (mldivide) to be the most efficient and accurate.
The following code is very similar to what I am doing:
for ii=1:num_iter
dx = M\x;
x = x+dx;
end
Now I would like to speed up the computation further by exploiting the fact that M is fixed.
Setting the flag spparms('spumoni',2) gives verbose information about the solver's algorithm.
I ran the following code:
spparms('spumoni',2);
x = M\B;
Output (monitoring):
sp\: bandwidth = 2452+1+2452.
sp\: is A diagonal? no.
sp\: is band density (0.01) > bandden (0.50) to try banded solver? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKage with Control parameters:
UMFPACK V5.4.0 (May 20, 2009), Control:
Matrix entry defined as: double
Int (generic integer) defined as: UF_long
0: print level: 2
1: dense row parameter: 0.2
"dense" rows have > max (16, (0.2)*16*sqrt(n_col) entries)
2: dense column parameter: 0.2
"dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
3: pivot tolerance: 0.1
4: block size for dense matrix kernels: 32
5: strategy: 0 (auto)
6: initial allocation ratio: 0.7
7: max iterative refinement steps: 2
12: 2-by-2 pivot tolerance: 0.01
13: Q fixed during numerical factorization: 0 (auto)
14: AMD dense row/col parameter: 10
"dense" rows/columns have > max (16, (10)*sqrt(n)) entries
Only used if the AMD ordering is used.
15: diagonal pivot tolerance: 0.001
Only used if diagonal pivoting is attempted.
16: scaling: 1 (divide each row by sum of abs. values in each row)
17: frontal matrix allocation ratio: 0.5
18: drop tolerance: 0
19: AMD and COLAMD aggressive absorption: 1 (yes)
The following options can only be changed at compile-time:
8: BLAS library used: Fortran BLAS. size of BLAS integer: 8
9: compiled for MATLAB
10: CPU timer is ANSI C clock (may wrap around).
11: compiled for normal operation (debugging disabled)
computer/operating system: Microsoft Windows
size of int: 4 UF_long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)
sp\: is UMFPACK's symbolic LU factorization (with automatic reordering) successful? yes.
sp\: is UMFPACK's numeric LU factorization successful? yes.
sp\: is UMFPACK's triangular solve successful? yes.
sp\: UMFPACK Statistics:
UMFPACK V5.4.0 (May 20, 2009), Info:
matrix entry defined as: double
Int (generic integer) defined as: UF_long
BLAS library used: Fortran BLAS. size of BLAS integer: 8
MATLAB: yes.
CPU timer: ANSI clock ( ) routine.
number of rows in matrix A: 3468
number of columns in matrix A: 3468
entries in matrix A: 60252
memory usage reported in: 16-byte Units
size of int: 4 bytes
size of UF_long: 8 bytes
size of pointer: 8 bytes
size of numerical entry: 8 bytes
strategy used: symmetric
ordering used: amd on A+A'
modify Q during factorization: no
prefer diagonal pivoting: yes
pivots with zero Markowitz cost: 1284
submatrix S after removing zero-cost pivots:
number of "dense" rows: 0
number of "dense" columns: 0
number of empty rows: 0
number of empty columns 0
submatrix S square and diagonal preserved
pattern of square submatrix S:
number rows and columns 2184
symmetry of nonzero pattern: 0.904903
nz in S+S' (excl. diagonal): 62184
nz on diagonal of matrix S: 2184
fraction of nz on diagonal: 1.000000
AMD statistics, for strict diagonal pivoting:
est. flops for LU factorization: 2.76434e+007
est. nz in L+U (incl. diagonal): 306216
est. largest front (# entries): 31329
est. max nz in any column of L: 177
number of "dense" rows/columns in S+S': 0
symbolic factorization defragmentations: 0
symbolic memory usage (Units): 174698
symbolic memory usage (MBytes): 2.7
Symbolic size (Units): 9196
Symbolic size (MBytes): 0
symbolic factorization CPU time (sec): 0.00
symbolic factorization wallclock time(sec): 0.00
matrix scaled: yes (divided each row by sum of abs values in each row)
minimum sum (abs (rows of A)): 1.00000e+000
maximum sum (abs (rows of A)): 9.75375e+003
symbolic/numeric factorization: upper bound actual %
variable-sized part of Numeric object:
initial size (Units) 149803 146332 98%
peak size (Units) 1037500 202715 20%
final size (Units) 787803 154127 20%
Numeric final size (Units) 806913 171503 21%
Numeric final size (MBytes) 12.3 2.6 21%
peak memory usage (Units) 1083860 249075 23%
peak memory usage (MBytes) 16.5 3.8 23%
numeric factorization flops 5.22115e+008 2.59546e+007 5%
nz in L (incl diagonal) 593172 145107 24%
nz in U (incl diagonal) 835128 154044 18%
nz in L+U (incl diagonal) 1424832 295683 21%
largest front (# entries) 348768 30798 9%
largest # rows in front 519 175 34%
largest # columns in front 672 177 26%
initial allocation ratio used: 0.309
# of forced updates due to frontal growth: 1
number of off-diagonal pivots: 0
nz in L (incl diagonal), if none dropped 145107
nz in U (incl diagonal), if none dropped 154044
number of small entries dropped 0
nonzeros on diagonal of U: 3468
min abs. value on diagonal of U: 4.80e-002
max abs. value on diagonal of U: 1.00e+000
estimate of reciprocal of condition number: 4.80e-002
indices in compressed pattern: 13651
numerical values stored in Numeric object: 295806
numeric factorization defragmentations: 0
numeric factorization reallocations: 0
costly numeric factorization reallocations: 0
numeric factorization CPU time (sec): 0.05
numeric factorization wallclock time (sec): 0.00
numeric factorization mflops (CPU time): 552.22
solve flops: 1.78396e+006
iterative refinement steps taken: 1
iterative refinement steps attempted: 1
sparse backward error omega1: 1.80e-016
sparse backward error omega2: 0.00e+000
solve CPU time (sec): 0.00
solve wall clock time (sec): 0.00
total symbolic + numeric + solve flops: 2.77385e+007
Note the following lines:
numeric factorization flops 5.22115e+008 2.59546e+007 5%
solve flops: 1.78396e+006
total symbolic + numeric + solve flops: 2.77385e+007
They show that the factorization of M accounts for 2.59546e+007 / 2.77385e+007 = 93.6% of the total flops needed to solve the equation.
I would like to pre-compute the factorization outside of the iterations and then run only the last stage, which takes roughly 6.5% of the CPU time.
I know how to compute the factorization ([L,U,P,Q,R] = lu(M);), but I do not know how to use its outputs as inputs to a solver.
I would like to run something in the spirit of:
[L,U,P,Q,R] = lu(M);
for ii=1:num_iter
dx = solve_pre_factored(M,P,Q,R,x);
x = x+dx;
end
Is there a way to do this in MATLAB?
Answer (score: 3)
You have to ask yourself what all of these matrices from the LU factorization are.
As the documentation states:
[L,U,P,Q,R] = lu(A) returns unit lower triangular matrix L, upper triangular matrix U, permutation matrices P and Q, and a diagonal scaling matrix R so that P*(R\A)*Q = L*U for sparse non-empty A. Typically, but not always, the row scaling leads to a sparser and more stable factorization. The statement lu(A,'matrix') returns identical output values.
So, in more mathematical terms, we have P R^(-1) A Q = L U, and therefore A = R P^(-1) L U Q^(-1).
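A quick way to check this identity numerically is on a small random sparse test matrix; the sprand example below is purely illustrative and is not the questioner's M:
A = sprand(100,100,0.05) + 10*speye(100);  % small, reasonably well-conditioned sparse test matrix
[L,U,P,Q,R] = lu(A);                       % sparse LU with row scaling and permutations
norm(P*(R\A)*Q - L*U, 'fro')               % should be on the order of machine precision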
x = M\x can then be rewritten with the following steps:
y = R^(-1) x
z = P y
u = L^(-1) z
v = U^(-1) u
w = Q v
x = w
To invert U, L and R you can use \, which will recognize that they are triangular matrices (and that R is diagonal), as the monitoring output should confirm, and will use the appropriate trivial solvers for them.
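Written out line by line in MATLAB, that is (a sketch; the intermediate variables mirror the list above):
y = R\x;      % y = R^(-1) x; R is diagonal, so this is trivial
z = P*y;      % apply the row permutation
u = L\z;      % forward substitution with the unit lower triangular L
v = U\u;      % back substitution with the upper triangular U
w = Q*v;      % apply the column permutation
x = w;        % w now equals M\x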
So, written in a denser and more MATLAB-like way: x = Q*(U\(L\(P*(R\x))));
Doing this is exactly what happens inside the \ solver, except with only one factorization, as you asked.
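Plugged back into the question's loop, a minimal sketch could look like this (assuming the loop body is indeed dx = M\x followed by x = x + dx, as in the code above):
[L,U,P,Q,R] = lu(M);              % factor M once, outside the loop
for ii = 1:num_iter
    dx = Q*(U\(L\(P*(R\x))));     % reuse the factors instead of calling M\x
    x = x + dx;
end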
However, as said in the comments, for a large number of solves, computing N = M^(-1) once becomes faster, since you then only do a simple matrix-vector multiplication, which is much simpler than the process explained above. The initial computation inv(M) takes longer and has some limitations, so this trade-off depends on the properties of your matrix.
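For completeness, a minimal sketch of that alternative (same hypothetical loop; inv(M) of a sparse matrix is typically dense and only pays off when num_iter is large):
N = inv(M);          % expensive one-time computation, result is typically dense
for ii = 1:num_iter
    dx = N*x;        % one matrix-vector product per iteration
    x = x + dx;
end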