Optimizing the solution of A*x = B for a tridiagonal coefficient matrix

Asked: 2014-04-16 21:07:38

Tags: python performance numpy matrix scipy

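I have a system of equations of the form A*x = B, where [A] is a tridiagonal coefficient matrix. Using the NumPy solver numpy.linalg.solve, I can solve the system of equations for x.

See the example below for how I build the tridiagonal [A] matrix and the {B} vector, and solve for x: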

# Solve system of equations with a tridiagonal coefficient matrix
# uses numpy.linalg.solve

# use Python 3 print function
from __future__ import print_function
from __future__ import division

# modules
import numpy as np
import time

ti = time.clock()

#---- Build [A] array and {B} column vector

m = 1000   # size of array, make this 8000 to see time benefits

A = np.zeros((m, m))     # pre-allocate [A] array
B = np.zeros((m, 1))     # pre-allocate {B} column vector

A[0, 0] = 1
A[0, 1] = 2
B[0, 0] = 1

for i in range(1, m-1):
    A[i, i-1] = 7   # node-1
    A[i, i] = 8     # node
    A[i, i+1] = 9   # node+1
    B[i, 0] = 2

A[m-1, m-2] = 3
A[m-1, m-1] = 4
B[m-1, 0] = 3

print('A \n', A)
print('B \n', B)

#---- Solve using numpy.linalg.solve

x = np.linalg.solve(A, B)     # solve A*x = B for x

print('x \n', x)

#---- Elapsed time for each approach

print('NUMPY time', time.clock()-ti, 'seconds')

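So my question relates to two parts of the example above:

  1. Since I am dealing with a tridiagonal matrix for [A], also called a banded matrix, is there a more efficient way to solve the system of equations than numpy.linalg.solve?
  2. Also, is there a better way to create the tridiagonal matrix than a for-loop?

According to the time.clock() function, the example above runs in about 0.08 seconds on Linux.

The numpy.linalg.solve function works fine, but I am trying to find an approach that takes advantage of the tridiagonal form of [A], in the hope of speeding up the solution even further, and then apply that approach to a more complicated example.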

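4 Answers:

Answer 0 (score: 2)

There are two immediate performance improvements: (1) do not use a loop, and (2) use scipy.linalg.solve_banded().

I would write the code more like this: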

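import numpy as np
import scipy.linalg as la

m = 10000   # size of the system (the size used in the timing below)

# Create arrays and set values
# ab holds the three diagonals in the layout solve_banded expects:
# row 0 = superdiagonal, row 1 = main diagonal, row 2 = subdiagonal
ab = np.zeros((3, m))
b = 2 * np.ones(m)
ab[0] = 9
ab[1] = 8
ab[2] = 7

# Fix end points
ab[0, 1] = 2
ab[1, 0] = 1
ab[1, -1] = 4
ab[2, -2] = 3
b[0] = 1
b[-1] = 3

# (1, 1) = one subdiagonal and one superdiagonal
x = la.solve_banded((1, 1), ab, b)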

There may be a more elegant way to construct the matrix, but this works.
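One possibly tidier way to build the banded storage is to start from the three diagonals and pad them into the layout that scipy.linalg.solve_banded expects. The sketch below assumes a helper named banded_from_diagonals, which is hypothetical and not part of SciPy, and checks the banded result against numpy.linalg.solve on a small system.

```python
import numpy as np
from scipy.linalg import solve_banded


def banded_from_diagonals(lower, main, upper):
    """Pack three diagonals into the (3, m) layout used by solve_banded.

    Hypothetical helper for illustration; it is not part of SciPy.
    lower and upper have length m-1, main has length m.
    """
    m = len(main)
    ab = np.zeros((3, m))
    ab[0, 1:] = upper    # superdiagonal, first slot unused
    ab[1, :] = main      # main diagonal
    ab[2, :-1] = lower   # subdiagonal, last slot unused
    return ab


m = 8  # small size so the dense check is cheap

# Diagonals and right-hand side of the system from the question
lower = np.r_[np.full(m - 2, 7.0), 3.0]
main = np.r_[1.0, np.full(m - 2, 8.0), 4.0]
upper = np.r_[2.0, np.full(m - 2, 9.0)]
rhs = np.r_[1.0, np.full(m - 2, 2.0), 3.0]

ab = banded_from_diagonals(lower, main, upper)
x_banded = solve_banded((1, 1), ab, rhs)

# Dense reference built from the same diagonals, without a for-loop
A_dense = np.diag(main) + np.diag(upper, 1) + np.diag(lower, -1)
x_dense = np.linalg.solve(A_dense, rhs)

print(np.allclose(x_banded, x_dense))
```

The np.diag sum also shows one loop-free way to build the dense [A], although for large m the dense form is exactly what the banded storage is meant to avoid.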

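Using %timeit in IPython, the original code takes 112 ms for m = 1000. This code takes 2.94 ms for m = 10,000: a problem an order of magnitude larger, yet still two orders of magnitude faster! I did not have the patience to wait for the original code at m = 10,000. Most of the time in the original is probably spent building the array, but I did not test this. In any case, for large arrays it is much more efficient to store only the non-zero values of the matrix.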
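To put a rough number on the storage argument, the following back-of-the-envelope sketch compares the memory needed for a dense m x m float64 array with the 3 x m banded layout at m = 10,000. It only does arithmetic and allocates nothing; the variable names are illustrative.

```python
import numpy as np

m = 10_000
itemsize = np.dtype(np.float64).itemsize  # bytes per float64 entry

dense_bytes = m * m * itemsize    # full m x m array
banded_bytes = 3 * m * itemsize   # three diagonals only

print("dense :", dense_bytes / 1e6, "MB")
print("banded:", banded_bytes / 1e6, "MB")
```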

Answer 1 (score: 1)

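There is a scipy.sparse matrix type called scipy.sparse.dia_matrix that captures the structure of your matrix well (it stores 3 arrays, at "positions" 0 (the diagonal), 1 (above) and -1 (below)). With this type of matrix you can try scipy.sparse.linalg.lsqr for solving. If your problem has an exact solution, it will be found; otherwise lsqr finds the solution in the least-squares sense.

from scipy import sparse
import scipy.sparse.linalg   # makes sparse.linalg available

# A and B are the arrays built in the question
A_sparse = sparse.dia_matrix(A)
ret_values = sparse.linalg.lsqr(A_sparse, B)
x = ret_values[0]

However, this may not be fully optimal in terms of exploiting the tridiagonal structure, and there may be a theoretical way to make it faster. What the conversion does do for you is cut the cost of the matrix multiplications down to the essentials: only the 3 bands are used. Combined with the iterative solver lsqr, this should already yield a speedup.

Note: I am not proposing scipy.sparse.linalg.spsolve, because it converts your matrix to csr format. However, replacing lsqr with spsolve is worth a try, especially because spsolve can bind UMFPACK; see the relevant doc on spsolve. It may also be of interest to take a look at this stackoverflow question and answer relating to UMFPACK.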
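For comparison, here is a minimal sketch of the direct sparse route mentioned in the note. It builds the question's matrix with scipy.sparse.diags in CSC format and solves it with scipy.sparse.linalg.spsolve. The small size m = 8 and the variable names are assumptions chosen so the result can be checked against a dense solve; whether this beats lsqr or solve_banded on a given problem would have to be measured.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

m = 8  # small size so the dense check is cheap

# Diagonals and right-hand side of the system from the question
lower = np.r_[np.full(m - 2, 7.0), 3.0]
main = np.r_[1.0, np.full(m - 2, 8.0), 4.0]
upper = np.r_[2.0, np.full(m - 2, 9.0)]
rhs = np.r_[1.0, np.full(m - 2, 2.0), 3.0]

# Build the sparse matrix directly from its diagonals; spsolve works on
# compressed formats, so CSC is requested up front
A_sparse = sparse.diags([lower, main, upper], offsets=[-1, 0, 1], format="csc")

x_sparse = spsolve(A_sparse, rhs)

# Dense reference solution for the same system
x_dense = np.linalg.solve(A_sparse.toarray(), rhs)
print(np.allclose(x_sparse, x_dense))
```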

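Answer 2 (score: 1)

You can use scipy.linalg.solveh_banded.

Edit: You cannot use the above, because your matrix is not symmetric, as I had thought it was. However, as mentioned in the comments above, the Thomas algorithm is well suited to this: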

m = 10000   # system size used for the timing quoted below

a =       [7] * ( m - 2 ) + [3]   # subdiagonal, length m-1
b = [1] + [8] * ( m - 2 ) + [4]   # main diagonal, length m
c = [2] + [9] * ( m - 2 )         # superdiagonal, length m-1
d = [1] + [2] * ( m - 2 ) + [3]   # right-hand side, length m

# This is taken directly from the Wikipedia page also cited above
# this overwrites b and d
def TDMASolve(a, b, c, d):
    n = len(d) # n is the number of rows, a and c have length n-1
    # forward elimination
    for i in range(n-1):
        d[i+1] -= 1. * d[i] * a[i] / b[i]
        b[i+1] -= 1. * c[i] * a[i] / b[i]
    # backward elimination
    for i in reversed(range(n-1)):
        d[i] -= d[i+1] * c[i] / b[i+1]
    return [d[i] / b[i] for i in range(n)]

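This code is not optimized and does not use np, but if I (or any of the other good people here) find the time, I will edit it so that it does. For m = 10000 it currently runs in about 10 ms.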
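As a sketch of what a NumPy-based variant might look like, the version below works on float copies of the main diagonal and right-hand side, so the caller's inputs are not overwritten. The function name thomas_solve is hypothetical. Like the code above it does no pivoting, so it assumes no zero (or near-zero) pivot appears during elimination; the question's matrix is not diagonally dominant, so that is not guaranteed in general.

```python
import numpy as np


def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.

    Hypothetical helper for illustration. a is the subdiagonal (length n-1),
    b the main diagonal (length n), c the superdiagonal (length n-1) and
    d the right-hand side (length n). Inputs are left unmodified.
    """
    n = len(d)
    b = np.array(b, dtype=float)  # np.array copies, so the caller's b is kept
    d = np.array(d, dtype=float)

    # Forward elimination: remove the subdiagonal
    for i in range(n - 1):
        w = a[i] / b[i]
        b[i + 1] -= w * c[i]
        d[i + 1] -= w * d[i]

    # Back substitution
    x = np.empty(n)
    x[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (d[i] - c[i] * x[i + 1]) / b[i]
    return x


m = 8  # small size so the dense check is cheap
a = [7] * (m - 2) + [3]
b = [1] + [8] * (m - 2) + [4]
c = [2] + [9] * (m - 2)
d = [1] + [2] * (m - 2) + [3]

x = thomas_solve(a, b, c, d)

# Dense reference built from the same diagonals
A_dense = np.diag(b).astype(float) + np.diag(c, 1) + np.diag(a, -1)
print(np.allclose(A_dense @ x, d))
```

The loops are still plain Python because each step depends on the previous one; for speed, scipy.linalg.solve_banded is likely the more practical choice.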

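Answer 3 (score: 0)

This may help. There is a function create_tridiagonal that creates a tridiagonal matrix, and another function that converts a matrix into the diagonal ordered form required by SciPy's solve_banded function.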

import numpy as np    

def lu_decomp3(a):
    r"""
    c,d,e = lu_decomp3(a).
    LU decomposition of tridiagonal matrix a = [c\d\e]. On output
    {c},{d} and {e} are the diagonals of the decomposed matrix a.
    """
    n = np.diagonal(a).size
    assert(np.all(a.shape ==(n,n))) # check if square matrix

    d = np.copy(np.diagonal(a)) # without copy (assignment destination is read-only) error is raised 
    e = np.copy(np.diagonal(a, 1))
    c = np.copy(np.diagonal(a, -1)) 

    for k in range(1,n):
        lam = c[k-1]/d[k-1]
        d[k] = d[k] - lam*e[k-1]
        c[k-1] = lam
    return c,d,e

def lu_solve3(c,d,e,b):
    r"""
    x = lu_solve3(c,d,e,b).
    Solves [c\d\e]{x} = {b}, where {c}, {d} and {e} are the
    vectors returned from lu_decomp3.
    """
    n = len(d)
    y = np.zeros_like(b)

    y[0] = b[0]
    for k in range(1,n): 
        y[k] = b[k] - c[k-1]*y[k-1]

    x = np.zeros_like(b)
    x[n-1] = y[n-1]/d[n-1] # there is no x[n] out of range
    for k in range(n-2,-1,-1):
        x[k] = (y[k] - e[k]*x[k+1])/d[k]
    return x

from scipy.sparse import diags
def create_tridiagonal(size = 4):
    diag = np.random.randn(size)*100
    diag_pos1 = np.random.randn(size-1)*10
    diag_neg1 = np.random.randn(size-1)*10

    # toarray() returns a plain ndarray rather than the legacy np.matrix
    a = diags([diag_neg1, diag, diag_pos1], offsets=[-1, 0, 1], shape=(size, size)).toarray()
    return a

a = create_tridiagonal(4)
b = np.random.randn(4)*10

print('matrix a is\n = {} \n\n and vector b is \n {}'.format(a, b))

c, d, e = lu_decomp3(a)
x = lu_solve3(c, d, e, b)

print("x from our function is {}".format(x))

print("check is answer correct ({})".format(np.allclose(np.dot(a, x), b)))


## Test Scipy
from scipy.linalg import solve_banded

def diagonal_form(a, upper = 1, lower= 1):
    """
    a is a numpy square matrix
    this function converts a square matrix to diagonal ordered form
    returned matrix in ab shape which can be used directly for scipy.linalg.solve_banded
    """
    n = a.shape[1]
    assert(np.all(a.shape ==(n,n)))

    ab = np.zeros((2*n-1, n))

    for i in range(n):
        ab[i,(n-1)-i:] = np.diagonal(a,(n-1)-i)

    for i in range(n-1): 
        ab[(2*n-2)-i,:i+1] = np.diagonal(a,i-(n-1))


    mid_row_inx = int(ab.shape[0]/2)
    upper_rows = [mid_row_inx - i for i in range(1, upper+1)]
    upper_rows.reverse()
    upper_rows.append(mid_row_inx)
    lower_rows = [mid_row_inx + i for i in range(1, lower+1)]
    keep_rows = upper_rows+lower_rows
    ab = ab[keep_rows,:]


    return ab

ab = diagonal_form(a, upper=1, lower=1) # for tridiagonal matrix upper and lower = 1

x_sp = solve_banded((1,1), ab, b)
print("is our answer the same as scipy answer ({})".format(np.allclose(x, x_sp)))
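The diagonal_form function above first fills a (2n-1) x n array with every diagonal and then keeps only the needed rows, which costs O(n^2) memory. A leaner variant, sketched here under the assumption of a hypothetical helper named to_banded, copies only the lower + upper + 1 diagonals that solve_banded actually needs. The test matrix is a randomly generated, diagonally dominant tridiagonal one, chosen for illustration only.

```python
import numpy as np
from scipy.linalg import solve_banded


def to_banded(a, lower=1, upper=1):
    """Convert a square array to the diagonal ordered form of solve_banded.

    Hypothetical helper for illustration. Only the requested diagonals are
    copied, following ab[upper + i - j, j] = a[i, j].
    """
    a = np.asarray(a)
    n = a.shape[0]
    assert a.shape == (n, n)  # must be square

    ab = np.zeros((lower + upper + 1, n))
    for k in range(-lower, upper + 1):
        diag = np.diagonal(a, k)
        if k >= 0:
            ab[upper - k, k:] = diag        # superdiagonals are left-padded
        else:
            ab[upper - k, :n + k] = diag    # subdiagonals are right-padded
    return ab


rng = np.random.default_rng(0)
n = 6

# Diagonally dominant tridiagonal test matrix
a = (np.diag(10.0 + rng.random(n))
     + np.diag(rng.random(n - 1), 1)
     + np.diag(rng.random(n - 1), -1))
b = rng.random(n)

ab = to_banded(a, lower=1, upper=1)
x_banded = solve_banded((1, 1), ab, b)
x_dense = np.linalg.solve(a, b)
print(np.allclose(x_banded, x_dense))
```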