Code optimization for the stochastic Schrödinger equation: efficient matrix multiplication

Time: 2019-05-14 18:03:46

Tags: optimization julia matrix-multiplication

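I have hit a bottleneck while integrating the stochastic Schrödinger equation (SSE) in Julia. The main cost drivers of the algorithm are, of course, the time step (1e-3 is enough for me, but I need tmax ~ 200), the number of realizations (~100), and the one I want to ask about here: the number of matrix-vector multiplications needed per time step. In the end, the program ran for about half an hour with Ntot = 50, but with Ntot = 300 it may take about ten hours.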

The main code that performs this calculation is:

using LinearAlgebra  # provides norm

# Expectation value <Psi|A|Psi>
function avAPsi(Psi::Array{ComplexF64}, A::Array{Float64})
    return Psi'*A*Psi
end

# Deterministic (drift) term, multiplied by delta_t in the update
function D1(gamma::Float64, Psi::Array{ComplexF64}, Htot::Array{Float64}, a_mat::Array{Float64}, term1::ComplexF64, number_operator::Array{Float64})
    dummy = term1*a_mat - number_operator
    return -1im*Htot*Psi + 0.5*gamma*dummy*Psi - 0.125*gamma*term1^2*Psi
end

# Stochastic (noise) term, multiplied by DW in the update
function D2(gamma::Float64, a_mat::Array{Float64}, term1::ComplexF64, Psi::Array{ComplexF64})
    return sqrt(gamma)*a_mat*Psi - 0.5*sqrt(gamma)*term1*Psi
end

# ------------------------------------------------------------------------
# Ntot, tsteps, gamma1, init_st, A, Htot, a_mat and number_operator
# are assumed to be defined earlier in the program.
delta_t = 0.001
wavef_t = Array{ComplexF64}(undef, Ntot)
dummy = copy(wavef_t)
psiApsi = 0.0
for ii in 1:tsteps
    wavef_t .= 0.0
    DW = sqrt(delta_t)*randn()  # Wiener increment with variance delta_t
    # Terms in Runge-Kutta
    psiApsi = avAPsi(init_st, A)
    Psi1 = D1(gamma1, init_st, Htot, a_mat, psiApsi, number_operator)
    dummy = init_st + 0.5*delta_t*Psi1
    psiApsi = avAPsi(dummy, A)
    Psi2 = D1(gamma1, dummy, Htot, a_mat, psiApsi, number_operator)
    dummy = init_st + 0.5*delta_t*Psi2
    psiApsi = avAPsi(dummy, A)
    Psi3 = D1(gamma1, dummy, Htot, a_mat, psiApsi, number_operator)
    dummy = init_st + delta_t*Psi3
    psiApsi = avAPsi(dummy, A)
    Psi4 = D1(gamma1, dummy, Htot, a_mat, psiApsi, number_operator)
    # Noise term; note that psiApsi here is the value from the last stage
    factor2 = D2(gamma1, a_mat, psiApsi, init_st)
    wavef_t = init_st + (1.0/6.0)*(Psi1 + 2.0*Psi2 + 2.0*Psi3 + Psi4)*delta_t + factor2*DW
    wavef_t = wavef_t/norm(wavef_t)  # renormalize the state
    init_st = copy(wavef_t)
end # tsteps
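
Read directly off the code, one time step can be written out as follows. This is only a transcription of what the loop computes, not a derivation of the SSE. The symbols are shorthand introduced here: H for Htot, a for a_mat, n for number_operator, γ for gamma1, ⟨A⟩_φ = φ†Aφ for avAPsi, and c for the scalar passed to D1 and D2 as term1.

```latex
\begin{aligned}
D_1(\phi;\,c) &= -i H \phi + \frac{\gamma}{2}\,\bigl(c\,a - n\bigr)\phi - \frac{\gamma}{8}\,c^{2}\,\phi, \\
D_2(\phi;\,c) &= \sqrt{\gamma}\,\Bigl(a - \tfrac{1}{2}\,c\Bigr)\phi, \\
k_1 &= D_1\bigl(\psi_n;\ \langle A\rangle_{\psi_n}\bigr), \\
k_2 &= D_1\bigl(\phi_2;\ \langle A\rangle_{\phi_2}\bigr), \qquad \phi_2 = \psi_n + \tfrac{\Delta t}{2}\,k_1, \\
k_3 &= D_1\bigl(\phi_3;\ \langle A\rangle_{\phi_3}\bigr), \qquad \phi_3 = \psi_n + \tfrac{\Delta t}{2}\,k_2, \\
k_4 &= D_1\bigl(\phi_4;\ \langle A\rangle_{\phi_4}\bigr), \qquad \phi_4 = \psi_n + \Delta t\,k_3, \\
\tilde\psi_{n+1} &= \psi_n + \frac{\Delta t}{6}\,\bigl(k_1 + 2k_2 + 2k_3 + k_4\bigr) + D_2\bigl(\psi_n;\ \langle A\rangle_{\phi_4}\bigr)\,\Delta W,
\qquad \Delta W \sim \mathcal{N}(0, \Delta t), \\
\psi_{n+1} &= \tilde\psi_{n+1} \,/\, \lVert \tilde\psi_{n+1} \rVert .
\end{aligned}
```

As written, the noise term acts on ψ_n but uses the expectation value from the last Runge-Kutta stage (φ_4), because psiApsi is not recomputed before the call to D2.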

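So the SSE algorithm essentially solves a stochastic differential equation using fourth-order Runge-Kutta. In each time step I therefore have to compute five terms that involve matrix-vector multiplications. How can this be improved?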
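
To make the cost of one drift evaluation concrete: D1 as posted builds an Ntot x Ntot temporary (term1*a_mat - number_operator) on every call, on top of its matrix-vector products. The sketch below shows one possible rearrangement, under the assumption that gamma, Htot and number_operator stay constant during the time loop, so that their contribution can be collected into a single matrix once. The names drift_matrix, D1_inplace!, M, out and buf are hypothetical and not part of the original program. Whether this is actually faster for a given Ntot would have to be benchmarked.

```julia
using LinearAlgebra

# Reference: same arithmetic as D1 in the question.
function D1_ref(gamma, Psi, Htot, a_mat, term1, number_operator)
    dummy = term1*a_mat - number_operator   # Ntot x Ntot temporary on every call
    return -1im*Htot*Psi + 0.5*gamma*dummy*Psi - 0.125*gamma*term1^2*Psi
end

# Hypothetical helper: state-independent part of the drift,
# meant to be built once before the time loop.
drift_matrix(gamma, Htot, number_operator) = -1im*Htot - 0.5*gamma*number_operator

# Hypothetical in-place drift:
# out = M*Psi + 0.5*gamma*term1*(a_mat*Psi) - 0.125*gamma*term1^2*Psi
# `out` and `buf` are preallocated vectors and must not alias Psi.
function D1_inplace!(out, buf, gamma, Psi, M, a_mat, term1)
    mul!(out, M, Psi)       # constant part: one matrix-vector product
    mul!(buf, a_mat, Psi)   # state-dependent part: one matrix-vector product
    @. out += 0.5*gamma*term1*buf - 0.125*gamma*term1^2*Psi
    return out
end

# Small random check that both forms agree (all inputs are made-up test data)
N = 8
Htot = randn(N, N)
a_mat = randn(N, N)
number_operator = randn(N, N)
Psi = normalize(randn(ComplexF64, N))
gamma = 0.3
term1 = Psi' * a_mat * Psi   # stand-in for avAPsi(Psi, A)

M = drift_matrix(gamma, Htot, number_operator)
out = similar(Psi)
buf = similar(Psi)
D1_inplace!(out, buf, gamma, Psi, M, a_mat, term1)
agree = out ≈ D1_ref(gamma, Psi, Htot, a_mat, term1, number_operator)
```

In the time loop, M would be built once and out and buf reused across the four stages, copying each stage result before the next call overwrites it. The same mul! pattern could be applied to D2 and avAPsi.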

0 answers:

No answers