I am running a simulation with two methods that are called on every iteration and account for most of the compute time.
The methods are:
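def calculate_macrovars(f, n, ns, ex, ey, dens, velx, vely):
    # View of the interior n x n cells of every distribution
    fd = f[:,2:n+2,2:n+2]
    dens[:,:] = fd[0,:,:]
    velx[:,:] = 0.
    vely[:,:] = 0.
    # Accumulate density and momentum over the remaining distributions
    for s in range(1,ns):
        fds = fd[s,:,:]
        dens += fds
        velx += ex[s]*fds
        vely += ey[s]*fds
    dens[:,:] = dens  # self-assignment, has no effect
    # Convert momentum to velocity
    velx[:,:] /= dens
    vely[:,:] /= dens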
and

def do_relaxation(f, ftmp, n, ns, ws, ex, ey, cssq, omega, dens, velx, vely):
    # Views of the interior n x n cells of the input and output arrays
    dist = f[:,2:n+2,2:n+2]
    dist_tmp = ftmp[:,2:n+2,2:n+2]
    vv = (velx*velx + vely*vely)/cssq
    for s in range(ns):
        ev = (ex[s]*velx + ey[s]*vely)/cssq
        dist_eq = ws[s]*dens*(1 + ev + 0.5*ev*ev - 0.5*vv)
        # Relax each distribution towards its equilibrium value
        dist_tmp[s,:,:] = (1. - omega)*dist[s,:,:] + omega*dist_eq
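The arguments to these functions are the 3D arrays f and ftmp (shape = ns x n x n, dtype = np.float32), the scalars n and ns (int), the 1D array ws (shape = ns x 1, dtype = np.float32), the 1D arrays ex and ey (shape = ns x 1, dtype = np.int8), the scalars cssq and omega (float), and the 2D arrays dens, velx and vely (shape = n x n, dtype = np.float32).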
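For anyone who wants to reproduce the timings, the inputs described above could be set up roughly as follows. This is only a sketch under several assumptions that are not stated in the question: the grid size n is made up, the lattice constants are the standard D2Q9 values, and f is given two ghost cells per side, because the slices 2:n+2 only produce an n x n interior if f has at least n+2 points along each spatial axis.

```python
import numpy as np

# Hypothetical grid size; the question does not state n.
n = 64
# ns = 9 is inferred from the profile (44829 loop-header hits / 4981 calls).
ns = 9
# Assumption: two ghost cells per side, since the code slices 2:n+2.
halo = 2

rng = np.random.default_rng(0)
f = rng.random((ns, n + 2 * halo, n + 2 * halo), dtype=np.float32)
ftmp = np.empty_like(f)

# Assumed D2Q9 lattice velocities and weights (not given in the question).
ex = np.array([0, 1, 0, -1, 0, 1, -1, -1, 1], dtype=np.int8)
ey = np.array([0, 0, 1, 0, -1, 1, 1, -1, -1], dtype=np.int8)
ws = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4, dtype=np.float32)

# Assumed scalar parameters.
cssq = 1.0 / 3.0
omega = 1.0

# Macroscopic fields on the interior n x n cells.
dens = np.empty((n, n), dtype=np.float32)
velx = np.empty_like(dens)
vely = np.empty_like(dens)
```

With these arrays, calculate_macrovars(f, n, ns, ex, ey, dens, velx, vely) and do_relaxation(f, ftmp, n, ns, ws, ex, ey, cssq, omega, dens, velx, vely) can be called directly.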
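I profiled a simulation run and found that the two methods take 25% and 69% of the time respectively, 94% of the total compute time, as shown below (I have trimmed the profile to remove all lines whose run time is negligible):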
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     2                                           def tgv_simulation():
    84      4981     16577491   3328.1     24.9      calculate_macrovars()
    89      4981     45899937   9215.0     68.9      do_relaxation()
    91      4981      3614229    725.6      5.4      do_streaming()

Total time: 66.6111 s
Which optimization strategy should I adopt to improve the performance of these two methods?

Profiling calculate_macrovars() yields:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     2                                           def calculate_macrovars():
     5      4981        16353      3.3      0.1      fd = f[:,2:n+2,2:n+2]
     6      4981       376653     75.6      2.2      dens[:,:] = fd[0,:,:]
     7      4981       256929     51.6      1.5      velx[:,:] = 0.
     8      4981       246298     49.4      1.4      vely[:,:] = 0.
     9     44829        41229      0.9      0.2      for s in range(1,ns):
    10     39848        47308      1.2      0.3          fds = fd[s,:,:]
    11     39848      3739018     93.8     21.6          dens += fds
    12     39848      5699723    143.0     32.9          velx += ex[s]*fds
    13     39848      5428619    136.2     31.4          vely += ey[s]*fds
    15      4981       723137    145.2      4.2      velx[:,:] /= dens
    16      4981       719715    144.5      4.2      vely[:,:] /= dens

Total time: 17.3031 s
Profiling do_relaxation() yields:
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    19                                           def do_relaxation():
    22      4981        17183      3.4      0.0      dist = f[:,2:n+2,2:n+2]
    23      4981         6700      1.3      0.0      dist_tmp = ftmp[:,2:n+2,2:n+2]
    24      4981      1510181    303.2      3.3      vv = (velx*velx + vely*vely)/cssq
    25     49810        61256      1.2      0.1      for s in range(ns):
    26     44829     12988964    289.7     28.7          ev = (ex[s]*velx + ey[s]*vely)/cssq
    27     44829     18644169    415.9     41.2          dist_eq = ws[s]*dens*(1 + ev + 0.5*ev*ev - 0.5*vv)
    28     44829     11993650    267.5     26.5          dist_tmp[s,:,:] = (1. - omega)*dist[s,:,:] + omega*dist_eq

Total time: 45.2221 s
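In the profile above, each of the three lines inside the loop builds several full n x n temporary arrays per iteration. One direction that might be worth measuring is to hoist the loop-invariant terms and reuse preallocated scratch buffers through the ufunc out= argument. The sketch below is an untimed illustration of that idea, not a tested optimization; the function name do_relaxation_buffered and the buffer names are hypothetical, and the arithmetic is reordered slightly, so results can differ from the original in the last float32 digits.

```python
import numpy as np

def do_relaxation_buffered(f, ftmp, n, ns, ws, ex, ey, cssq, omega,
                           dens, velx, vely):
    """Same formula as do_relaxation, without per-iteration temporaries."""
    dist = f[:, 2:n+2, 2:n+2]
    dist_tmp = ftmp[:, 2:n+2, 2:n+2]

    # Loop-invariant terms, computed once per call.
    ux = velx / cssq
    uy = vely / cssq
    base = 1.0 - 0.5 * (velx * ux + vely * uy)  # 1 - 0.5*vv
    omega_dens = omega * dens

    # Scratch buffers reused by every iteration (hypothetical names).
    ev = np.empty_like(dens)
    eq = np.empty_like(dens)

    for s in range(ns):
        # ev = (ex[s]*velx + ey[s]*vely)/cssq
        np.multiply(ux, ex[s], out=ev)
        np.multiply(uy, ey[s], out=eq)
        ev += eq
        # eq = omega*ws[s]*dens*(1 + ev + 0.5*ev*ev - 0.5*vv)
        np.multiply(ev, 0.5, out=eq)
        eq += 1.0
        eq *= ev
        eq += base
        eq *= omega_dens
        eq *= ws[s]
        # dist_tmp[s] = (1 - omega)*dist[s] + omega*dist_eq
        out = dist_tmp[s]
        np.multiply(dist[s], 1.0 - omega, out=out)
        out += eq
```

Whether this helps depends on n and on how memory-bound the loop is, so it would need to be profiled against the original on the real grid size.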
Update: a vectorized version of calculate_macrovars has roughly the same performance as the loop version.
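The vectorized code itself is not included in the post. Purely as an illustration of what such a version might look like (a sketch, not the code that was actually timed), the loop can be replaced by a sum over the first axis and two contractions with np.tensordot. The name calculate_macrovars_vec is hypothetical, and ex and ey are assumed to be 1D arrays of length ns.

```python
import numpy as np

def calculate_macrovars_vec(f, n, ns, ex, ey, dens, velx, vely):
    """Loop-free variant of calculate_macrovars (illustrative sketch)."""
    fd = f[:, 2:n+2, 2:n+2]
    # Density: sum of all distributions, written straight into dens.
    np.sum(fd, axis=0, out=dens)
    # Momentum: contract the lattice velocities with the distributions.
    # Index 0 is skipped to mirror the original loop over range(1, ns).
    velx[:, :] = np.tensordot(ex[1:].astype(fd.dtype), fd[1:], axes=(0, 0))
    vely[:, :] = np.tensordot(ey[1:].astype(fd.dtype), fd[1:], axes=(0, 0))
    # Convert momentum to velocity.
    velx /= dens
    vely /= dens
```

A version like this still has to read every element of fd several times, so similar timings to the loop would be consistent with the work being limited by memory traffic rather than by Python overhead, although that is a guess rather than something the profile proves.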