I'm new to Python and I'm looking for a way to speed up the following function I wrote:
def incoming_Wave_Vel(geom,t,phi):
    x = geom[:,0].reshape(1,geom.shape[0])
    y = geom[:,1].reshape(1,geom.shape[0])
    z = geom[:,2].reshape(1,geom.shape[0])
    q1 = k*(h+z)
    q2 = omega*t-Kx*x-Ky*y+phi[:,0].reshape(phi.shape[0],1)
    u = AOmega.T @ (np.cosh(q1)/np.sinh(k*h)*np.sin(q2))
    w = AOmega.T @ (np.sinh(q1)/np.sinh(k*h)*np.cos(q2))
    return np.vstack((u*np.cos(th),u*np.sin(th),w))*RampFun(t)
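AOmega, k, Kx, Ky, omega and phi are arrays of shape [n,1], while x, y, z have shape [1,m]. I used the dot product @ to avoid the sum function, but the performance is almost the same.
I tried using numba, but every test I have run so far has failed.
Any suggestions for improving the code? Thanks.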
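For reference, the shapes described above can be checked with a small sketch. The sizes n and m, the random data and the name X are assumptions made for illustration only. It suggests why swapping sum for @ changes little: both reduce the same [n,m] array, and building that array is where most of the work is.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 7                      # illustrative sizes, not taken from the question

AOmega = rng.random((n, 1))      # one amplitude per wave component, shape [n,1]
X = rng.random((n, m))           # stands in for the broadcast [n,1]*[1,m] term

via_matmul = AOmega.T @ X                             # shape (1, m)
via_sum = np.sum(AOmega * X, axis=0, keepdims=True)   # shape (1, m)

print(via_matmul.shape, np.allclose(via_matmul, via_sum))
```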
Below is my attempt at parallelizing the function with numba:
from numba import njit, prange
@njit(fastmath=True, parallel=True)
def incoming_Wave_Vel_parallel(x,y,z,t,phi):
    u = np.zeros_like(x)
    w = np.zeros_like(x)
    n = len(omega)
    for i in prange(n):
        q1 = k[i]*(h+z)
        q2 = omega[i]*t-Kx[i]*x-Ky[i]*y+phi[i]
        u += AOmega[i]*np.cosh(q1)/np.sinh(k[i]*h)*np.sin(q2)
        w += AOmega[i]*np.sinh(q1)/np.sinh(k[i]*h)*np.cos(q2)
    return np.vstack((u*np.cos(th),u*np.sin(th),w))
However, this version is slower than the serial one.
Answer 0 (score: 0)
We can optimize incoming_Wave_Vel by removing the redundant sin and sinh computations, which improves performance by about 28%.
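def incoming_Wave_Vel(t,x,y,z):
    q1 = k*(h+z)
    q2 = omega*t-Kx*x-Ky*y+phi_shaped
    sin = np.sin(q2)
    cos = np.cos(q2)
    inv_sinh_kh = 1.0/np.sinh(k*h)  # computed once, shared by u and w
    # Parentheses are required: @ and * have equal precedence and bind left to right
    u = AOmega.T @ (np.cosh(q1)*inv_sinh_kh*sin)
    w = AOmega.T @ (np.sinh(q1)*inv_sinh_kh*cos)
    # The question scales by RampFun(t); plain t is used here, as in the timing code
    return np.vstack((u*np.cos(th),u*np.sin(th),w))*t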
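One way to make sure a rewritten version still computes the same thing is to compare it against the formula from the question on random data. The sketch below is only an illustration under assumptions: the sizes, the random inputs and the names wave_vel_reference and wave_vel_hoisted are hypothetical, and the ramp factor is left out. It folds AOmega and 1/np.sinh(k*h) into a single [1,n] weight row so that no [n,m] division is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4                                   # illustrative sizes
# Offset by 0.1 so that sinh(k*h) is never zero in this toy data
AOmega, k, Kx, Ky, omega, phi = (rng.random((n, 1)) + 0.1 for _ in range(6))
h, th, t = 1.0, 1.0, 0.5                      # assumed scalar values
x, y, z = (rng.random((1, m)) for _ in range(3))

def wave_vel_reference(t, x, y, z):
    # Same formula as in the question, without the ramp factor
    q1 = k * (h + z)
    q2 = omega * t - Kx * x - Ky * y + phi
    u = AOmega.T @ (np.cosh(q1) / np.sinh(k * h) * np.sin(q2))
    w = AOmega.T @ (np.sinh(q1) / np.sinh(k * h) * np.cos(q2))
    return np.vstack((u * np.cos(th), u * np.sin(th), w))

def wave_vel_hoisted(t, x, y, z):
    q1 = k * (h + z)
    q2 = omega * t - Kx * x - Ky * y + phi
    # Amplitude and 1/sinh(k*h) depend only on the component index: combine them once
    weights = (AOmega / np.sinh(k * h)).T     # shape (1, n)
    u = weights @ (np.cosh(q1) * np.sin(q2))
    w = weights @ (np.sinh(q1) * np.cos(q2))
    return np.vstack((u * np.cos(th), u * np.sin(th), w))

ref = wave_vel_reference(t, x, y, z)
new = wave_vel_hoisted(t, x, y, z)
print(ref.shape, np.allclose(ref, new))
```

Running a check like this before timing anything helps ensure that a speedup is not just the result of computing something different.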
Parallelizing the calls with a ThreadPool from multiprocessing.pool improves performance further, for a 52% gain over the original:
from multiprocessing.pool import ThreadPool
times = np.linspace(0,1,200)
pool = ThreadPool()
res = pool.map(lambda i: incoming_Wave_Vel(times[i], x[i],y[i],z[i]), range(len(times)))
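The thread-pool pattern can be tried in isolation with the sketch below. Here step is a hypothetical stand-in for incoming_Wave_Vel, and the sizes and data are made up. Threads can help in this kind of workload because NumPy releases the GIL during many array operations, although the actual gain depends on the array sizes and the machine.

```python
import numpy as np
from multiprocessing.pool import ThreadPool

rng = np.random.default_rng(0)
time_steps, n, m = 20, 8, 5                 # illustrative sizes
times = np.linspace(0, 1, time_steps)
x = rng.random((time_steps, 1, m))
amp = rng.random((n, 1))                    # hypothetical per-component amplitude

def step(i):
    # Hypothetical stand-in for incoming_Wave_Vel evaluated at time step i
    return amp.T @ np.sin(amp * times[i] - x[i])

# The context manager stops the worker threads when the block exits
with ThreadPool() as pool:
    res = pool.map(step, range(time_steps))   # results come back in input order

serial = [step(i) for i in range(time_steps)]
print(len(res), all(np.allclose(a, b) for a, b in zip(res, serial)))
```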
Code used for the timing comparison:
import numpy as np
from multiprocessing.pool import ThreadPool
n, m = 237, 420
time_steps = 200
times = np.linspace(0,1,200)
AOmega, k, Kx, Ky, omega, phi = [np.random.random((n,1)) for _ in range(6)]
h, th = 1, 1
x = np.random.random((time_steps,1,m))
y = np.random.random((time_steps,1,m))
z = np.random.random((time_steps,1,m))
phi_shaped = phi[:,0].reshape(phi.shape[0],1)
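def incoming_Wave_Vel(t,x,y,z):
    q1 = k*(h+z)
    q2 = omega*t-Kx*x-Ky*y+phi_shaped
    sin = np.sin(q2)
    cos = np.cos(q2)
    inv_sinh_kh = 1.0/np.sinh(k*h)  # computed once, shared by u and w
    # Parentheses are required: @ and * have equal precedence and bind left to right
    u = AOmega.T @ (np.cosh(q1)*inv_sinh_kh*sin)
    w = AOmega.T @ (np.sinh(q1)*inv_sinh_kh*cos)
    return np.vstack((u*np.cos(th),u*np.sin(th),w))*t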
def incoming_Wave_Vel_original(t,x,y,z,phi):
    q1 = k*(h+z)
    q2 = omega*t-Kx*x-Ky*y+phi[:,0].reshape(phi.shape[0],1)
    u = AOmega.T @ (np.cosh(q1)/np.sinh(k*h)*np.sin(q2))
    w = AOmega.T @ (np.sinh(q1)/np.sinh(k*h)*np.cos(q2))
    return np.vstack((u*np.cos(th),u*np.sin(th),w))*t
Original:
%%timeit
res = []
for i,t in enumerate(times):
    res.append(incoming_Wave_Vel_original(t, x[i], y[i], z[i], phi))
Parallel and optimized:
%%timeit
pool = ThreadPool()
res = pool.map(lambda i: incoming_Wave_Vel(times[i], x[i],y[i],z[i]), range(len(times)))
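%%timeit is an IPython cell magic, so the cells above only run in IPython or Jupyter. In a plain script, the standard timeit module can do a similar comparison. The sketch below is a rough illustration with made-up data and two hypothetical functions, twice and once. It prints timings but makes no claim about which is faster on a given machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((50, 1)) + 0.1     # illustrative [n,1] data, offset to avoid sinh(0)
b = rng.random((1, 40))           # illustrative [1,m] data

def twice():
    # Divides a full [n,m] array by sinh(a) for each output
    return np.cosh(a * b) / np.sinh(a), np.sinh(a * b) / np.sinh(a)

def once():
    inv = 1.0 / np.sinh(a)        # small [n,1] reciprocal, computed once
    return np.cosh(a * b) * inv, np.sinh(a * b) * inv

# Take the best of several repeats to reduce noise from other processes
t_twice = min(timeit.repeat(twice, number=200, repeat=3))
t_once = min(timeit.repeat(once, number=200, repeat=3))
print(f"twice: {t_twice:.4f} s, once: {t_once:.4f} s")
```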