As part of my script, I have some code like the following (devectorized Julia, as far as possible):
for kk=1:n # Main loop
    for j=1:m
        rhs[j]=2*u0[j]-alf*dt*u1[j]-2*mu*u2[j];
    end
    c=lhs\rhs'; #c: coefficients to be obtained
    u2=c'*h;
    u1=c'*p.-c'*f;
    u0=c'*Q-c'*f*x;
    for j=1:m
        for i=1:m
            lhs[j,i]=2*(Q[i,j]-x[j]*f[i])+alf*dt*(p[i,j]-f[i])+eps*dt*(Q[i,j]-x[j]*f[i])*u1[j]+eps*u0[j]*dt*(p[i,j]-f[i])-2*mu*h[i,j];
        end
    end
end
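where h, p, Q, and lhs are m×m matrices; u0, u1, u2, rhs, and x are 1×m arrays; alf, dt, mu, and eps are scalar constants; and f and c are m×1 arrays. I preallocate the matrices and arrays at the beginning of the script. The vectorized form of the code above is: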
for kk=1:n # Main loop
    rhs=2*u0-alf*dt*u1-2*mu*u2;
    c=lhs\rhs'; #c coefficients to be obtained
    u2=c'*h;
    u1=c'*p.-c'*f;
    u0=c'*Q-c'*f*x;
    lhs=2*(Q-f*x)+alf*dt*(p.-f)+eps*dt*(Q-f*x).*u1+eps*dt*u0.*(p.-f)-2*mu*h;
    lhs=lhs';
end
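As a sanity check that the two forms assemble the same matrix, the sketch below builds lhs both ways from random data and compares them. The function name check_lhs_forms and the scalar values are made up for illustration; f is taken as a length-m Vector and x, u0, u1 as 1×m arrays, following the shapes described above.

```julia
using LinearAlgebra

# Hypothetical helper: builds lhs with the explicit loops and with the
# broadcast expression, using random inputs of the shapes described above.
function check_lhs_forms(m)
    Q, p, h = rand(m, m), rand(m, m), rand(m, m)
    f = rand(m)                                     # m×1
    x, u0, u1 = rand(1, m), rand(1, m), rand(1, m)  # 1×m
    alf, dt, mu, eps = 0.5, 0.01, 0.1, 0.2          # arbitrary scalars

    # Loop form, copied from the devectorized code
    lhs_loop = zeros(m, m)
    for j in 1:m, i in 1:m
        lhs_loop[j,i] = 2*(Q[i,j]-x[j]*f[i]) + alf*dt*(p[i,j]-f[i]) +
                        eps*dt*(Q[i,j]-x[j]*f[i])*u1[j] +
                        eps*u0[j]*dt*(p[i,j]-f[i]) - 2*mu*h[i,j]
    end

    # Broadcast form, copied from the vectorized code (transposed at the end)
    lhs_vec = 2*(Q-f*x) + alf*dt*(p.-f) + eps*dt*(Q-f*x).*u1 +
              eps*dt*u0.*(p.-f) - 2*mu*h
    return lhs_loop, Matrix(lhs_vec')
end

lhs_loop, lhs_vec = check_lhs_forms(4)
println(lhs_loop ≈ lhs_vec)
```

The comparison uses ≈ rather than == because the two forms group the floating-point operations differently.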
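For example, for n = 100 and m = 64, the elapsed times are:
devectorized Julia: 1.8 seconds
vectorized Julia: 0.2 seconds
vectorized NumPy: 0.04 seconds
The vectorized Julia code is 9 times faster than the devectorized Julia code, and the vectorized Python code is about 5 times faster than the vectorized Julia code.
For n = 500 and m = 256:
devectorized Julia: 85.589233013 seconds
vectorized Julia: 8.232898003 seconds
vectorized NumPy: 1.62000012398 seconds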
My question: is it possible to improve Julia's performance in this case?
Answer 0 (score: 2)
I think the computation of u0, u1, and u2 could also be devectorized, like this:
using Statistics  # mean lives in Statistics on Julia 0.7 and later

function vectorized()
    m = [1.0 2.0 3.0; 1.0 2.0 3.0; 1.0 2.0 3.0]
    c = [1.0, 2.0, 3.0]
    for i in 1:100000
        # Each product allocates a new 1x3 result
        x1 = c'*m
        x2 = c'*m
        x3 = c'*m
    end
    return
end

function vectime(N)
    timings = Vector{Float64}(undef, N)
    # Force compilation
    vectorized()
    for itr in 1:N
        timings[itr] = @elapsed vectorized()
    end
    return timings
end

println("vectorized=", mean(vectime(20)))

function devectorized()
    m = [1.0 2.0 3.0; 1.0 2.0 3.0; 1.0 2.0 3.0]
    c = [1.0, 2.0, 3.0]
    # Result buffers are allocated once, outside the loop
    x1 = [0.0, 0.0, 0.0]
    x2 = [0.0, 0.0, 0.0]
    x3 = [0.0, 0.0, 0.0]
    mx = 3
    for i in 1:100000
        # Reset the accumulators, then form c'*m entry by entry
        fill!(x1, 0.0); fill!(x2, 0.0); fill!(x3, 0.0)
        for k in 1:mx
            for kk in 1:mx
                x1[kk] += c[k]*m[k,kk]
                x2[kk] += c[k]*m[k,kk]
                x3[kk] += c[k]*m[k,kk]
            end
        end
    end
    return
end

function dvectime(N)
    timings = Vector{Float64}(undef, N)
    # Force compilation
    devectorized()
    for itr in 1:N
        timings[itr] = @elapsed devectorized()
    end
    return timings
end

println("devectorized=", mean(dvectime(20)))
Output of the code above:
vectorized=0.17680755404999998
devectorized=0.00441064295
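Applying the same idea to the question's main loop, here is a minimal sketch with preallocated buffers and in-place updates, wrapped in a function so that nothing runs in global scope. It assumes u0, u1, u2, x, and f are stored as plain length-m Vectors rather than 1×m arrays, and the function name main_loop! is hypothetical. No timings are claimed for it; it would have to be measured on the actual problem.

```julia
using LinearAlgebra

# Sketch of the question's loop with in-place updates.
# Assumption: u0, u1, u2, x, f are Vectors of length m (not 1×m arrays).
function main_loop!(lhs, u0, u1, u2, Q, p, h, f, x, alf, dt, mu, eps, n)
    m = length(f)
    rhs = similar(u0)   # work buffers, allocated once
    c = similar(u0)
    for kk in 1:n
        @. rhs = 2*u0 - alf*dt*u1 - 2*mu*u2
        c .= lhs \ rhs                   # the solve itself still allocates
        cf = dot(c, f)                   # scalar c'*f
        mul!(u2, h', c)                  # u2 = (c'*h)'
        mul!(u1, p', c); u1 .-= cf       # u1 = (c'*p .- c'*f)'
        mul!(u0, Q', c); @. u0 -= cf*x   # u0 = (c'*Q - c'*f*x)'
        @inbounds for i in 1:m, j in 1:m
            qx = Q[i,j] - x[j]*f[i]
            pf = p[i,j] - f[i]
            lhs[j,i] = 2*qx + alf*dt*pf + eps*dt*qx*u1[j] +
                       eps*dt*u0[j]*pf - 2*mu*h[i,j]
        end
    end
    return u0, u1, u2
end
```

The function overwrites lhs, u0, u1, and u2, so pass copies if the initial values are needed afterwards. The only remaining per-iteration allocation is the linear solve; factorizing in place with lu! would be a possible next step, which this sketch does not attempt.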