Is it possible to speed up this Julia code?

Date: 2015-08-12 15:09:01

Tags: performance julia

As part of my script, I have some code like the following (devectorized Julia, as far as possible):

for kk=1:n # Main loop
    for j=1:m
        rhs[j]=2*u0[j]-alf*dt*u1[j]-2*mu*u2[j];    
    end
    c=lhs\rhs'; #c: coefficients to be obtained

    u2=c'*h;
    u1=c'*p.-c'*f;
    u0=c'*Q-c'*f*x;

    for j=1:m
        for i=1:m
            lhs[j,i]=2*(Q[i,j]-x[j]*f[i])+alf*dt*(p[i,j]-f[i])+eps*dt*(Q[i,j]-x[j]*f[i])*u1[j]+eps*u0[j]*dt*(p[i,j]-f[i])-2*mu*h[i,j];
        end
    end

end

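Here h, p, Q, and lhs are m×m matrices; u0, u1, u2, rhs, and x are 1×m arrays; alf, dt, mu, and eps are scalar constants; and f and c are m×1 arrays. I preallocated the matrices and arrays at the beginning of the script. The vectorized form of the code above is as follows: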

for kk=1:n # Main loop

    rhs=2*u0-alf*dt*u1-2*mu*u2;    

    c=lhs\rhs'; #c coefficients to be obtained

    u2=c'*h;
    u1=c'*p.-c'*f;
    u0=c'*Q-c'*f*x;

    lhs=2*(Q-f*x)+alf*dt*(p.-f)+eps*dt*(Q-f*x).*u1+eps*dt*u0.*(p.-f)-2*mu*h;
    lhs=lhs';

end
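
The shapes described above can be sketched as a small setup function. This is only an illustration: `make_problem` is a hypothetical helper, the random values are placeholders rather than the original data, and storing f as a plain Vector is an assumption.

```julia
# Hypothetical setup helper: shapes follow the description in the question;
# the random values are placeholders, not the original data.
function make_problem(m::Integer)
    h, p, Q = rand(m, m), rand(m, m), rand(m, m)          # m×m matrices
    lhs = rand(m, m)                                       # m×m system matrix
    x   = rand(1, m)                                       # 1×m row array
    u0, u1, u2 = zeros(1, m), zeros(1, m), zeros(1, m)     # 1×m row arrays
    rhs = zeros(1, m)
    f   = rand(m)                                          # m×1 data stored as a Vector (assumption)
    return (h = h, p = p, Q = Q, lhs = lhs, x = x,
            u0 = u0, u1 = u1, u2 = u2, rhs = rhs, f = f)
end

prob = make_problem(4)
c = prob.lhs \ vec(prob.rhs)        # vec(rhs) plays the role of rhs' in the loop
println(size(c' * prob.h))          # (1, 4): row vector times matrix
println(size(prob.f * prob.x))      # (4, 4): outer product f*x
println(size(prob.p .- prob.f))     # (4, 4): f[i] is subtracted from row i of p
```

The last line shows why `p.-f` in the vectorized loop matches `p[i,j]-f[i]` in the devectorized one: broadcasting a length-m vector against an m×m matrix subtracts f[i] from every entry in row i.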

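For example, for n = 100 and m = 64, the elapsed times are:

devectorized julia: 1.8 seconds

vectorized julia: 0.2 seconds

vectorized numpy: 0.04 seconds

The vectorized Julia code is 9 times faster than the devectorized Julia code, and the vectorized Python code is about 5 times faster than the vectorized Julia code.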

For n = 500 and m = 256:

devectorized julia: 85.589233013 seconds

vectorized julia: 8.232898003 seconds

vectorized numpy: 1.62000012398 seconds
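
For reference, the ratios implied by these timings can be checked with a few lines of Julia. The names `small`, `large`, and `speedup` are illustrative only; the numbers are copied from the measurements above. For the larger case the ratios come out at roughly 10x and 5x, in line with the smaller case.

```julia
# Timings in seconds, copied from the measurements above.
small = (devec = 1.8, vect = 0.2, numpy = 0.04)                            # n = 100, m = 64
large = (devec = 85.589233013, vect = 8.232898003, numpy = 1.62000012398)  # n = 500, m = 256

# How many times faster the `fast` variant is than the `slow` one.
speedup(slow, fast) = slow / fast

for (label, t) in (("n=100, m=64", small), ("n=500, m=256", large))
    println(label,
            ": vectorized vs devectorized = ", round(speedup(t.devec, t.vect), digits=1),
            ", numpy vs vectorized = ", round(speedup(t.vect, t.numpy), digits=1))
end
```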

My question: is it possible to improve Julia's performance in this case?

1 answer:

Answer 0: (score: 2)

I think the computation of u0, u1, and u2 could also be expanded into explicit loops, like this:

function vectorized()
    m = [1.0 2.0 3.0; 1.0 2.0 3.0; 1.0 2.0 3.0]
    c = [1.0, 2.0, 3.0]

    for i in 1:100000
        x1 = c'*m
        x2 = c'*m
        x3 = c'*m
    end

    return
end

function vectime(N)
    timings = Vector{Float64}(undef, N)  # Array(Float64, N) no longer works in Julia 1.x

    # Force compilation
    vectorized()

    for itr in 1:N
        timings[itr] = @elapsed vectorized()
    end

    return timings
end

using Statistics  # mean lives in the Statistics standard library since Julia 0.7

println("vectorized=",mean(vectime(20)))

function devectorized()
    m = [1.0 2.0 3.0; 1.0 2.0 3.0; 1.0 2.0 3.0]
    c = [1.0, 2.0, 3.0]
    x1 = [0.0, 0.0, 0.0]
    x2 = [0.0, 0.0, 0.0]
    x3 = [0.0, 0.0, 0.0]
    mx = 3
    for i in 1:100000
        for k in 1:mx
            for kk in 1:mx
              x1[k]=x1[k]+c[k]*m[k,kk];
              x2[k]=x2[k]+c[k]*m[k,kk];
              x3[k]=x3[k]+c[k]*m[k,kk];
            end
        end
    end
    return
end

function dvectime(N)
    timings = Vector{Float64}(undef, N)  # Array(Float64, N) no longer works in Julia 1.x

    # Force compilation
    devectorized()

    for itr in 1:N
        timings[itr] = @elapsed devectorized()
    end

    return timings
end

println("devectorized=",mean(dvectime(20)))

Output of the code above (times in seconds, as returned by @elapsed):

vectorized=0.17680755404999998
devectorized=0.00441064295
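
One caveat about the comparison: the loop in devectorized() accumulates into x1[k] across all iterations and indexes the result by row, so it does not produce the same values as c'*m. A minimal sketch of a loop that does compute c'*m into a preallocated vector might look like this. `rowvec_times_matrix!` is a hypothetical helper name, not part of the code above, and no timing claim is made for it.

```julia
# Sketch: out[j] = sum over k of c[k] * M[k, j], i.e. the entries of c' * M,
# written into a preallocated vector so that no temporary array is created.
# `rowvec_times_matrix!` is a hypothetical helper, not from the original answer.
function rowvec_times_matrix!(out::AbstractVector, c::AbstractVector, M::AbstractMatrix)
    @inbounds for j in 1:size(M, 2)      # one output entry per column of M
        s = 0.0
        for k in 1:size(M, 1)            # inner loop walks down a column (column-major storage)
            s += c[k] * M[k, j]
        end
        out[j] = s                       # overwrite, so repeated calls do not accumulate
    end
    return out
end

M = [1.0 2.0 3.0; 1.0 2.0 3.0; 1.0 2.0 3.0]
c = [1.0, 2.0, 3.0]
out = zeros(3)
rowvec_times_matrix!(out, c, M)
println(out)                             # same entries as c' * M
```

If the main loop in the question runs at script top level with non-constant global variables, moving it into a function that takes the arrays as arguments, as the benchmark functions above do, is usually the first thing to try before comparing loop styles.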