Why is my Julia shared array code running so slow?

Time: 2015-12-08 19:25:18

Tags: performance parallel-processing julia

I'm trying to implement Smith-Waterman alignment in parallel in Julia (see Figure 1 of http://www.cs.virginia.edu/~rl6sf/paper_dump/2011:12:33:22.pdf), but the parallel version runs much slower than the serial one. I'm using shared arrays and suspect I'm doing something silly that slows the code down. Could someone take a look and tell me whether the code is as optimized as it can be? I would expect the parallel version to run faster than the serial one.

The basic idea is to compute and update the elements of each anti-diagonal of a matrix in parallel, moving from the upper-left corner to the lower-right corner. I'm trying to use 32 cores on a shared-memory machine. The matrix is a SharedArray, and the elements of each anti-diagonal are computed in parallel as shown below. For each anti-diagonal, the while loops in the spSW function run inside a @sync block and submit one task per element to the workers through the helper function shared_get_score!. The goal of that function is to fill in each element of the shared arrays "matrix" and "path".
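To make the traversal order concrete, here is a minimal serial sketch that lists the cells of each anti-diagonal in the order the loops in spSW visit them. The names antidiagonal_cells and wavefront_order are made up for illustration, and the sketch assumes, as spSW does, that row 1 and column 1 are boundary cells that are never recomputed.

```julia
# Cells (irow, jcol) with irow + jcol == s inside the interior
# 2:row by 2:col of the matrix, listed from top-right to bottom-left.
function antidiagonal_cells(s, row, col)
    lo = max(2, s - col)   # first row whose column s - irow is <= col
    hi = min(row, s - 2)   # last row whose column s - irow is >= 2
    return [(irow, s - irow) for irow in lo:hi]
end

# All anti-diagonals in wavefront order, from cell (2, 2) to cell (row, col).
function wavefront_order(row, col)
    return [antidiagonal_cells(s, row, col) for s in 4:(row + col)]
end

for cells in wavefront_order(3, 4)
    println(cells)
end
```

For a 3-by-4 matrix this yields four anti-diagonals of 1, 2, 2 and 1 cells. Cells on the same anti-diagonal do not depend on each other, so they are the only ones that can be computed concurrently; each anti-diagonal has to finish before the next one starts.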

The main function, spSW, is:

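function spSW(seq1,seq2,p)
    # Scoring parameters: gap (indel) penalty and match reward
    indel = -1
    match = 2

    # Prepend a placeholder character so that sequence positions start
    # at index 2, matching the loops below
    seq1 = "^$seq1"
    seq2 = "^$seq2"

    col = length(seq1)
    row = length(seq2)

    # IDs of the worker processes; only the first p are used
    wl = workers()

    matrix,path = shared_initialize_path(seq1,seq2)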

    # First pass: anti-diagonals that start in row 2, one per column j
    for j = 2:col
        jcol = j
        irow = 2
        @sync begin
            count = 0
            while jcol > 1 && irow < row + 1
                #println(j," ",irow," ",jcol)
                equal = seq1[jcol] == seq2[irow]
                w = wl[(count % p) + 1]
                @async remotecall_wait(w,shared_get_score!,matrix,path,equal,indel,match,irow,jcol)
                jcol -= 1
                irow += 1
                count += 1
            end
        end
    end

    # Second pass: anti-diagonals that start in the last column, one per row i
    for i = 3:row
        jcol = col
        irow = i
        @sync begin
            count = 0
            while irow < row+1 && jcol > 1
                #println(i," ",irow," ",jcol)
                equal = seq1[jcol] == seq2[irow]
                w = wl[(count % p) + 1]
                @async remotecall_wait(w,shared_get_score!,matrix,path,equal,indel,match,irow,jcol)
                jcol -= 1
                irow += 1
                count += 1
            end
        end
    end
    return matrix,path
end

Does anyone see an obvious way to make this run faster? Right now it's about 10 times slower than the serial version.
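One thing that may be worth testing: the loops above issue one remotecall_wait per matrix cell, so an anti-diagonal with n cells costs n remote calls. A coarser split would give each worker one contiguous chunk of the anti-diagonal. The sketch below only shows the chunking arithmetic; split_range is a hypothetical helper, not part of the code above, and whether this actually removes the slowdown is untested.

```julia
# Split the index range 1:n into at most p contiguous, near-equal chunks.
# Hypothetical helper for illustration only.
function split_range(n, p)
    nchunks = min(n, p)
    chunks = UnitRange{Int}[]
    start = 1
    for c in 1:nchunks
        # The first (n % nchunks) chunks get one extra element.
        len = div(n, nchunks) + (c <= n % nchunks ? 1 : 0)
        push!(chunks, start:(start + len - 1))
        start += len
    end
    return chunks
end

println(split_range(10, 3))
println(split_range(2, 32))
```

Each chunk could then be handed to one worker in a single remote call, with that worker looping over its own cells, so an anti-diagonal of length n would cost at most p calls instead of n.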

0 answers:

No answers