I'm trying to implement Smith-Waterman alignment in parallel in Julia (see Figure 1 of http://www.cs.virginia.edu/~rl6sf/paper_dump/2011:12:33:22.pdf), but the parallel version runs much slower than the serial version. I'm using shared arrays, and I suspect I'm doing something silly that makes the code slow. Could someone take a look and tell me whether my code is as optimized as it could be? The parallel version should run faster than the serial one.
The basic idea is to compute the elements of each anti-diagonal of a matrix in parallel, sweeping from the upper-left to the lower-right corner and updating them as I go. I'm trying to use 32 cores on a shared-memory machine. The matrix is a SharedArray, and the elements of each anti-diagonal are computed in parallel as shown below. For each anti-diagonal, the while loops in the spSW function submit tasks to the workers inside an @sync block, using the helper function shared_get_score!(). The main goal of this function is to fill in each element of the shared arrays "matrix" and "path".
The other helper functions are:
function spSW(seq1,seq2,p)
    indel = -1
    match = 2
    # Prepend '^' to both sequences; scoring starts at index 2
    seq1 = "^$seq1"
    seq2 = "^$seq2"
    col = length(seq1)
    row = length(seq2)
    wl = workers()
    matrix,path = shared_initialize_path(seq1,seq2)
    # Anti-diagonals that start in row 2, at column j, and run down and to the left
    for j = 2:col
        jcol = j
        irow = 2
        # Wait for every cell of this anti-diagonal before starting the next one
        @sync begin
            count = 0
            while jcol > 1 && irow < row + 1
                #println(j," ",irow," ",jcol)
                equal = seq1[jcol] == seq2[irow]
                # Assign cells to the first p workers in round-robin order
                w = wl[(count % p) + 1]
                @async remotecall_wait(w,shared_get_score!,matrix,path,equal,indel,match,irow,jcol)
                jcol -= 1
                irow += 1
                count += 1
            end
        end
    end
    # Anti-diagonals that start in the last column, at row i, and run down and to the left
    for i = 3:row
        jcol = col
        irow = i
        @sync begin
            count = 0
            while irow < row+1 && jcol > 1
                #println(i," ",irow," ",jcol)
                equal = seq1[jcol] == seq2[irow]
                w = wl[(count % p) + 1]
                @async remotecall_wait(w,shared_get_score!,matrix,path,equal,indel,match,irow,jcol)
                jcol -= 1
                irow += 1
                count += 1
            end
        end
    end
    return matrix,path
end
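To make the traversal order concrete, here is a minimal serial sketch that only lists which cells sit on each anti-diagonal, in the same order the two loops in spSW visit them. The name antidiagonals is hypothetical (it is not part of my code), and the sketch does no scoring and uses no workers; it is only meant for checking the index arithmetic.

```julia
# Hypothetical helper (not part of the original code): list the (irow, jcol)
# cells on each anti-diagonal of a row x col matrix, skipping row 1 and
# column 1, in the order the loops in spSW visit them.
function antidiagonals(row::Int, col::Int)
    diags = Vector{Vector{Tuple{Int,Int}}}()
    # irow + jcol is constant along an anti-diagonal; call that sum d.
    for d in 4:(row + col)
        lo = max(2, d - col)   # smallest irow that keeps jcol <= col
        hi = min(row, d - 2)   # largest irow that keeps jcol >= 2
        push!(diags, [(i, d - i) for i in lo:hi])
    end
    return diags
end

# Print the cells of each anti-diagonal for a small 4 x 5 matrix.
for (k, cells) in enumerate(antidiagonals(4, 5))
    println("anti-diagonal ", k, ": ", cells)
end
```

Cells within one anti-diagonal do not depend on each other, so they are the unit of parallel work; each anti-diagonal has to finish before the next one starts, which is what the @sync blocks enforce.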
Does anyone see an obvious way to make this run faster? Right now it's about 10 times slower than the serial version.