Out-of-order standard output during parallelisation

Time: 2019-07-08 12:03:11

Tags: python

I am looping over a file (a so-called multifasta file, where each record starts with > and contains a sequence such as ACTGC), for example:

>
ACTG
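
(For context, this is roughly how I read the file with Biopython; each parsed record exposes the header as .id and the sequence as .seq:)

from Bio import SeqIO

# each record in the multifasta becomes a SeqRecord: header in .id, sequence in .seq
for record in SeqIO.parse("BLC.fa", "fasta"):
    print(record.id, len(record.seq))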

I then pass these records to a function that calls an external shell script. The problem is that the output all gets jumbled: what I want is the output for g (the record id), followed by the output for a (the shell call for that record). This works fine with a simple loop, but I want to do it for 2 million records * several hundred records, so I wrote a function to parallelise it.

I originally had the same problem with a plain loop:

for record in fasta:
    f=str(record.seq)
    g=str(record.id)
    print(g)
    a=subprocess.call(["bash","do.sh", f,g])

The equivalent direct call to cobs (without the wrapper script) was:

subprocess.call(["/well/bag/users/lipworth/cobs/build/src/cobs","query","-i","out.cobs_compact","-l","1","-t","0.1", f])

The wrapper script itself (cat do.sh):

stdbuf -o0 -e0 echo $2
stdbuf -o0 -e0 /well/bag/users/lipworth/cobs/build/src/cobs query -t .1 -i out.cobs_compact -l 1 --load-complete $1

Adding the stdbuf bits fixed the problem for the loop version.
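
(Another way to keep each record's output together is to capture the child's output in Python instead of letting it go straight to the terminal; a rough sketch with the same do.sh, which is also the direction the edit below takes:)

import subprocess
from Bio import SeqIO

# rough sketch: collect everything do.sh prints for one record, then emit it as one block
for record in SeqIO.parse("BLC.fa", "fasta"):
    f = str(record.seq)
    g = str(record.id)
    p = subprocess.Popen(["bash", "do.sh", f, g],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()   # waits for do.sh and returns its stdout/stderr as bytes
    print(out.decode(), end="")  # the echoed id and the cobs result stay together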

Here is my current code:

import subprocess
from Bio import SeqIO
from joblib import Parallel, delayed
import multiprocessing

file=open('BLC.fa','rU')

fasta=SeqIO.parse(file,"fasta")

def cobbler(record):
    f=str(record.seq)
    g=str(record.id)
    print(g)
    a= subprocess.call(["stdbuf", "-o0", "-e0","bash","do.sh", f, g])


num_cores=multiprocessing.cpu_count()

results=Parallel(n_jobs=num_cores)(delayed(cobbler)(record) for record in fasta)
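
(The problem with this version is that print(g) runs inside each worker while do.sh writes to the same terminal on its own, so lines from different records interleave. A rough sketch of what I am aiming for instead, with a hypothetical cobbler_ordered that returns each record's output so the parent can write everything in input order:)

import subprocess
import multiprocessing
from Bio import SeqIO
from joblib import Parallel, delayed

fasta = SeqIO.parse('BLC.fa', 'fasta')
num_cores = multiprocessing.cpu_count()

# sketch only: each worker returns its record's output instead of printing it
def cobbler_ordered(record):
    f = str(record.seq)
    g = str(record.id)
    p = subprocess.Popen(["bash", "do.sh", f, g],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    return out.decode()          # do.sh already echoes the id ($2) first

# joblib returns the results in the same order as the inputs
ordered = Parallel(n_jobs=num_cores)(delayed(cobbler_ordered)(r) for r in fasta)

with open('out.txt', 'w') as fh:
    fh.writelines(ordered)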

EDIT: I now have this:

import subprocess
import re
import sys
from Bio import SeqIO
from joblib import Parallel, delayed
import multiprocessing

file=open('BLC.fa','rU')

fasta=SeqIO.parse(file,"fasta")
outfile=open('out','wb')

def cobbler(record):
        outfile=open('out','wb')
        f=str(record.seq)
        g=str(record.id)

        a= subprocess.Popen(["bash","do.sh", f, g],stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = a.communicate()

        return out



def mp_handler():
        p = multiprocessing.Pool(4)
        with open('out.txt', 'w') as f:
                for result in p.imap(cobbler, fasta):
                        print(result)
                        f.write('%s' % result)

if __name__ =='__main__':
        mp_handler()

Apart from the fact that nothing gets saved to the out.txt file, this behaves as expected. Why?

0 Answers