Multiprocessing hangs after a few hundred jobs

Date: 2018-02-23 21:30:13

Tags: python pysam

I am trying to adapt the approach from this question for my file processing: Python multiprocessing safely writing to a file

Here is my modification of the code:

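import multiprocessing as mp
import pysam

# `args` (command-line arguments) and `out_bam` (the output file) are
# defined elsewhere in the script; that part is not shown.

def listener(q):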
    '''listens for messages on the q, writes to file. '''
    while 1:
        reads = q.get()
        if reads == 'kill':
            #f.write('killed')
            break
        for read in reads:
            out_bam.write(read)
        out_bam.flush()
    out_bam.close()

def fetch_reads(line, q):
    parts = line[:-1].split('\t')
    print(parts)
    start,end = int(parts[1])-1,int(parts[2])-1
    in_bam = pysam.AlignmentFile(args.bam, mode='rb')
    fetched = in_bam.fetch(parts[0], start, end)
    reads = [read for read in fetched if (read.cigarstring and read.pos >= start and read.pos < end and 'S' not in read.cigarstring)]
    in_bam.close()
    q.put(reads)
    return reads

#must use Manager queue here, or will not work
manager = mp.Manager()
q = manager.Queue()
if not args.threads:
    threads = 1
else:
    threads = int(args.threads)
pool = mp.Pool(threads+1)

#put listener to work first
watcher = pool.apply_async(listener, (q,))

with open(args.bed,'r') as bed:
    jobs = []
    cnt = 0
    for line in bed:
        # Fire off the read fetchings
        job = pool.apply_async(fetch_reads, (line, q))
        jobs.append(job)
        cnt += 1
        if cnt > 10000:
            break

# collect results from the workers through the pool result queue
for job in jobs: 
    job.get()
    print('get')

#now we are done, kill the listener
q.put('kill')
pool.close()
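For comparison, here is a self-contained sketch of the same single-writer pattern (one listener process owns the output file; workers only send data to it through a Manager queue). It uses a plain text file instead of BAM so it runs without pysam, and `worker`, `run` and the `chrom:pos` records are hypothetical stand-ins. It differs from the posted code in a few ways that may be worth checking: the listener opens its own output file, the listener's AsyncResult is collected with `get()`, the pool is shut down only after that, and workers return a count rather than the reads. An `apply_async` return value is pickled back to the parent, so returning `reads` sends every read a second time on top of the queue. Whether any of these is the cause of the hang above is not established.

```python
import multiprocessing as mp
import os
import tempfile


def listener(q, out_path):
    """Single writer: the only process that opens the output file."""
    # Opened inside the listener process, so no handle is inherited
    # from (or shared with) the parent process.
    with open(out_path, "w") as out:
        while True:
            item = q.get()
            if item == "kill":
                break
            for record in item:
                out.write(record + "\n")
            out.flush()
    return out_path


def worker(line, q):
    """Hypothetical stand-in for fetch_reads that builds plain strings."""
    chrom, start, end = line.rstrip("\n").split("\t")
    records = [f"{chrom}:{pos}" for pos in range(int(start), int(end))]
    q.put(records)
    # Return only a small summary; the records already travelled via q.
    return len(records)


def run(lines, out_path, n_workers=2):
    with mp.Manager() as manager:
        q = manager.Queue()
        # One extra process: the listener holds a pool slot for the whole run.
        with mp.Pool(n_workers + 1) as pool:
            watcher = pool.apply_async(listener, (q, out_path))
            jobs = [pool.apply_async(worker, (line, q)) for line in lines]
            counts = [job.get() for job in jobs]  # re-raises worker errors
            q.put("kill")
            watcher.get()  # re-raises listener errors instead of hiding them
    return counts


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "out.txt")
    print(run(["chr1\t0\t3\n", "chr2\t10\t12\n"], path))
    with open(path) as fh:
        print(sorted(fh.read().splitlines()))
```

The order in which records reach the file depends on which worker finishes first, which is why the demo sorts the lines before printing them.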

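The difference from the code in the linked question is that I open and close the input BAM inside the function, because otherwise I get strange errors from bgzip.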
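Reopening the input in every task works but costs one open per BED line. A common alternative is to open the input once per worker process with the `initializer` and `initargs` arguments of `multiprocessing.Pool`, so each process gets its own private handle and none is opened in the parent and inherited by the workers. The sketch below uses a plain text file; `init_worker`, `read_line` and `_handle` are hypothetical names, and whether a `pysam.AlignmentFile` opened this way avoids the bgzip errors is an assumption that has not been tested here.

```python
import multiprocessing as mp
import os
import tempfile

# Per-process handle, set once by init_worker in each pool process.
_handle = None


def init_worker(path):
    """Runs once in every worker: open a private handle for that process."""
    global _handle
    # The handle stays open for the worker's lifetime; no other process uses it.
    _handle = open(path, "r")


def read_line(index):
    """Hypothetical task that reads through the per-process handle."""
    _handle.seek(0)
    return _handle.read().splitlines()[index]


def run(path, indices, n_workers=2):
    with mp.Pool(n_workers, initializer=init_worker, initargs=(path,)) as pool:
        return pool.map(read_line, indices)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "in.txt")
    with open(path, "w") as fh:
        fh.write("a\nb\nc\n")
    print(run(path, [2, 0, 1]))
```

`pool.map` returns results in the order of its inputs, regardless of which worker handled each index.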

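At first, the output of print(parts) and print('get') alternates (more or less), but then 'get' shows up less and less often. Eventually the code hangs without printing anything: all the parts have been printed, but 'get' is no longer printed at all. The output file stays at zero bytes.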
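One thing worth ruling out: in the posted code `watcher` is never `.get()`-ed, and `apply_async` does not report a task's exception until `get()` is called on its result. If `listener` raised (for example because `out_bam` is not usable in the worker process, or a write fails), the program would carry on with no error message and nothing written, while the Manager queue, which is unbounded by default, keeps accumulating every list the workers put. That would fit the zero-byte output, but it is not confirmed as the cause. A minimal sketch of checking a listener's result with a timeout; `broken_listener` and `check_watcher` are hypothetical names.

```python
import multiprocessing as mp


def broken_listener(q):
    """Hypothetical listener that fails immediately, e.g. on a missing global."""
    return undefined_output_handle  # NameError, raised inside the worker


def check_watcher(timeout=5):
    with mp.Pool(2) as pool:
        watcher = pool.apply_async(broken_listener, (None,))
        # Nothing is printed here: the worker's exception is stored on the
        # AsyncResult and is only re-raised in the parent by get().
        try:
            watcher.get(timeout=timeout)
        except mp.TimeoutError:
            return "listener still running"
        except Exception as exc:  # the listener's own error, re-raised here
            return f"listener failed: {type(exc).__name__}: {exc}"
    return "listener finished"


if __name__ == "__main__":
    print(check_watcher())
```

For a healthy listener that is still waiting on the queue, `get(timeout=...)` raises `multiprocessing.TimeoutError` instead, which at least shows the process is alive.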

Can anyone lend a hand? Cheers!

0 answers