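Problems with mixing threads/processes in Python

Asked: 2012-09-27 13:20:34

Tags: python multithreading queue multiprocessing

I've just started experimenting with multithreading/multiprocessing and have run into some problems. What I want to do is generate a number of requests for data that should be downloaded from a remote database. These are stored in a Queue.Queue (let's call it q_in). Once all requests have been generated, I start a limited number of instances of a thread class that takes q_in and another Queue (q_out) as input. Each thread then get()s jobs from q_in and puts the results into q_out. This part is IO-bound, so I figured threads were a good choice. The results in q_out are consumed by a pool of processes that do some processing on them. This part is CPU-bound, so I figured processes were a good choice.

This seems to work fine, except that I've run into some strange behaviour, which I demonstrate below.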

import threading
import Queue
import multiprocessing as mp

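class TestThread(threading.Thread):
    """Worker thread: takes jobs from jobPool and puts results on resultPool."""

    def __init__(self, threadnr, resultPool, jobPool):
        threading.Thread.__init__(self)
        self.threadnr = threadnr
        self.resultPool = resultPool
        self.jobPool = jobPool

    def run(self):
        while True:
            job = self.jobPool.get()
            if job is None:
                # None is treated as a stop sentinel (an assumed fix); without
                # an exit path the join() calls at the end never return.
                self.jobPool.task_done()
                break
            # Busy loop standing in for the real download.
            for a in range(10):
                for i in xrange(1000000):
                    pass
            print "Thread nr %d finished job %d" % (self.threadnr, job)
            self.resultPool.put([self.threadnr, job + 1])
            self.jobPool.task_done()

def test(i):
    # Stand-in for the CPU-bound processing done in the pool.
    print mp.current_process().name, "test", i
    return mp.current_process().name, "test", i

if __name__ == '__main__':
    q_in = Queue.Queue()
    q_out = Queue.Queue()
    nr_jobs = 20
    res = []
    nr_threads = 4
    threads = []

    for i in range(nr_jobs):
        q_in.put(i)

    for i in range(nr_threads):
        t = TestThread(i, q_out, q_in)
        t.start()
        threads.append(t)

    p_pool = mp.Pool(4)

    # Hand each thread result to the pool as soon as it arrives.
    for i in range(nr_jobs):
        job = q_out.get(block=True)
        print "Got job", job
        res.append(p_pool.apply_async(test, (job,)))

    p_pool.close()
    p_pool.join()

    for r in res:
        print r.get()

    # One stop sentinel per thread so that run() returns and join() completes.
    for t in threads:
        q_in.put(None)

    for t in threads:
        t.join()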

The output of this is:

Thread nr 2 finished job 2
Got job [2, 3]
Thread nr 0 finished job 0
Got job [0, 1]
Thread nr 1 finished job 1
Got job [1, 2]
Thread nr 3 finished job 3
Got job [3, 4]
Thread nr 2 finished job 4
Got job Thread nr 0 finished job 5[
2, 5]
Got job [0, 6]
Thread nr 1 finished job 6
Got job [1, 7]
Thread nr 3 finished job 7
Got job [3, 8]
Thread nr 2 finished job 8
Got job [2, 9]
Thread nr 0 finished job 9
Got job [0, 10]
PoolWorker-4 test [1, 2]
PoolWorker-4 test [1, 7]
PoolWorker-3 test [3, 4]
PoolWorker-3 test [3, 8]
PoolWorker-2 test [0, 1]
PoolWorker-2 test [0, 6]
PoolWorker-2 test [0, 10]
PoolWorker-1 test [2, 3]
PoolWorker-1 test [2, 5]
PoolWorker-1 test [2, 9]
('PoolWorker-1', 'test', [2, 3])
('PoolWorker-2', 'test', [0, 1])
('PoolWorker-4', 'test', [1, 2])
('PoolWorker-3', 'test', [3, 4])
('PoolWorker-1', 'test', [2, 5])
('PoolWorker-2', 'test', [0, 6])
('PoolWorker-4', 'test', [1, 7])
('PoolWorker-3', 'test', [3, 8])
('PoolWorker-1', 'test', [2, 9])
('PoolWorker-2', 'test', [0, 10])

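This is a test program that works much like my real program. What I find strange is that, even though the threads take a relatively long time to finish, nothing is printed by the processes until the threads have finished all their jobs. It looks as if the jobs are being consumed continuously, but the output from the processes only shows up after all the threads are done.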

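In this example that is fairly harmless (if annoying), but in my real program the queued-up output seems to cause memory errors, because all output from the processes is delayed until the last thread has finished.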
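One way to keep pending results from piling up in memory is to bound the queue between the two stages, so that the threads block once the consumer falls behind. The following is only a minimal sketch in Python 3 (the module is `queue` there, not `Queue`), not a diagnosis of the delay itself. `MAX_PENDING`, `producer` and `run_bounded` are hypothetical names chosen for illustration, and `job + 1` stands in for the real download.

```python
import queue
import threading

MAX_PENDING = 4   # hypothetical cap on results waiting between the stages
NR_JOBS = 20
NR_THREADS = 4

def producer(q_in, q_out):
    """Take jobs until a None sentinel arrives; put() blocks while q_out is full."""
    while True:
        job = q_in.get()
        if job is None:
            break
        q_out.put(job + 1)  # placeholder for the downloaded result

def run_bounded():
    q_in = queue.Queue()
    q_out = queue.Queue(maxsize=MAX_PENDING)  # bounded: applies backpressure
    for i in range(NR_JOBS):
        q_in.put(i)
    for _ in range(NR_THREADS):
        q_in.put(None)  # one stop sentinel per thread
    threads = [threading.Thread(target=producer, args=(q_in, q_out))
               for _ in range(NR_THREADS)]
    for t in threads:
        t.start()
    results = []
    max_seen = 0
    for _ in range(NR_JOBS):
        # Record how many results were waiting before taking the next one.
        max_seen = max(max_seen, q_out.qsize())
        results.append(q_out.get())
    for t in threads:
        t.join()
    return sorted(results), max_seen

results, max_seen = run_bounded()
print(len(results), "results; at most", max_seen, "pending at once")
```

The bound only limits what sits in q_out. Tasks already handed to a pool with apply_async would need their own limit, for example a semaphore released when each task completes.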

As a side question: is it a good idea to mix threads and processes, or should I stick to one or the other?
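For reference, the same two-stage layout can be written with the standard `concurrent.futures` module in Python 3, which keeps the thread stage and the process stage behind one interface. This is a sketch under assumptions rather than a recommendation: `download`, `crunch` and `run_pipeline` are hypothetical stand-ins for the real request and processing steps, and the `cpu_pool_cls` parameter exists only so the CPU stage can be swapped for threads when experimenting.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def download(job):
    """Hypothetical stand-in for the IO-bound database request."""
    return job + 1

def crunch(data):
    """Hypothetical stand-in for the CPU-bound step.

    Defined at module level so that it can be pickled for worker processes.
    """
    return data * data

def run_pipeline(jobs, nr_threads=4, nr_procs=4,
                 cpu_pool_cls=ProcessPoolExecutor):
    """Download with threads, then process each result in a second pool."""
    with ThreadPoolExecutor(max_workers=nr_threads) as io_pool, \
         cpu_pool_cls(max_workers=nr_procs) as cpu_pool:
        # map() yields the download results in job order as they complete.
        downloaded = io_pool.map(download, jobs)
        futures = [cpu_pool.submit(crunch, d) for d in downloaded]
        return [f.result() for f in futures]
```

With the default arguments the CPU stage runs in separate processes, so a call such as `run_pipeline(range(20))` should sit under an `if __name__ == '__main__':` guard, as in the test program above.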

I'd appreciate any thoughts on this.

0 answers:

No answers yet