我的例程下面列出了urllib2.Requests,并为每个请求生成一个新进程并将其触发。目的是为了异步速度,所以它都是一劳永逸(不需要响应)。问题是在下面的代码中生成的进程永远不会终止。所以经过其中几个盒子就可以了。上下文:Django Web应用程序。有什么帮助吗?
MP_CONCURRENT = int(multiprocessing.cpu_count()) * 2
if MP_CONCURRENT < 2: MP_CONCURRENT = 2
MPQ = multiprocessing.JoinableQueue(MP_CONCURRENT)
def request_manager(req_list):
try:
# put request list in the queue
for req in req_list:
MPQ.put(req)
# call processes on queue
worker = multiprocessing.Process(target=process_request, args=(MPQ,))
worker.daemon = True
worker.start()
# move on after queue is empty
MPQ.join()
except Exception, e:
logging.error(traceback.print_exc())
# prcoess requests in queue
def process_request(MPQ):
try:
while True:
req = MPQ.get()
dr = urllib2.urlopen(req)
MPQ.task_done()
except Exception, e:
logging.error(traceback.print_exc())
答案 0 :(得分:1)
也许我不对,但是
MP_CONCURRENT = int(multiprocessing.cpu_count()) * 2
if MP_CONCURRENT < 2: MP_CONCURRENT = 2
MPQ = multiprocessing.JoinableQueue(MP_CONCURRENT)
def request_manager(req_list):
try:
# put request list in the queue
pool=[]
for req in req_list:
MPQ.put(req)
# call processes on queue
worker = multiprocessing.Process(target=process_request, args=(MPQ,))
worker.daemon = True
worker.start()
pool.append(worker)
# move on after queue is empty
MPQ.join()
# Close not needed processes
for p in pool: p.terminate()
except Exception, e:
logging.error(traceback.print_exc())
# prcoess requests in queue
def process_request(MPQ):
try:
while True:
req = MPQ.get()
dr = urllib2.urlopen(req)
MPQ.task_done()
except Exception, e:
logging.error(traceback.print_exc())
答案 1 :(得分:0)
MP_CONCURRENT = int(multiprocessing.cpu_count()) * 2
if MP_CONCURRENT < 2: MP_CONCURRENT = 2
MPQ = multiprocessing.JoinableQueue(MP_CONCURRENT)
CHUNK_SIZE = 20 #number of requests sended to one process.
pool = multiprocessing.Pool(MP_CONCURRENT)
def request_manager(req_list):
try:
# put request list in the queue
responce=pool.map(process_request,req_list,CHUNK_SIZE) # function exits after all requests called and pool work ended
# OR
responce=pool.map_async(process_request,req_list,CHUNK_SIZE) #function request_manager exits after all requests passed to pool
except Exception, e:
logging.error(traceback.print_exc())
# prcoess requests in queue
def process_request(req):
dr = urllib2.urlopen(req)
这比你的代码快〜5-10倍
答案 2 :(得分:0)
将副“brocker”整合到django(例如rabbitmq
或类似的东西)。
答案 3 :(得分:0)
在一些摆弄(和睡个好觉)之后好吧,我相信我已经找到了问题(谢谢Eri,你是我需要的灵感)。僵尸进程的主要问题是我没有发出信号,说明这个过程已经完成(并且已经完成了),我(天真地)认为这两个进程都是自动进行多进程的。
有效的代码:
# function that will be run through the pool
def process_request(req):
try:
dr = urllib2.urlopen(req, timeout=30)
except Exception, e:
logging.error(traceback.print_exc())
# process killer
def sig_end(r):
sys.exit()
# globals
MP_CONCURRENT = int(multiprocessing.cpu_count()) * 2
if MP_CONCURRENT < 2: MP_CONCURRENT = 2
CHUNK_SIZE = 20
POOL = multiprocessing.Pool(MP_CONCURRENT)
# pool initiator
def request_manager(req_list):
try:
resp = POOL.map_async(process_request, req_list, CHUNK_SIZE, callback=sig_end)
except Exception, e:
logging.error(traceback.print_exc())
几点说明:
1)必须首先定义“map_async”(本例中为“process_request”)命中的函数(在全局声明之前)。
2)退出流程可能有更优雅的方式(建议欢迎)。
3)在这个例子中使用池确实是最好的(再次感谢Eri),因为“回调”功能允许我立即发出信号。