I am using Python's multiprocessing library to run Selenium scripts. My code is as follows:
from multiprocessing import Process

#-- start and join multiple threads ---
thread_list = []
total_threads = 10  #-- no of parallel threads
for i in range(total_threads):
    t = Process(target=get_browser_and_start, args=[url, nlp, pixel])
    thread_list.append(t)
    print("starting thread...")
    t.start()
for t in thread_list:
    print("joining existing thread...")
    t.join()
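As I understand it, join() waits for each process to finish. What I want instead is that as soon as one process is freed up, it is assigned another task to run.
Think of it like this:
Suppose the first batch starts 8 processes.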
no_of_tasks_to_perform = 100
for i in range(no_of_tasks_to_perform):
    processes start(8)
    if process no 2 finished executing, start new process
    maintain 8 process at any point of time till
    "i" is <= no_of_tasks_to_perform
Answer 0 (score: 2)
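Instead of starting a new process every time one finishes, try putting all the tasks into a multiprocessing.Queue() and starting 8 long-running processes. Each process keeps pulling a new task from the queue and working on it until there are no tasks left.
In your case, it would look more like this: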
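from multiprocessing import Queue, Process

def worker(queue):
    # Block on get() until a None sentinel arrives. Testing queue.empty()
    # first is racy: another worker can take the last task in between.
    while True:
        task = queue.get()
        if task is None:
            break
        # now start to work on your task
        url, nlp, pixel = task[:3]  # unpack url, nlp, pixel from task
        get_browser_and_start(url, nlp, pixel)

def main():
    queue = Queue()
    # Now put tasks into queue
    no_of_tasks_to_perform = 100
    for i in range(no_of_tasks_to_perform):
        queue.put([url, nlp, pixel, ...])
    # Now start all processes
    no_of_processes = 8
    processes = [Process(target=worker, args=(queue,)) for _ in range(no_of_processes)]
    for process in processes:
        queue.put(None)  # one sentinel per worker, queued after all tasks
        process.start()
    for process in processes:
        process.join()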
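
The same idea (a fixed set of 8 workers that each pick up the next task as soon as they are free) can probably also be expressed with multiprocessing.Pool, which manages the task queue internally. The following is only a minimal sketch, assuming Python 3: run_task is a hypothetical stand-in for get_browser_and_start, run_all is a helper name made up for this example, and the task tuples are placeholder data.

```python
from multiprocessing import Pool


def run_task(task):
    """Hypothetical stand-in for get_browser_and_start(url, nlp, pixel)."""
    url, nlp, pixel = task
    # Real code would drive a Selenium browser here; this just echoes the URL.
    return "done: " + url


def run_all(tasks, no_of_processes=8):
    # The pool keeps no_of_processes workers alive and hands each one a new
    # task as soon as it has finished its previous one.
    with Pool(processes=no_of_processes) as pool:
        # imap_unordered yields results in completion order, so sort them
        # to get a stable result list.
        return sorted(pool.imap_unordered(run_task, tasks))


if __name__ == "__main__":
    # Placeholder task data; the URLs and arguments are made up for the demo.
    tasks = [("http://example.com/%d" % i, None, None) for i in range(100)]
    results = run_all(tasks)
    print(len(results))
```

The main trade-off against the explicit queue version is control: Pool hides the worker loop, so per-worker setup such as reusing a single browser instance across tasks would need something like the pool's initializer argument.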