多处理:使用tqdm显示进度条

时间:2017-01-29 10:58:46

标签: python multiprocessing progress-bar tqdm

为了使我的代码更“pythonic”更快,我使用“multiprocessing”和map函数发送它a)函数和b)迭代范围。

植入的解决方案(即直接在范围tqdm.tqdm(范围(0,30))上调用tqdm不适用于多处理(如下面的代码所示)。

进度条从0到100%显示(当python读取代码时?)但它并不表示地图功能的实际进度。

如何显示一个进度条,指示“地图”功能在哪一步?

from multiprocessing import Pool
import tqdm
import time

def _foo(my_number):
   square = my_number * my_number
   time.sleep(1)
   return square 

if __name__ == '__main__':
   p = Pool(2)
   r = p.map(_foo, tqdm.tqdm(range(0, 30)))
   p.close()
   p.join()

欢迎任何帮助或建议......

8 个答案:

答案 0 :(得分:63)

使用imap而不是map,它返回已处理值的迭代器。

from multiprocessing import Pool
import tqdm
import time

def _foo(my_number):
   square = my_number * my_number
   time.sleep(1)
   return square 

if __name__ == '__main__':
   with Pool(2) as p:
      r = list(tqdm.tqdm(p.imap(_foo, range(30)), total=30))

答案 1 :(得分:31)

发现解决方案:小心!由于多处理,估计时间(每循环迭代,总时间等)可能不稳定,但进度条工作正常。

注意:Pool的上下文管理器仅适用于Python 3.3版

from multiprocessing import Pool
import time
from tqdm import *

def _foo(my_number):
   square = my_number * my_number
   time.sleep(1)
   return square 

if __name__ == '__main__':
    with Pool(processes=2) as p:
        max_ = 30
        with tqdm(total=max_) as pbar:
            for i, _ in tqdm(enumerate(p.imap_unordered(_foo, range(0, max_)))):
                pbar.update()

答案 2 :(得分:30)

对不起,但是如果您需要的是并发映射,我在tqdm>=4.42.0中添加了此功能:

from tqdm.contrib.concurrent import process_map  # or thread_map
import time

def _foo(my_number):
   square = my_number * my_number
   time.sleep(1)
   return square 

if __name__ == '__main__':
   r = process_map(_foo, range(0, 30), max_workers=2)

参考:https://tqdm.github.io/docs/contrib.concurrent/https://github.com/tqdm/tqdm/blob/master/examples/parallel_bars.py

它支持max_workerschunksize,并且您也可以轻松地从process_map切换到thread_map

答案 3 :(得分:1)

您可以改用p_tqdm

https://github.com/swansonk14/p_tqdm

from p_tqdm import p_map
import time

def _foo(my_number):
   square = my_number * my_number
   time.sleep(1)
   return square 

if __name__ == '__main__':
   r = p_map(_foo, list(range(0, 30)))

答案 4 :(得分:1)

当需要从并行执行函数中获取结果时,这就是我的看法。这个函数可以做一些事情(我的另一篇文章对此做了进一步的解释),但是关键是有一个任务待处理队列和一个任务完成队列。当工作人员完成挂起队列中的每个任务时,他们会将结果添加到任务完成队列中。您可以使用tqdm进度条将检查包装到任务完成队列中。我没有在这里放置do_work()函数的实现,这无关紧要,因为这里的消息是监视已完成任务的队列并在每次输入结果时更新进度条。

def par_proc(job_list, num_cpus=None, verbose=False):

# Get the number of cores
if not num_cpus:
    num_cpus = psutil.cpu_count(logical=False)

print('* Parallel processing')
print('* Running on {} cores'.format(num_cpus))

# Set-up the queues for sending and receiving data to/from the workers
tasks_pending = mp.Queue()
tasks_completed = mp.Queue()

# Gather processes and results here
processes = []
results = []

# Count tasks
num_tasks = 0

# Add the tasks to the queue
for job in job_list:
    for task in job['tasks']:
        expanded_job = {}
        num_tasks = num_tasks + 1
        expanded_job.update({'func': pickle.dumps(job['func'])})
        expanded_job.update({'task': task})
        tasks_pending.put(expanded_job)

# Set the number of workers here
num_workers = min(num_cpus, num_tasks)

# We need as many sentinels as there are worker processes so that ALL processes exit when there is no more
# work left to be done.
for c in range(num_workers):
    tasks_pending.put(SENTINEL)

print('* Number of tasks: {}'.format(num_tasks))

# Set-up and start the workers
for c in range(num_workers):
    p = mp.Process(target=do_work, args=(tasks_pending, tasks_completed, verbose))
    p.name = 'worker' + str(c)
    processes.append(p)
    p.start()

# Gather the results
completed_tasks_counter = 0

with tqdm(total=num_tasks) as bar:
    while completed_tasks_counter < num_tasks:
        results.append(tasks_completed.get())
        completed_tasks_counter = completed_tasks_counter + 1
        bar.update(completed_tasks_counter)

for p in processes:
    p.join()

return results

答案 5 :(得分:0)

这种方法简单有效。

from multiprocessing.pool import ThreadPool
import time
from tqdm import tqdm

def job():
    time.sleep(1)
    pbar.update()

pool = ThreadPool(5)
with tqdm(total=100) as pbar:
    for i in range(100):
        pool.apply_async(job)
    pool.close()
    pool.join()

答案 6 :(得分:0)

import multiprocessing as mp
import tqdm


some_iterable = ...

def some_func():
    # your logic
    ...


if __name__ == '__main__':
    with mp.Pool(mp.cpu_count()-2) as p:
        list(tqdm.tqdm(p.imap(some_func, iterable), total=len(iterable)))

答案 7 :(得分:0)

对于带有apply_async的进度条,我们可以使用如下建议的代码:

https://github.com/tqdm/tqdm/issues/484

import time
import random
from multiprocessing import Pool
from tqdm import tqdm

def myfunc(a):
    time.sleep(random.random())
    return a ** 2

pool = Pool(2)
pbar = tqdm(total=100)

def update(*a):
    pbar.update()

for i in range(pbar.total):
    pool.apply_async(myfunc, args=(i,), callback=update)
pool.close()
pool.join()