Python threading: making the main thread report progress

Asked: 2018-11-28 13:02:16

Tags: python multithreading python-multithreading

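I run some jobs in parallel, and they can take a long time, so I would like the main thread to report progress, for example once every hour.

Below is a simplified version of what I came up with. The code runs test_function on each argument from input_arguments using 2 threads. Every 5 seconds it prints the percentage of jobs that are finished.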

import threading
import queue
import time


def test_function(x):
    time.sleep(4)
    print("Finished ", x)


num_processes = 2
input_arguments = range(10)

# Define a worker which will continuously execute function taking input parameters from the queue
def worker():
    while True:
        x = q.get()
        if x is None:
            break
        test_function(x)
        q.task_done()

# Initialize queue and the threads
q = queue.Queue()
threads = []
for i in range(num_processes):
    t = threading.Thread(target=worker)
    t.start()
    threads.append(t)

# Create a queue of input parameters for function
for item in input_arguments:
    q.put(item)

# Report progress every 5 seconds
report_progress(q)

# stop workers
for i in range(num_processes):
    q.put(None)
for t in threads:
    t.join()

report_progress is defined as follows:

def report_progress(q):
    qsize_init = q.qsize()
    while not q.empty():
        time.sleep(5)
        portion_finished = 1 - q.qsize() / qsize_init
        print("run_parallel: {:.1%} jobs are finished".format(portion_finished))
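
One caveat with this version: q.qsize() only counts jobs that are still waiting in the queue. A job that a worker has taken but not yet completed is reported as finished, and qsize_init is read after the workers have already started taking items. A minimal sketch of counting completed jobs explicitly is shown below. ProgressCounter and the extra worker parameters are hypothetical names introduced for illustration, not part of the original code.

```python
import queue
import threading


class ProgressCounter:
    """Thread-safe count of completed jobs (hypothetical helper)."""

    def __init__(self, total):
        self.total = total
        self._done = 0
        self._lock = threading.Lock()

    def increment(self):
        # Called by a worker after a job has actually completed
        with self._lock:
            self._done += 1

    def fraction_done(self):
        with self._lock:
            # An empty workload is treated as fully done
            return self._done / self.total if self.total else 1.0


def worker(q, counter, func):
    # Same loop as the original worker, plus the completion count
    while True:
        x = q.get()
        if x is None:
            break
        func(x)
        counter.increment()
```

With this in place, the report can print counter.fraction_done() instead of deriving the percentage from q.qsize(), provided total is set to the number of jobs before the workers start.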

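What I actually want is to report progress every hour rather than every 5 seconds, but with a one-hour sleep the program could sit idle for many minutes after all the jobs have finished.

Another possibility is to define report_progress differently: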

def report_progress(q):
    qsize_init = q.qsize()
    time_start = time.time()
    while not q.empty():
        current_time = time.time()
        if current_time - time_start > 5:
            portion_finished = 1 - q.qsize() / qsize_init
            print("run_parallel: {:.1%} jobs are finished".format(portion_finished))
            time_start = time.time()

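I am worried that checking this condition constantly in a tight loop will waste a lot of CPU time.

Is there a standard way to handle this?

Python version: 3.6

1 answer:

Answer 0 (score: 0)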

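For now I will use a simple solution that was suggested in a comment by @Andriy Maletsky.

The main thread checks every few seconds whether q is still non-empty, and if more than 1 hour has passed since the last report, it prints a progress message.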

import time

time_between_reports = 3600  # seconds between progress messages
time_between_checks = 5      # seconds between checks of the queue


def report_progress_until_finished(q):
    qsize_init = q.qsize()
    last_report_time = time.time()
    # Poll the queue cheaply; print only once a full report interval has passed
    while not q.empty():
        time_elapsed = time.time() - last_report_time
        if time_elapsed > time_between_reports:
            portion_finished = 1 - q.qsize() / qsize_init
            print("run_parallel: {:.1%} jobs are finished".format(portion_finished))
            last_report_time = time.time()
        time.sleep(time_between_checks)
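
As a possible alternative to polling, the standard library's concurrent.futures module offers a blocking wait with a timeout. The sketch below assumes the jobs can be submitted to a ThreadPoolExecutor instead of a hand-written queue. run_with_progress and its parameters are hypothetical names chosen for illustration. concurrent.futures.wait blocks without spinning the CPU, returns as soon as every job is done, and otherwise returns when the timeout expires, so the report interval can be an hour without delaying the exit.

```python
import concurrent.futures
import time


def run_with_progress(func, args, num_workers=2, report_interval=3600.0):
    """Run func over args in a thread pool, printing progress at each interval."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(func, x) for x in args]
        pending = set(futures)
        while pending:
            # Sleeps until all pending jobs finish or the interval elapses
            _, pending = concurrent.futures.wait(pending, timeout=report_interval)
            finished = len(futures) - len(pending)
            print("run_parallel: {:.1%} jobs are finished".format(finished / len(futures)))
    # result() re-raises any exception that a job raised
    return [f.result() for f in futures]


if __name__ == "__main__":
    def test_function(x):
        time.sleep(0.2)  # shortened stand-in for a long job
        return x

    run_with_progress(test_function, range(10), num_workers=2, report_interval=0.5)
```

Here the percentage is based on completed futures rather than on the queue size, so jobs that are still running are not counted as finished.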