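I'm running some jobs in parallel that can take a long time, so I'd like the main thread to report progress periodically, for example once an hour.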
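Below is a simplified version of what I came up with. The code runs test_function in 2 threads, taking its arguments from input_arguments, and prints the percentage of finished jobs every 5 seconds.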
import threading
import queue
import time

def test_function(x):
    time.sleep(4)
    print("Finished ", x)

num_processes = 2
input_arguments = range(10)

# Worker: repeatedly take an argument from the queue and run test_function on it; None means "stop"
def worker():
    while True:
        x = q.get()
        if x is None:
            break
        test_function(x)
        q.task_done()

# Initialize the queue and start the threads
q = queue.Queue()
threads = []
for i in range(num_processes):
    t = threading.Thread(target=worker)
    t.start()
    threads.append(t)

# Fill the queue with the input parameters for the function
for item in input_arguments:
    q.put(item)

# Report progress every 5 seconds
# (report_progress is shown below; it must be defined before this line runs)
report_progress(q)

# Stop the workers: one None sentinel per thread
for i in range(num_processes):
    q.put(None)
for t in threads:
    t.join()
report_progress is defined as follows:
def report_progress(q):
    qsize_init = q.qsize()
    while not q.empty():
        time.sleep(5)
        # q.qsize() counts jobs that no worker has taken yet
        portion_finished = 1 - q.qsize() / qsize_init
        print("run_parallel: {:.1%} jobs are finished".format(portion_finished))
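However, I want to report progress every hour rather than every 5 seconds. With time.sleep(3600), the program could sit idle for many minutes after all jobs have finished, because the sleep only returns once the full hour has elapsed.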
Another option is to define report_progress differently:
def report_progress(q):
    qsize_init = q.qsize()
    time_start = time.time()
    while not q.empty():
        current_time = time.time()
        # Print only after 5 seconds have passed; note that the loop never sleeps
        if current_time - time_start > 5:
            portion_finished = 1 - q.qsize() / qsize_init
            print("run_parallel: {:.1%} jobs are finished".format(portion_finished))
            time_start = time.time()
I'm worried that constantly checking this condition (a busy-wait loop) wastes a lot of CPU time.
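To make the CPU concern concrete, here is a small measurement sketch. The helper names busy_wait, blocking_wait and cpu_seconds are hypothetical. time.process_time() returns the CPU time consumed by the process rather than wall-clock time. A loop that spins on time.time(), like the second report_progress variant, therefore uses roughly as much CPU as the time it waits. A blocking wait uses almost none.

```python
import threading
import time


def busy_wait(seconds):
    """Spin on time.time() without sleeping, like the second report_progress variant."""
    deadline = time.time() + seconds
    while time.time() < deadline:
        pass


def blocking_wait(seconds):
    """Block on an Event that nobody sets; the thread is woken at the timeout."""
    threading.Event().wait(timeout=seconds)


def cpu_seconds(fn, seconds):
    """Return the CPU time (not wall time) this process spent running fn(seconds)."""
    start = time.process_time()
    fn(seconds)
    return time.process_time() - start


busy = cpu_seconds(busy_wait, 0.3)
blocking = cpu_seconds(blocking_wait, 0.3)
print("busy-wait CPU: {:.2f}s, blocking wait CPU: {:.2f}s".format(busy, blocking))
```

Both calls wait for the same wall-clock time. Only the spinning loop keeps a core busy while it does so.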
Is there a standard way to handle this?
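One common standard-library pattern is to block on threading.Event.wait(timeout=...) instead of calling time.sleep(). The call sleeps without using CPU, but it returns immediately once the event is set. It returns True if the event was set and False if the timeout expired. The sketch below is a drop-in replacement for report_progress(q). It assumes the worker calls q.task_done() after every job, as the question's worker does, so that a helper thread blocked in q.join() can set the event when all jobs are done. The interval parameter and the run_jobs harness are hypothetical additions for illustration.

```python
import queue
import threading
import time


def report_progress(q, interval=3600.0):
    """Print progress every `interval` seconds and return as soon as every
    item put on `q` has been marked done with q.task_done().
    Returns the number of progress lines printed."""
    qsize_init = max(q.qsize(), 1)  # guard against division by zero
    all_done = threading.Event()

    def wait_for_queue():
        q.join()        # blocks without spinning until all tasks are done
        all_done.set()

    threading.Thread(target=wait_for_queue, daemon=True).start()

    reports = 0
    # Event.wait(timeout) returns True as soon as the event is set and
    # False if the timeout expires first, so no polling loop is needed.
    while not all_done.wait(timeout=interval):
        # As in the question, this counts jobs that have left the queue.
        portion_finished = 1 - q.qsize() / qsize_init
        print("run_parallel: {:.1%} jobs are finished".format(portion_finished))
        reports += 1
    return reports


def run_jobs(num_jobs, job_seconds, num_threads=2, interval=3600.0):
    """Hypothetical demo harness mirroring the question's setup.
    Returns (number of reports, wall-clock seconds)."""
    q = queue.Queue()

    def worker():
        while True:
            x = q.get()
            if x is None:
                break
            time.sleep(job_seconds)  # stand-in for test_function(x)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    start = time.time()
    for item in range(num_jobs):
        q.put(item)
    reports = report_progress(q, interval)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return reports, time.time() - start
```

For example, run_jobs(4, 0.1, interval=60.0) returns after roughly 0.2 seconds with no reports printed, instead of idling for the full minute. If test_function can raise, call q.task_done() in a finally block in the worker. Otherwise q.join() never returns.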
Python: 3.6
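Answer 0 (score: 0)
For now I'm using a simple solution suggested in a comment by @Andriy Maletsky.
Every few seconds, the main thread checks whether q is still non-empty. If more than an hour has passed since the last report, it prints a progress message.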
import time

time_between_reports = 3600  # seconds between progress messages
time_between_checks = 5      # seconds between checks of the queue

def report_progress_until_finished(q):
    qsize_init = q.qsize()
    last_report_time = time.time()
    while not q.empty():
        time_elapsed = time.time() - last_report_time
        if time_elapsed > time_between_reports:
            portion_finished = 1 - q.qsize() / qsize_init
            print("run_parallel: {:.1%} jobs are finished".format(portion_finished))
            last_report_time = time.time()
        # Sleep between checks so the loop does not spin on the CPU
        time.sleep(time_between_checks)
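An alternative worth considering swaps the hand-written queue and worker threads for the standard library's concurrent.futures module. concurrent.futures.wait(fs, timeout=...) returns a (done, not_done) pair. It returns as soon as all futures finish, or when the timeout expires, whichever comes first. That makes it a natural fit for "report every N seconds until everything is done". This is a minimal sketch, not the answer author's method. The names run_parallel and slow_job (a stand-in for test_function) and the interval parameter are hypothetical.

```python
import concurrent.futures
import time


def slow_job(x):
    """Hypothetical stand-in for the question's long-running test_function."""
    time.sleep(0.1)
    return x


def run_parallel(func, input_arguments, num_workers=2, interval=3600.0):
    """Run func over input_arguments in a thread pool, printing the fraction of
    *completed* jobs every `interval` seconds. Returns (results, reports)."""
    reports = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
        futures = [executor.submit(func, x) for x in input_arguments]
        total = len(futures)
        not_done = set(futures)
        while not_done:
            # wait() returns early once every pending future is done, otherwise
            # after `interval` seconds, so the loop neither spins nor over-sleeps.
            done, not_done = concurrent.futures.wait(not_done, timeout=interval)
            if not_done:
                finished = total - len(not_done)
                print("run_parallel: {:.1%} jobs are finished".format(finished / total))
                reports += 1
        # result() returns each job's value, or re-raises its exception
        results = [f.result() for f in futures]
    return results, reports
```

Each report here counts jobs that have actually completed. This differs from q.qsize(), which only counts jobs that no worker has started yet. Once the last job ends, the function returns without waiting out the rest of the interval.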