I'm trying to use multiprocessing.Queue and threading.Thread to split up a large number of tasks (health checks against surveillance cameras). Given the code below, I want to know when all of the cameras (there are 32,000+) have been checked, but my output never seems to reach the print statement in main.
Each queue_worker calls "process_camera", which currently performs all of the health checks and returns a value (that part works!).
Watching it run, I can see it get almost "complete" and then "hang", so something is blocking or keeping it from finishing... I've tried get() and join() calls with timeout arguments, but that doesn't seem to have any effect at all!
I've been staring at this code and the docs for 3 days now... is there something obvious I'm missing?
The end goal is to check all 30,000 cameras first (loaded into all_cameras at script startup), then "loop" and keep going until the user aborts the script.
def queue_worker(camera_q, result_q):
    '''
    Function takes camera off the queue and calls healthchecks
    '''
    try:
        camera = camera_q.get()
        camera_status, remove_camera = process_camera(camera)
        result_q.put("Success")
        return True
    except queue.Empty:
        logging.info("Queue is empty")
        result_q.put("Fail")
        return False
def process_worker(camera_q, result_q, process_num, stop_event):
    while not stop_event.is_set():
        # Create configured number of threads and provide references to both Queues to each thread
        threads = []
        for i in range(REQUEST_THREADS):
            thread = threading.Thread(target=queue_worker, args=(camera_q, result_q))
            thread.setName("CameraThread-{}".format(i))
            threads.append(thread)
            thread.start()
        for thread in threads:
            thread.join(timeout=120)
        if camera_q.empty():
            num_active = sum([t.is_alive() for t in threads])
            logging.info("[Process {}] << {} >> active threads and << {} >> cameras left to process. << {} >> processed.".format(process_num, num_active, camera_q.qsize(), result_q.qsize()))
def main():
    '''
    Main application entry
    '''
    logging.info("Starting Scan With << " + str(REQUEST_THREADS) + " Threads and " + str(CHILD_PROCESSES) + " Processors >>")
    logging.info("Reference Images Stored During Scan << " + str(store_images) + " >>")
    stop_event = multiprocessing.Event()
    camera_q, result_q = multiprocessing.Queue(), multiprocessing.Queue()
    # Create a Status thread for maintaining process status
    create_status_thread()
    all_cameras = get_oversite_cameras(True)
    for camera in all_cameras:
        camera_q.put(camera)
    logging.info("<< {} >> cameras queued up".format(camera_q.qsize()))
    processes = []
    process_num = 0
    finished_processes = 0
    for i in range(CHILD_PROCESSES):
        process_num += 1
        proc = multiprocessing.Process(target=process_worker, args=(camera_q, result_q, process_num, stop_event))
        proc.start()
        processes.append(proc)
    for proc in processes:
        proc.join()
        finished_processes += 1
logging.info("{} finished processes".format(finished_pr))
logging.info("All processes finished")
Edit: Not sure if it helps (visually), but here is a sample of the current output when testing against a set of 2000 cameras:
[2018-11-01 23:47:41,854] INFO - MainThread - root - Starting Scan With << 100 Threads and 16 Processors >>
[2018-11-01 23:47:41,854] INFO - MainThread - root - Reference Images Stored During Scan << False >>
[2018-11-01 23:47:41,977] INFO - MainThread - root - << 2000 >> cameras queued up
[2018-11-01 23:47:54,865] INFO - MainThread - root - [Process 3] << 0 >> active threads and << 0 >> cameras left to process. << 1570 >> processed.
[2018-11-01 23:47:56,009] INFO - MainThread - root - [Process 11] << 0 >> active threads and << 0 >> cameras left to process. << 1575 >> processed.
[2018-11-01 23:47:56,210] INFO - MainThread - root - [Process 14] << 0 >> active threads and << 0 >> cameras left to process. << 1579 >> processed.
[2018-11-01 23:47:56,345] INFO - MainThread - root - [Process 9] << 0 >> active threads and << 0 >> cameras left to process. << 1580 >> processed.
[2018-11-01 23:47:59,118] INFO - MainThread - root - [Process 2] << 0 >> active threads and << 0 >> cameras left to process. << 1931 >> processed.
[2018-11-01 23:47:59,637] INFO - MainThread - root - [Process 15] << 0 >> active threads and << 0 >> cameras left to process. << 1942 >> processed.
[2018-11-01 23:48:00,310] INFO - MainThread - root - [Process 8] << 0 >> active threads and << 0 >> cameras left to process. << 1945 >> processed.
[2018-11-01 23:48:00,445] INFO - MainThread - root - [Process 13] << 0 >> active threads and << 0 >> cameras left to process. << 1946 >> processed.
[2018-11-01 23:48:01,391] INFO - MainThread - root - [Process 10] << 0 >> active threads and << 0 >> cameras left to process. << 1949 >> processed.
[2018-11-01 23:48:01,527] INFO - MainThread - root - [Process 5] << 0 >> active threads and << 0 >> cameras left to process. << 1950 >> processed.
[2018-11-01 23:48:01,655] INFO - MainThread - root - [Process 6] << 0 >> active threads and << 0 >> cameras left to process. << 1951 >> processed.
[2018-11-01 23:48:02,519] INFO - MainThread - root - [Process 1] << 0 >> active threads and << 0 >> cameras left to process. << 1954 >> processed.
[2018-11-01 23:48:06,915] INFO - MainThread - root - [Process 12] << 0 >> active threads and << 0 >> cameras left to process. << 1981 >> processed.
[2018-11-01 23:48:27,339] INFO - MainThread - root - [Process 16] << 0 >> active threads and << 0 >> cameras left to process. << 1988 >> processed.
[2018-11-01 23:48:28,762] INFO - MainThread - root - [Process 4] << 0 >> active threads and << 0 >> cameras left to process. << 1989 >> processed.
It "hangs" at 1989 processed, just 11 shy of 2000. This is so hard to debug!
Answer (score: 0):
It's a bit hard to answer this precisely because it isn't a complete listing. For example, the implementation of create_status_thread() is hidden. That is especially awkward when chasing a deadlock, because deadlocks are usually caused by a specific sequence of accesses to shared resources, and create_status_thread could contain one of them. Still, a few suggestions:
Keep in mind that a process which has put items onto a queue will wait before terminating until all of the buffered items have been fed by the "feeder" thread to the underlying pipe. (A child process can call the queue's Queue.cancel_join_thread method to avoid this behaviour.)
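As an illustration (not part of the original listing, and with hypothetical names), a child process that does not need its remaining results delivered can call cancel_join_thread() on the queue so that its exit is not held up by the feeder thread; any still-buffered items may be lost, so this is only appropriate when dropping queued data is acceptable:

import multiprocessing

def worker(result_q):
    # Put results as usual; normally the process would wait at exit for
    # the feeder thread to flush these into the underlying pipe.
    for i in range(10):
        result_q.put(i)
    # Opt out of that wait. Items not yet flushed can be lost.
    result_q.cancel_join_thread()

if __name__ == "__main__":
    result_q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(result_q,))
    p.start()
    p.join()  # does not block on the queue's feeder thread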
This means that whenever you use a queue, you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that the processes which put items on the queue will terminate. Remember also that non-daemonic processes will be joined automatically.
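A minimal, self-contained sketch of that rule (hypothetical names, not taken from the question's code): drain everything the child has put before joining it.

import multiprocessing

def producer(result_q, n):
    # The child's feeder thread must flush all of these into the pipe
    # before the process can exit.
    for i in range(n):
        result_q.put(i)

if __name__ == "__main__":
    result_q = multiprocessing.Queue()
    n = 100000
    p = multiprocessing.Process(target=producer, args=(result_q, n))
    p.start()

    # Remove every item BEFORE joining the producer; joining first can
    # deadlock once the pipe buffer fills up.
    results = [result_q.get() for _ in range(n)]

    p.join()
    print(len(results), "results received")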
The documentation linked above contains an example of blocking where join() is called before get(). It looks like something similar could happen in your code depending on the order of execution, since you call get() inside process_worker().
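For reference, that deadlock looks roughly like the example in the docs: the parent joins before consuming the large item, so the child's feeder thread can never finish flushing it.

import multiprocessing

def f(q):
    # Large enough to exceed the pipe buffer, so the feeder thread blocks
    # until something reads from the queue.
    q.put("X" * 1000000)

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=f, args=(q,))
    p.start()
    p.join()       # deadlocks: the child cannot exit until the item is read
    obj = q.get()  # never reached; the get() needs to happen before join()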