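I am working on a telecom network discovery script that is run from crontab on Linux. It takes a seed file of initial network nodes, connects to them, gets all of their neighbors, then connects to those neighbors, and so on. Typical recursion.
To speed the whole process up, I used multithreading with a semaphore, so only a limited number of threads are actually running, but a large number of started threads sit waiting. At some point I hit the Linux maximum thread limit, and the script could no longer start new threads.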
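A minimal sketch of the thread-per-node approach described above, to show where the thread count comes from. The `TOPOLOGY` dictionary, `MAX_RUNNING`, and the helper names `discover`, `spawn`, and `run` are hypothetical stand-ins for the real device queries, not the actual script. The semaphore limits how many threads work at once, but one thread is still created for every discovered node.

```python
import threading

# Hypothetical in-memory topology standing in for real network devices.
TOPOLOGY = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],
    "e": [],
}

MAX_RUNNING = 2                                   # assumed concurrency limit
running_slots = threading.Semaphore(MAX_RUNNING)
lock = threading.Lock()
seen = set()                                      # nodes already scheduled
threads = []                                      # every thread ever started


def discover(node):
    # The thread already exists at this point; it only waits for a slot.
    with running_slots:
        neighbors = TOPOLOGY.get(node, [])        # stands in for a device query
    for neighbor in neighbors:
        spawn(neighbor)


def spawn(node):
    # Start one new thread per node that has not been scheduled yet.
    with lock:
        if node in seen:
            return
        seen.add(node)
        thread = threading.Thread(target=discover, args=(node,))
        threads.append(thread)
        thread.start()


def run(seeds):
    for seed in seeds:
        spawn(seed)
    # The list grows while threads run, so join by index until it stops growing.
    index = 0
    while True:
        with lock:
            if index >= len(threads):
                break
            thread = threads[index]
        thread.join()
        index += 1
    return seen
```

With this toy topology, `run(["a"])` discovers five nodes and creates five threads, regardless of `MAX_RUNNING`. On a large network the number of created threads grows with the number of nodes, which is how the operating system limit gets reached.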
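While looking for a design that would allow this recursion to be multithreaded, it seemed to me that this is a case of a mixed multi-producer/multi-consumer scheme: there are multiple consumers, and they are producers as well.
A consumer takes an item from the queue, consumes it, and, if there are results, puts each result back into the queue.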
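The consumer-that-also-produces loop described above can be sketched as follows. This is an illustration under assumptions, not the pool posted in this question: `TOPOLOGY`, `crawl`, the `visited` set used to avoid re-queuing a node, and the one-`STOP`-sentinel-per-worker shutdown are all hypothetical choices.

```python
import queue
import threading

# Hypothetical topology; note the cycle d -> a, handled by the visited set.
TOPOLOGY = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": ["a"],
    "e": [],
}


def crawl(seeds, num_workers=3):
    tasks = queue.Queue()
    visited = set()
    visited_lock = threading.Lock()
    stop = object()                       # sentinel telling a worker to exit

    def enqueue(node):
        # Queue a node only the first time it is seen.
        with visited_lock:
            if node in visited:
                return
            visited.add(node)
        tasks.put(node)

    def worker():
        while True:
            node = tasks.get()            # blocks until an item is available
            if node is stop:
                tasks.task_done()
                return
            try:
                for neighbor in TOPOLOGY.get(node, []):   # consume
                    enqueue(neighbor)                     # produce
            finally:
                tasks.task_done()         # always balance the get()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for seed in seeds:
        enqueue(seed)
    tasks.join()                          # every queued node has been processed
    for _ in workers:
        tasks.put(stop)                   # one sentinel per worker
    for w in workers:
        w.join()
    return visited
```

Because a worker queues the neighbors before it calls `task_done()` for the current node, the queue's unfinished-task count cannot reach zero while work is still being produced, so `tasks.join()` returns only when the whole traversal is done. The number of threads stays fixed at `num_workers`.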
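To make it really nice, I wanted to create a design pattern that can be used with any kind of recursive function, in other words, with any args and kwargs.
What I expect from such a function is that I pass it any combination of the variables it needs (args, kwargs) and get back a list of arguments that can be passed to it again in the next round of recursion.
Is there a better way to handle the args and kwargs returned from the function than the one I used? I basically create a tuple (args, kwargs), that is (tuple(), dict()), which the function returns, and the worker then splits it into args and kwargs. Ideally, I would not need to create that tuple at all.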
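One possible way to avoid building the (args, kwargs) tuple by hand, sketched under assumptions: the recursive function returns ready-to-call `functools.partial` objects, and the worker simply calls them. The `depth` and `max_depth` parameters and the sequential `drain` helper are hypothetical and only stand in for the threaded worker loop.

```python
from functools import partial


def get_subnodes(node, depth=0, max_depth=2):
    """Hypothetical recursive step that returns its follow-up calls as callables."""
    if depth >= max_depth:
        return []
    return [
        partial(get_subnodes, f"{node}.{i}", depth=depth + 1, max_depth=max_depth)
        for i in range(2)
    ]


def drain(initial_calls):
    """Sequential stand-in for the worker loop: call, then queue what comes back."""
    pending = list(initial_calls)
    visited = []
    while pending:
        call = pending.pop()
        # A partial still exposes .func, .args and .keywords, e.g. for logging.
        visited.append(call.args[0])
        pending.extend(call())
    return visited
```

For example, `drain([partial(get_subnodes, "0")])` visits the root, its two children, and their four children. The worker no longer needs to know how the arguments are split, at the cost of the function having to wrap its own follow-up calls.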
Do you have any other suggestions for improving this design?
Thank you very much!
#!/usr/bin/env python3
from queue import Queue, Empty
from threading import Thread
from time import sleep
from random import choice, random


class RecursiveWorkerThread(Thread):
    def __init__(self, name, pool):
        Thread.__init__(self)
        self.name = name
        self.pool = pool
        self.tasks = pool.tasks
        self.POISON = pool.POISON
        self.daemon = False
        self.result = None
        self.start()

    def run(self):
        print(f'WORKER {self.name} - is awake.')
        while True:
            # take task from queue; get() already waits, so no extra polling is needed
            try:
                func, f_args, f_kwargs = self.tasks.get(timeout=1)
            except Empty:
                continue
            # check for POISON
            if func is self.POISON:
                print(f'WORKER {self.name} - POISON found. Sending it back to queue. Dying...')
                self.pool.add_task(self.POISON)
                break
            try:
                # try to perform the task on arguments and get result
                try:
                    self.result = func(*f_args, **f_kwargs)
                except Exception as e:
                    print(e)
                    # a failed task yields no follow-up tasks (and no stale results)
                    self.result = []
                # recursive part, add results to queue
                print(f'WORKER {self.name} - FUNC: ({func.__name__}) IN: (args: {f_args}, kwargs: {f_kwargs}) OUT: ({self.result}).')
                for n_args, n_kwargs in self.result:
                    self.pool.add_task(func, *n_args, **n_kwargs)
            finally:
                # mark one task done in queue, even if something above failed
                self.tasks.task_done()


class RecursiveThreadPool:
    def __init__(self, num_threads):
        self.tasks = Queue()
        self.POISON = object()
        print('\nTHREAD_POOL - initialized.\nTHREAD_POOL - waking up WORKERS.')
        self.workers = [RecursiveWorkerThread(name=str(num), pool=self) for num in range(num_threads)]

    def add_task(self, func, *args, **kwargs):
        if func is not self.POISON:
            print(f'THREAD_POOL - task received: [func: ({func.__name__}), args: ({args}), kwargs:({kwargs})]')
        else:
            print('THREAD_POOL - task received: POISON.')
        self.tasks.put((func, args, kwargs))

    def wait_for_completion(self):
        print('\nTHREAD_POOL - waiting for all tasks to be completed.')
        self.tasks.join()
        print('\nTHREAD_POOL - all tasks have been completed.\nTHREAD_POOL - sending POISON to queue.')
        self.add_task(self.POISON)
        print('THREAD_POOL - waiting for WORKERS to die.')
        for worker in self.workers:
            worker.join()
        print('\nTHREAD_POOL - all WORKERS are dead.\nTHREAD_POOL - FINISHED.')


# Test part
if __name__ == '__main__':
    # 20% chance that a possible subnode exists
    percentage = [True] * 2 + [False] * 8

    # example function
    def get_subnodes(node):
        maximum_subnodes = 2
        sleep(5 * random())
        result_list = []
        for i in range(maximum_subnodes):
            # apply chance on every possible subnode
            if choice(percentage):
                new_node = node + '.' + str(i)
                # create single result
                args = ()
                kwargs = {'node': new_node}
                # append it to the result list
                result_list.append((args, kwargs))
        return result_list

    # 1) Init a Thread pool with the desired number of worker threads
    THREAD_POOL = RecursiveThreadPool(10)
    # 2) Put initial data into queue
    initial_nodes = 10
    for root_node in [str(i) for i in range(initial_nodes)]:
        THREAD_POOL.add_task(get_subnodes, node=root_node)
    # 3) Wait for completion
    THREAD_POOL.wait_for_completion()