Python: querying multiple databases with a multiprocessing pool and queue

Date: 2015-04-07 15:46:52

Tags: python database multiprocessing python-multiprocessing

I am writing an API that lets users run a query against several hundred databases, either one at a time or several at once. When multiple databases are queried, they are accessed in parallel to speed up the response time. My terminology: a "q-job" is a query executed against one or more databases.

Since this is an internal tool, the number of users is not expected to be large (currently about 5,000 queries per day are anticipated), but a few limits seem reasonable to me:

  1. Allow at most X queries to run against a single database at the same time
  2. Allow at most Y q-jobs to run at the same time
So far I can query the databases in parallel with multiprocessing.Pool, but I am still fuzzy on how to queue the q-jobs and how to throttle both the queries against a single database and the pool itself using multiprocessing.Semaphore (I also considered multiprocessing.Queue). Some pseudo-code to illustrate my idea:

    import multiprocessing
    import sys
    import time
    import logging

    logger = logging.getLogger(__name__)
    max_processes = 4  # [Maximum number of concurrent DB queries; configured elsewhere]

    # Prepare pool and queues for multiprocessed tasks
    process_pool = multiprocessing.Pool(processes=max_processes)
    job_queue = multiprocessing.Queue()      # Initialize the queue containing single- and multi-client queries
    results_queue = multiprocessing.Queue()  # Initialize the queue that holds results

    # Query a database
    def __execute_query_in_db(args):
        # [Open DB connection]
        # [Execute query]
        # [Close DB connection]
        # [Return response]
        pass

    # Main worker
    def __main_worker(w_queue):
        '''Runs in the background and checks for new entries in the job queue'''
        while True:  # Endlessly check for new items in the queue
            conn_strings = w_queue.get(True)
            __query_worker(conn_strings)  # Run the query

    # Query worker
    def __query_worker(conn_strings):
        # Run the individual DB queries in the pool
        queries = process_pool.map_async(__execute_query_in_db, conn_strings)
        try:
            # Execute queries in the pool; time out after 20 minutes
            retArr = queries.get(timeout=1200)
            # NOTE: We never reach this point!!!
        except:
            logger.error("TS-API Unexpected error: %s", sys.exc_info()[0])
            process_pool.terminate()
            results = "Error: %s" % (sys.exc_info()[0],)
        else:
            # Wait until all queries have finished
            process_pool.close()
            process_pool.join()
            logger.debug("Got %s results" % len(retArr))
            # Ignore empty responses
            results = []
            for result in retArr:
                if result is not None:
                    results.extend(result)
        logger.debug("Putting results in results_queue")
        results_queue.put(results)

    # Instantiate a single-process job pool that monitors new query requests
    job_pool = multiprocessing.Pool(processes=1, initializer=__main_worker,
                                    initargs=(job_queue,))

    # Invoked when an API call is made...
    def ExecuteSuperQuery(params=None):
        # [Procure connection string(s) as conn_strings]
        job_queue.put(conn_strings)  # Send the connection information to the job queue
        while True:  # Wait for a response to be pushed to the results queue
            if not results_queue.empty():
                return results_queue.get()
            time.sleep(2)
        # [Somehow get the results and show them as a (decorated) JSON string]

When I run this kind of code and execute a query against a database, the process never seems to get past the line marked "NOTE:". There is no error message; the query simply never seems to finish.
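For what it's worth, the multiprocessing.Semaphore part of my idea would look roughly like this in isolation (a rough sketch: MAX_QUERIES_PER_DB, the dummy query body, and the connection strings are made up, and a real version would presumably need one semaphore per database rather than a single shared one):

    import multiprocessing

    MAX_QUERIES_PER_DB = 3  # hypothetical limit "X" from point 1 above

    def _init_worker(semaphore):
        # Make the shared semaphore visible inside every pool worker
        global _db_semaphore
        _db_semaphore = semaphore

    def _execute_query_in_db(conn_string):
        # The semaphore caps how many queries run concurrently
        with _db_semaphore:
            # [Open DB connection, execute query, close connection]
            return "result for %s" % conn_string

    if __name__ == "__main__":
        semaphore = multiprocessing.BoundedSemaphore(MAX_QUERIES_PER_DB)
        pool = multiprocessing.Pool(processes=8,
                                    initializer=_init_worker,
                                    initargs=(semaphore,))
        conn_strings = ["db%02d" % i for i in range(10)]
        print(pool.map(_execute_query_in_db, conn_strings))
        pool.close()
        pool.join()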

I also tried running a single process that instantiates a process pool on demand whenever one or more databases are queried, but then I got an error saying that a child process is not allowed to spawn its own child processes. Go figure.
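Here is a stripped-down version of that second attempt, with all database code removed (the inner Pool alone is enough to trigger it; on my machine it fails with an AssertionError saying daemonic processes are not allowed to have children):

    import multiprocessing

    def spawn_inner_pool(_):
        # Pool workers are daemonic, and daemonic processes may not have
        # children, so constructing a Pool here raises an AssertionError
        inner = multiprocessing.Pool(processes=2)
        inner.close()
        inner.join()

    if __name__ == "__main__":
        outer = multiprocessing.Pool(processes=1)
        try:
            outer.apply(spawn_inner_pool, (None,))
        except AssertionError as exc:
            print("Fails as described: %s" % exc)
        outer.close()
        outer.join()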

I would appreciate the community's help with the following:

  1. Is there a better way to queue q-jobs and keep track of their results?
  2. Is there a better way to rate-limit the queries and the q-jobs? (See the sketch after this list for the kind of limit I mean.)
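To make the limiting in question 2 concrete, this is the direction I have been considering: a fixed set of q-job worker processes draining a shared queue, so that the worker count itself caps the number of concurrent q-jobs (a minimal sketch; the connection strings are made up and the per-database fan-out is elided):

    import multiprocessing

    MAX_CONCURRENT_QJOBS = 2  # hypothetical limit "Y" from point 2 above

    def qjob_worker(job_queue, results_queue):
        # Each worker handles one q-job at a time, so the number of
        # workers is itself the cap on concurrent q-jobs
        while True:
            conn_strings = job_queue.get()
            if conn_strings is None:  # sentinel: shut down
                break
            # [Fan the query out to each database and gather the responses]
            results_queue.put(["result from %s" % c for c in conn_strings])

    if __name__ == "__main__":
        job_queue = multiprocessing.Queue()
        results_queue = multiprocessing.Queue()
        workers = [multiprocessing.Process(target=qjob_worker,
                                           args=(job_queue, results_queue))
                   for _ in range(MAX_CONCURRENT_QJOBS)]
        for w in workers:
            w.start()
        job_queue.put(["db01", "db02"])  # one q-job spanning two databases
        job_queue.put(["db03"])          # a single-database q-job
        print(results_queue.get())
        print(results_queue.get())
        for _ in workers:
            job_queue.put(None)
        for w in workers:
            w.join()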

0 Answers:
