Python 3 asyncio and MotorClient: how to use Motor with multithreading and multiple event loops

Posted: 2016-11-21 20:04:37

Tags: python multithreading python-asyncio tornado-motor pymongo-3.x

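I'm back with a question about asyncio. I find it very useful (especially because of the GIL when using threads), and I'm trying to improve the performance of some code.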

My application does the following:

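  • 1 background daemon thread "A" receives events from connected clients and reacts by filling a SetQueue (simply an event queue that drops duplicate IDs) and by performing some insertions in the database. I get this daemon from another module (basically, I only control the callback invoked when an event is received). In the sample code below I replaced it with a thread that I spawn myself, which simply fills the queue with 20 items and simulates the DB insertions before exiting.
  • 1 background daemon thread "B" is started (loop_start); it just keeps looping, each time running one coroutine until it completes: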

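    • it gets all the items in the queue (if the queue is not empty; otherwise it releases control for x seconds and then the coroutine is restarted)
    • for each id in the queue, it launches a chained coroutine that: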

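      • creates and awaits a task that fetches all the relevant information for that id from the DB. I'm using MotorClient, which supports asyncio, so the task itself can be awaited.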

      • uses a process pool executor to launch one process per id, which does some CPU-intensive processing on the DB data.

  • The main thread just initializes db_client and accepts the loop_start and stop commands.

That's basically it.
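A minimal sketch of the per-id chain described above (await the DB fetch, then hand the CPU-bound part to a process pool), assuming Python 3.7+. `fetch_from_db` is a hypothetical stand-in for a Motor query and `cpu_heavy` for the real computation; `asyncio.gather` runs all the chains concurrently. Unlike the sample code further down, the executor is passed in, so it can be created once rather than once per batch.

```python
import asyncio
import concurrent.futures
import os
import random


async def fetch_from_db(item_id):
    """Hypothetical stand-in for an awaited Motor query (e.g. a find_one)."""
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulated I/O latency
    return {"_id": item_id, "payload": list(range(item_id % 5))}


def cpu_heavy(doc):
    """Hypothetical CPU-bound step; module-level so it can be pickled for a process pool."""
    return doc["_id"], sum(x * x for x in doc["payload"]), os.getpid()


async def process_one(item_id, executor):
    loop = asyncio.get_running_loop()
    doc = await fetch_from_db(item_id)  # I/O: awaited on the event loop
    # CPU work: runs in the executor (another process when given a ProcessPoolExecutor)
    return await loop.run_in_executor(executor, cpu_heavy, doc)


async def process_batch(ids, executor):
    """Run the fetch -> process chain for every id concurrently; results keep input order."""
    return await asyncio.gather(*(process_one(i, executor) for i in ids))
```

Because `cpu_heavy` is shipped to worker processes, it has to be a module-level function. In practice one `concurrent.futures.ProcessPoolExecutor(max_workers=8)` would be created inside an `if __name__ == '__main__':` block and `process_batch(ids, pool)` awaited from the long-running consumer coroutine for each batch.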

Now I'm trying to get as much performance as possible out of it.

My current problem is using motor.motor_asyncio.AsyncIOMotorClient() in the following way:

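  1. it is initialized in the main thread, where I want to create the indexes
  2. thread "A" needs to perform DB insertions
  3. thread "B" needs to perform DB finds/reads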
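How can I do this? Motor states that it is meant for single-threaded applications, where you obviously use a single event loop. Here I found myself forced to have two event loops, one in thread "A" and one in thread "B". This is not optimal, but I didn't manage to use a single event loop with call_soon_threadsafe while keeping the same behavior... and I think that, performance-wise, I still gain a lot with two event loops releasing control of the GIL-bound CPU core.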
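The single-loop alternative mentioned here can be sketched with `asyncio.run_coroutine_threadsafe`, the standard-library helper that schedules a coroutine on a loop owned by another thread (it relies on `call_soon_threadsafe` internally) and returns a `concurrent.futures.Future`. This is a sketch under assumptions: `LoopThread`, `on_event` and `fake_db_insert` are hypothetical names, and `fake_db_insert` stands in for an awaited Motor insert. With this layout thread "A" needs no event loop of its own.

```python
import asyncio
import threading


class LoopThread:
    """Runs one asyncio event loop forever in a background thread (thread "B")."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, name="loop-B", daemon=True)

    def _run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def start(self):
        self._thread.start()

    def submit(self, coro):
        # Thread-safe: returns a concurrent.futures.Future usable from any thread.
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self):
        # Ask the loop to stop from its own thread, wait for it, then release it.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()


async def fake_db_insert(item):
    # Hypothetical stand-in for an awaited Motor insert.
    await asyncio.sleep(0.01)
    return item


def on_event(loop_thread, item):
    """Callback running in thread "A": hands the insert over to the shared loop."""
    return loop_thread.submit(fake_db_insert(item))
```

The main thread could use the same `submit()` for one-off jobs such as index creation; calling `result()` on the returned future blocks only the calling thread, not the loop.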

Should I use three different AsyncIOMotorClient instances (one per thread) and use them as described above? I ran into various errors when I tried.
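If more than one loop is kept anyway, one cautious pattern is one client per event loop, created lazily on the thread that runs that loop and never shared across loops. The sketch assumes this; `PerLoopClients` is a hypothetical helper, and `factory` is whatever builds the client (in real code, presumably an `AsyncIOMotorClient` for your URI, which is not exercised here).

```python
import asyncio
import threading


class PerLoopClients:
    """Caches one client object per event loop.

    `factory` is a hypothetical zero-argument callable; it is invoked from inside
    a coroutine, i.e. on the thread that owns the loop the client will be used on.
    """

    def __init__(self, factory):
        self._factory = factory
        self._clients = {}
        self._lock = threading.Lock()  # several threads/loops may call get() at once

    def get(self):
        loop = asyncio.get_running_loop()  # raises RuntimeError outside a coroutine
        with self._lock:
            client = self._clients.get(loop)
            if client is None:
                client = self._factory()
                self._clients[loop] = client
            return client
```

Each coroutine calls `clients.get()` instead of touching a global client, so an object created for thread "A"'s loop is never awaited on thread "B"'s loop.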

Here is my sample code, which does not include the MotorClient initialization in Asynchro's __init__:
    import threading
    import asyncio
    import concurrent.futures
    import functools
    import os
    import time
    import logging
    from random import randint
    from queue import Queue
    
    
    
    
    
    # create logger
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)
    # create file handler which logs even debug messages
    fh = logging.FileHandler('{}.log'.format(__name__))
    fh.setLevel(logging.DEBUG)
    # create console handler with a higher log level
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    # create formatter and add it to the handlers
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(processName)s - %(threadName)s - %(levelname)s - %(message)s')
    fh.setFormatter(formatter)
    ch.setFormatter(formatter)
    # add the handlers to the logger
    logger.addHandler(fh)
    logger.addHandler(ch)
    
    
    class SetQueue(Queue):
        """Queue that drops duplicate entries; backed by a set, so insertion order is NOT kept."""
        def _init(self, maxsize):
            self.maxsize = maxsize
            self.queue = set()
    
        def _put(self, item):
            if type(item) is not int:
                raise TypeError
            self.queue.add(item)
    
        def _get(self):
            # Always hand out every queued item at once; Queue.get() holds the mutex while calling _get()
            ret = self.queue.copy()
            self.queue.clear()
            return ret
    
    
    class Asynchro:
        def __init__(self, event_queue):
            self.__daemon = None
            self.__daemon_terminate = False
            self.__queue = event_queue
    
        def fake_populate(self, size):
            t = threading.Thread(target=self.worker, args=(size,))
            t.daemon = True
            t.start()
    
        def worker(self, size):
            run = True
            populate_event_loop = asyncio.new_event_loop()
            asyncio.set_event_loop(populate_event_loop)
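            # asyncio.wait() needs tasks/futures: bare coroutines are rejected since Python 3.11
            tasks = [populate_event_loop.create_task(self.worker_cor(i, populate_event_loop)) for i in range(size)]
            done, pending = populate_event_loop.run_until_complete(asyncio.wait(tasks))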
            logger.debug('Finished to populate event queue with result done={}, pending={}.'.format(done, pending))
            while run:
                # Keep it alive to simulate something still alive (minor traffic)
                time.sleep(5)
                rand = randint(100, 200)
                populate_event_loop.run_until_complete(self.worker_cor(rand, populate_event_loop))
                if self.__daemon_terminate:
                    logger.debug('Closed the populate_event_loop.')
                    populate_event_loop.close()
                    run = False
    
        async def worker_cor(self, i, loop):
            await asyncio.sleep(0.5)  # time.sleep() here would block the whole event loop
            self.__queue.put(i)
            logger.debug('Wrote {} in the event queue that has now size {}.'.format(i, self.__queue.qsize()))
            # Launch fake DB Insertions
            #db_task = loop.create_task(self.fake_db_insert(i))
            db_data = await self.fake_db_insert(i)
            logger.info('Finished to populate with id {}'.format(i))
            return db_data
    
        @staticmethod
        async def fake_db_insert(item):
            # Fake some DB insert
            logger.debug('Starting fake db insertion with id {}'.format(item))
            st = randint(1, 101) / 100
            await asyncio.sleep(st)
            logger.debug('Finished db insertion with id {}, sleep {}'.format(item, st))
            return item
    
        def loop_start(self):
            logger.info('Starting the loop.')
            if self.__daemon is not None:
                raise RuntimeError('The loop is already running.')
            self.__daemon_terminate = False
            self.__daemon = threading.Thread(target=self.__daemon_main)
            self.__daemon.daemon = True
            self.__daemon.start()
    
        def loop_stop(self):
            logger.info('Stopping the loop.')
            if self.__daemon is None:
                raise RuntimeError('The loop is not running.')
            self.__daemon_terminate = True
            if threading.current_thread() != self.__daemon:
                self.__daemon.join()
                self.__daemon = None
                logger.debug('Stopped the loop and closed the event_loop.')
    
        def __daemon_main(self):
            logger.info('Background daemon started (inside __daemon_main).')
            event_loop = asyncio.new_event_loop()
            asyncio.set_event_loop(event_loop)
            run, rc = True, 0
            while run:
                logger.info('Inside \"while run\".')
                event_loop.run_until_complete(self.__cor_main())
                if self.__daemon_terminate:
                    event_loop.close()
                    run = False
                    rc = 1
            return rc
    
        async def __cor_main(self):
            # If nothing in the queue release control for a bit
            if self.__queue.qsize() == 0:
                logger.info('Event queue is empty, going to sleep (inside __cor_main).')
                await asyncio.sleep(10)
                return
            # Extract all items from event queue
            items = self.__queue.get()
            # Run asynchronously DB extraction and processing on the ids (using pool of processes)
            with concurrent.futures.ProcessPoolExecutor(max_workers=8) as executor:
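                # Wrap the coroutines in tasks: asyncio.wait() rejects bare coroutines since Python 3.11
                tasks = [asyncio.ensure_future(self.__cor_process(item, executor)) for item in items]
                logger.debug('Launching {} coroutines to process queue items (inside __cor_main).'.format(len(items)))
                done, pending = await asyncio.wait(tasks)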
                logger.debug('Finished to execute __cor_main with result {}, pending {}'
                             .format([t.result() for t in done], pending))
    
        async def __cor_process(self, item, executor):
            # Extract corresponding DB data
            event_loop = asyncio.get_running_loop()  # Python 3.7+: the loop running this coroutine
            db_task = event_loop.create_task(self.fake_db_access(item))
            db_data = await db_task
            # Heavy processing of data done in different processes
            logger.debug('Launching processes to elaborate db_data.')
            res = await event_loop.run_in_executor(executor, functools.partial(self.fake_processing, db_data, None))
            return res
    
        @staticmethod
        async def fake_db_access(item):
            # Fake some db access
            logger.debug('Starting fake db access with id {}'.format(item))
            st = randint(1, 301) / 100
            await asyncio.sleep(st)
            logger.debug('Finished db access with id {}, sleep {}'.format(item, st))
            return item
    
        @staticmethod
        def fake_processing(db_data, _):
            # fake some CPU processing
            logger.debug('Starting fake processing with data {}'.format(db_data))
            st = randint(1, 101) / 10
            time.sleep(st)
            logger.debug('Finished fake processing with data {}, sleep {}, process id {}'.format(db_data, st, os.getpid()))
            return db_data
    
    
    def main():
        # Event queue
        queue = SetQueue()
        return Asynchro(event_queue=queue)
    
    
    if __name__ == '__main__':
        a = main()
        a.fake_populate(20)
        time.sleep(5)
        a.loop_start()
        time.sleep(20)
        a.loop_stop()
    

1 answer:

Answer 0 (score: 1)

What is the reason for running multiple event loops?

I suggest using a single loop in the main thread; that is asyncio's native mode.

In very rare cases asyncio may run a loop in a non-main thread, but that doesn't look like your case.
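Following that suggestion, a minimal single-loop sketch, assuming the external event thread can be given a reference to the loop: the loop runs in the main thread via `asyncio.run`, the event thread only calls `loop.call_soon_threadsafe(queue.put_nowait, item)`, and the consumer awaits an `asyncio.Queue` instead of polling with `sleep(10)`. `event_source`, `drain` and `main` are hypothetical names; a set removes duplicate ids as `SetQueue` does, and `None` serves as an end-of-stream sentinel.

```python
import asyncio
import threading
import time


def event_source(loop, queue, ids):
    """Hypothetical stand-in for the external daemon thread "A".

    It never touches the queue directly; it only schedules put_nowait on the loop.
    """
    for i in ids:
        loop.call_soon_threadsafe(queue.put_nowait, i)
        time.sleep(0.01)
    loop.call_soon_threadsafe(queue.put_nowait, None)  # sentinel: no more events


async def drain(queue):
    """Wait (without polling) for one item, then take everything already queued."""
    batch = [await queue.get()]
    while not queue.empty():
        batch.append(queue.get_nowait())
    return batch


async def main(ids):
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    threading.Thread(target=event_source, args=(loop, queue, ids), daemon=True).start()
    processed = set()
    done = False
    while not done:
        batch = await drain(queue)
        done = None in batch
        unique = {i for i in batch if i is not None}  # drop duplicate ids
        # Here each id would be fetched with Motor and sent to a ProcessPoolExecutor.
        processed |= unique
    return processed
```

Index creation, inserts and reads then all run as coroutines on this one loop, so a single Motor client is enough.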