How to use aiopg with aiohttp

Date: 2018-08-08 13:44:14

Tags: python python-asyncio aiohttp aiopg

I have an application that loops over URLs from a Postgres table, downloads each URL, runs a processing function on every download, and saves the result of the processing back to the table.

I have written it with aiopg and aiohttp so that it runs asynchronously. In simplified form it looks like this:

import asyncio
import aiopg
from aiohttp import ClientSession, TCPConnector

BATCH_SIZE = 100
dsn = "dbname=events user={} password={} host={}".format(DB_USER, DB_PASSWORD, DB_HOST)    

async def run():
    async with ClientSession(connector=TCPConnector(ssl=False, limit=100)) as session:
        async with aiopg.create_pool(dsn) as pool:
            while True:
                count = await run_batch(session, pool)
                if count == 0:
                    break

async def run_batch(session, db_pool):
    tasks = []
    # each row from get_batch is an (id, url) tuple
    async for _id, url in get_batch(db_pool):
        task = asyncio.ensure_future(process_url(url, session, db_pool))
        tasks.append(task)
    await asyncio.gather(*tasks)
    # report the batch size so run() can stop once the table is exhausted
    return len(tasks)

async def get_batch(db_pool):
    sql = "SELECT id, url FROM db.urls ... LIMIT %s"
    async with db_pool.acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute(sql, (BATCH_SIZE,))
            # aiopg cursors are iterated asynchronously
            async for row in cur:
                yield row

async def process_url(url, session, db_pool):
    async with session.get(url, timeout=15) as response:
        body = await response.read()
        data = await process_body(body)  # process_body is a coroutine
        await save_data(db_pool, data)

async def process_body(body):
    ...
    return data

async def save_data(db_pool, data):
    sql = "UPDATE db.urls ..."
    async with db_pool.acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute(sql, (data,))

But something is going wrong. The longer the script runs, the slower it gets, and more and more exceptions are raised from the session.get call. My guess is that there is something wrong with the way I am using the Postgres connections, but I can't figure out what. Any help would be much appreciated!
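
One thing I suspect, though I have not confirmed it: aiopg's create_pool defaults to maxsize=10, and get_batch keeps one of those connections checked out for the entire batch, while up to 100 process_url tasks compete for the remaining connections in save_data. A variant I am considering fetches the whole batch up front, so the cursor's connection is back in the pool before any download task runs. This is just a sketch of drop-in replacements for get_batch and run_batch above, not a confirmed fix:

async def get_batch(db_pool):
    sql = "SELECT id, url FROM db.urls ... LIMIT %s"
    async with db_pool.acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute(sql, (BATCH_SIZE,))
            # fetchall() materializes the rows, so the connection is
            # returned to the pool as soon as this block exits
            return await cur.fetchall()

async def run_batch(session, db_pool):
    # the cursor's connection is already back in the pool at this point
    rows = await get_batch(db_pool)
    tasks = [asyncio.ensure_future(process_url(url, session, db_pool))
             for _id, url in rows]
    await asyncio.gather(*tasks)
    return len(tasks)

Bounding the number of in-flight downloads with an asyncio.Semaphore, or raising maxsize in create_pool, might also help, but I would like to understand why the current version degrades first.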

0 Answers:

No answers yet