SQLAlchemy ThreadPoolExecutor "too many clients"

Asked: 2016-09-14 15:01:36

Tags: python multithreading postgresql sqlalchemy python-asyncio

I wrote a script with the following logic in order to insert many records into a PostgreSQL table as they are generated.

#!/usr/bin/env python3
import asyncio
from concurrent.futures import ProcessPoolExecutor as pool
from functools import partial

import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base


metadata = sa.MetaData(schema='stackoverflow')
Base = declarative_base(metadata=metadata)


class Example(Base):
    __tablename__ = 'example'
    pk = sa.Column(sa.Integer, primary_key=True)
    text = sa.Column(sa.Text)


sa.event.listen(Base.metadata, 'before_create',
    sa.DDL('CREATE SCHEMA IF NOT EXISTS stackoverflow'))

engine = sa.create_engine(
    'postgresql+psycopg2://postgres:password@localhost:5432/stackoverflow'
)
Base.metadata.create_all(engine)
session = sa.orm.sessionmaker(bind=engine, autocommit=True)()


def task(value):
    engine.dispose()
    with session.begin():
        session.add(Example(text=value))


async def infinite_task(loop):
    spawn_task = partial(loop.run_in_executor, None, task)
    while True:
        await asyncio.wait([spawn_task(value) for value in range(10000)])


def main():
    loop = asyncio.get_event_loop()
    with pool() as executor:
        loop.set_default_executor(executor)
        asyncio.ensure_future(infinite_task(loop))
        loop.run_forever()
        loop.close()


if __name__ == '__main__':
    main()

This code runs just fine, creating a pool with as many processes as I have CPU cores and happily chugging along forever. I wanted to see how threads would compare to processes, but I could not get a working example. Here are the changes I made:

from concurrent.futures import ThreadPoolExecutor as pool

session_maker = sa.orm.sessionmaker(bind=engine, autocommit=True)
Session = sa.orm.scoped_session(session_maker)


def task(value):
    engine.dispose()
    # create new session per thread
    session = Session()
    with session.begin():
        session.add(Example(text=value))
    # remove session once the work is done
    Session.remove()

This version runs for a while before raising a flurry of "too many clients" exceptions:

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL:  sorry, too many clients already

What am I missing?

2 answers:

Answer 0 (score: 2):

As it turns out, the problem is engine.dispose(), which, in the words of Mike Bayer (zzzeek), "leaves PG connections lying open to be garbage collected." Calling it from every task throws away the engine's connection pool over and over, so connections still in use elsewhere are left dangling until garbage collection closes them, and PostgreSQL's connection limit is quickly exhausted.

Source: https://groups.google.com/forum/#!topic/sqlalchemy/zhjCBNebnDY

So the updated task function looks like this:

def task(value):
    # create new session per thread
    session = Session()
    with session.begin():
        session.add(Example(text=value))
    # remove session object once the work is done
    Session.remove()
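
For comparison, the same effect can be had with a plain (non-scoped) session per task, where close() returns the connection to the engine's pool. This is only a sketch built on the session_maker defined above, not code from the original answer:

def task(value):
    # one short-lived Session per task
    session = session_maker()
    try:
        with session.begin():
            session.add(Example(text=value))
    finally:
        # close() returns the underlying connection to the engine's pool
        session.close()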

Answer 1 (score: 0):

It looks like you are opening a lot of new connections without closing them; try adding engine.dispose() afterwards as well:

from concurrent.futures import ThreadPoolExecutor as pool

session_maker = sa.orm.sessionmaker(bind=engine, autocommit=True)
Session = sa.orm.scoped_session(session_maker)


def task(value):
    engine.dispose()
    # create new session per thread
    session = Session()
    with session.begin():
        session.add(Example(text=value))
    # remove session once the work is done
    Session.remove()
    engine.dispose()

Keep in mind the cost of a new connection. Ideally you should have one connection per process/thread, but I am not sure how ThreadPoolExecutor works here, and it probably does not close connections when a thread's execution finishes.
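
One way to keep the connection count bounded no matter how many tasks are queued is to size the engine's pool to roughly match the executor. pool_size, max_overflow, and pool_timeout are standard create_engine arguments; the numbers below are illustrative guesses rather than values from the question:

engine = sa.create_engine(
    'postgresql+psycopg2://postgres:password@localhost:5432/stackoverflow',
    pool_size=4,       # roughly one connection per worker thread (illustrative)
    max_overflow=0,    # never open connections beyond pool_size
    pool_timeout=30,   # seconds a task waits for a free connection
)

With the pool capped like this, surplus tasks wait for a free connection instead of pushing PostgreSQL past its max_connections limit.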