Strange threading behavior in SQLAlchemy

Asked: 2018-05-29 07:39:55

Tags: python multithreading sqlalchemy

Using Python 3.6.5 and SQLAlchemy 1.2.7.

Consider the following example code:
import threading
from concurrent.futures import ThreadPoolExecutor

from sqlalchemy import create_engine, Column, Integer, Boolean
from sqlalchemy.exc import OperationalError
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, scoped_session, Session

engine = create_engine("sqlite:///threading_sqlalchemy.db")
base = declarative_base(engine)
smaker = sessionmaker(engine)
scopedmaker: scoped_session = scoped_session(smaker)

dblock = threading.Lock()


class Key(base):
    __tablename__ = "Key"
    id = Column(Integer, primary_key=True)
    value = Column(Integer, nullable=False, unique=True, index=True)
    taken = Column(Boolean, nullable=False, default=False)

    def __repr__(self):
        return f"<Key id={self.id}, value={self.value}, taken={self.taken}>"


try:
    Key.__table__.drop()
    # this is also quite funny, if the table doesn't exist it throws:
    # sqlite3.OperationalError: no such table: Key
    # when there is literally a sqlalchemy.exc.NoSuchTableError
except OperationalError:
    pass
base.metadata.create_all()


def gen_keys(n):
    print(f"made in {threading.current_thread()}")
    with dblock:
        session: Session = scopedmaker()
        session.bulk_save_objects([Key(value=i * 100) for i in range(0, n)])
        session.commit()


def take_keys(n):
    print(f"used in {threading.current_thread()}")
    with dblock:
        session: Session = scopedmaker()
        keys = session.query(Key).filter(Key.taken == False).limit(n).all()
        for key in keys:
            key.taken = True
        print(keys)
        session.commit()


def take_keys_2(n):
    print(f"used in {threading.current_thread()}")
    with dblock:
        session: Session = scopedmaker()
        keys = session.query(Key).filter(Key.taken == False).limit(n).all()
        for key in keys:
            key.taken = True
        session.commit()
        print(keys)


gen_keys(100)

# take_keys works just as expected
with ThreadPoolExecutor() as executor:
    for _ in range(0, 5):
        executor.submit(take_keys, 10)

# take_keys_2 breaks, raises following error
# >>> sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread.
# >>> The object was created in thread id 12340 and this is thread id 4312
# according to the console log, 12340 is one of the ThreadPoolExecutor threads, and 4312 is the main thread
with ThreadPoolExecutor() as executor:
    for _ in range(0, 5):
        executor.submit(take_keys_2, 10)

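I just have a very simple class, Key, which has a value and can be marked as taken. Think of something like giveaway keys, where you don't want to hand the same one out to different prospects. I used it to test for race conditions, which do exist and forced me to put a lock around database access. No big deal, I can live with that.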

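What I really don't understand is why take_keys works but take_keys_2 breaks, when the only difference between them is the position of the print(keys) statement. This is especially puzzling because, in the non-working example, the error message suggests that I am using an object in a thread other than the one that created it. I am not: I only use it after session.commit(), in the same thread that created it.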

If anyone can shed some light on why this happens, I'd be glad.

1 Answer:

Answer 0 (score: 1)

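Now, I don't have all the details, but enough to give you an idea of the situation. Threading support in SQLite is not that great. Because of that, SQLAlchemy's pooling behavior defaults to SingletonThreadPool if you use an in-memory database, and to NullPool if you use a file. The latter means no pooling at all; in other words, connections are always opened and closed on request.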
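To check which pool an engine actually uses, the pool class can be read from the engine itself. This is a minimal sketch: the defaults described above are the ones for the SQLAlchemy version in the question and other releases may choose differently, so the code prints whatever the installed version picks instead of assuming a result. The file name pool_demo.db is a hypothetical placeholder; creating an engine does not open a connection, so no file is written.

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

# Engines connect lazily, so none of these calls touches the disk.
memory_engine = create_engine("sqlite://")
file_engine = create_engine("sqlite:///pool_demo.db")  # hypothetical file name

# Passing poolclass makes the choice explicit instead of relying on defaults.
explicit_engine = create_engine("sqlite:///pool_demo.db", poolclass=NullPool)

memory_pool = type(memory_engine.pool).__name__
file_pool = type(file_engine.pool).__name__
explicit_pool = type(explicit_engine.pool).__name__

print("in-memory default:", memory_pool)
print("file default:", file_pool)
print("explicit:", explicit_pool)
```

Setting poolclass explicitly is one way to keep the behavior discussed here independent of the installed version.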

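The position of print() matters because the call to session.commit() above it expires all database-loaded state of the objects in the session. So in order to print the list of keys, which ends up calling their __repr__, SQLAlchemy has to refetch the state of each object. This becomes evident if you add echo=True to your call to create_engine().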
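The expiry can be observed directly on an object's state. The following is a minimal single-threaded sketch, assuming an in-memory database and a hypothetical DemoKey model that stands in for the question's Key class. It reads the expired flag from inspect(obj) after a commit, once with the default setting and once with expire_on_commit=False.

```python
from sqlalchemy import Boolean, Column, Integer, create_engine, inspect
from sqlalchemy.orm import sessionmaker

try:  # location in newer SQLAlchemy releases
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases such as the 1.2 series
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class DemoKey(Base):  # hypothetical stand-in for the question's Key model
    __tablename__ = "demo_key"
    id = Column(Integer, primary_key=True)
    value = Column(Integer, nullable=False)
    taken = Column(Boolean, nullable=False, default=False)


def expired_after_commit(expire_on_commit):
    """Commit one object and report whether its loaded state was expired."""
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine, expire_on_commit=expire_on_commit)()
    key = DemoKey(value=100)
    session.add(key)
    session.commit()
    # Reading the flag does not touch the object's attributes,
    # so it does not trigger a refresh query.
    expired = inspect(key).expired
    session.close()
    engine.dispose()
    return expired


default_behavior = expired_after_commit(True)
without_expiry = expired_after_commit(False)
print(default_behavior, without_expiry)
```

An expired object reloads its attributes on the next access, which is what starts a new transaction when __repr__ runs after the commit.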

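After all that, the session in take_keys_2 is left associated with an open transaction. This is where it gets a bit murky: when the function exits, session goes out of scope, which means the connection it holds is eventually returned to the pool. But the pool is a NullPool, so it finalizes and closes the connection and discards it. Finalization means rolling back any outstanding transaction, and that is what fails: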

Traceback (most recent call last):
  File "~/Work/sqlalchemy/lib/sqlalchemy/pool.py", line 705, in _finalize_fairy
    fairy._reset(pool)
  File "~/Work/sqlalchemy/lib/sqlalchemy/pool.py", line 876, in _reset
    pool._dialect.do_rollback(self)
  File "~/Work/sqlalchemy/lib/sqlalchemy/engine/default.py", line 457, in do_rollback
    dbapi_connection.rollback()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140683561543424 and this is thread id 140683635291968

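The finalization is executed in a "dummy" thread during interpreter shutdown, not in the worker, because the connection was left lingering.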
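The underlying check lives in the sqlite3 module itself and can be reproduced without SQLAlchemy. This is a minimal sketch, not the code path SQLAlchemy takes: it opens an in-memory connection in a worker thread and then calls rollback() on it from the main thread, which is the same kind of cross-thread use the traceback above reports. The helper names are hypothetical.

```python
import sqlite3
import threading


def open_connection_in_worker():
    """Create a connection inside a worker thread and hand it back."""
    holder = {}

    def worker():
        # check_same_thread defaults to True, which enables the thread check.
        holder["conn"] = sqlite3.connect(":memory:")

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return holder["conn"]


def rollback_from_other_thread():
    """Try to roll back in a thread that did not create the connection."""
    conn = open_connection_in_worker()
    try:
        conn.rollback()
    except sqlite3.ProgrammingError as exc:
        return str(exc)
    return None


message = rollback_from_other_thread()
print(message)
```

The message names both thread ids, just as in the traceback: the creating thread is the worker, and the calling thread is whichever thread ends up running the cleanup.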

If, for example, you add a call to session.rollback() after print(keys):

def take_keys_2(n):
    ...
    with dblock:
        ...
        session.commit()
        print(keys)
        session.rollback()

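the connection is returned to the pool explicitly and take_keys_2 works as well. Another option is to use expire_on_commit=False, so that no additional queries are needed after the commit in order to print the representation of the Key objects: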

def take_keys_2(n):
    with dblock:
        session: Session = scopedmaker(expire_on_commit=False)
        ...
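A fuller sketch of the second option follows, under assumptions that go beyond the original answer. It sets expire_on_commit=False on the sessionmaker rather than in the scopedmaker(...) call, because passing keyword arguments to a scoped_session raises an error when the current thread already has a session in the registry, which can happen when a pool thread is reused. It also calls remove() at the end of each task so the session is closed in the thread that used it. The DemoKey model, the temporary database file and the function names are hypothetical.

```python
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

from sqlalchemy import Boolean, Column, Integer, create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

try:  # location in newer SQLAlchemy releases
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases such as the 1.2 series
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
db_lock = threading.Lock()


class DemoKey(Base):  # hypothetical stand-in for the question's Key model
    __tablename__ = "demo_key"
    id = Column(Integer, primary_key=True)
    value = Column(Integer, nullable=False, unique=True)
    taken = Column(Boolean, nullable=False, default=False)


def take_keys(scoped, n):
    """Mark up to n free keys as taken and return their values."""
    with db_lock:
        session = scoped()
        try:
            keys = session.query(DemoKey).filter_by(taken=False).limit(n).all()
            for key in keys:
                key.taken = True
            session.commit()
            # With expire_on_commit=False the attributes stay loaded,
            # so reading them here does not start a new transaction.
            return [key.value for key in keys]
        finally:
            # Close and discard this thread's session in this thread.
            scoped.remove()


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        engine = create_engine(f"sqlite:///{Path(tmp) / 'keys.db'}")
        Base.metadata.create_all(engine)
        scoped = scoped_session(sessionmaker(bind=engine, expire_on_commit=False))

        setup = scoped()
        setup.add_all([DemoKey(value=i * 100) for i in range(30)])
        setup.commit()
        scoped.remove()

        with ThreadPoolExecutor(max_workers=3) as executor:
            futures = [executor.submit(take_keys, scoped, 10) for _ in range(3)]
            batches = [future.result() for future in futures]

        engine.dispose()
        return batches


batches = run_demo()
print(batches)
```

Calling future.result() also surfaces any exception raised inside a worker, which the fire-and-forget executor.submit() calls in the question would otherwise swallow.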