与辅助表连接行为的SQLAlchemy关系在延迟加载和急切加载之间发生变化

时间:2017-11-16 10:36:31

标签: python sql-server database sqlite sqlalchemy

我已经和SQL Alchemy玩了几个月了,到目前为止对它印象非常深刻。

我现在遇到的一个问题似乎是一个错误,但我不确定我做的是正确的事情。我们在这里使用MS SQL,使用表反射来定义表类,但是我可以使用内存中的SQLite数据库复制问题,这是我在这里包含的代码。

我正在做的是使用它们之间的链接表定义两个表之间的多对多关系。链接表中包含一条额外的信息,我想用它来过滤链接,需要在关系上使用primaryjoin语句。这非常适合延迟加载,但是出于性能原因,我们需要急切加载,这就是它所有的结果。

如果我使用延迟加载定义关系:

activefunds = relationship('Fund', secondary='fundbenchmarklink',
                           primaryjoin='and_(FundBenchmarkLink.isactive==True,'
                                       'Benchmark.id==FundBenchmarkLink.benchmarkid,'
                                       'Fund.id==FundBenchmarkLink.fundid)')

并正常查询数据库:

query = session.query(Benchmark)

我需要的行为正是我想要的,虽然性能非常糟糕,因为在迭代所有基准和各自的基金时会有额外的SQL查询。

如果我用渴望加载定义关系:

activefunds = relationship('Fund', secondary='fundbenchmarklink',
                           primaryjoin='and_(FundBenchmarkLink.isactive==True,'
                                       'Benchmark.id==FundBenchmarkLink.benchmarkid,'
                                       'Fund.id==FundBenchmarkLink.fundid)',
                           lazy='joined')

并正常查询数据库:

query = session.query(Benchmark)
它在我脸上爆炸:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such column: fund.id 
    [SQL: 'SELECT benchmark.id AS benchmark_id,
                   benchmark.name AS benchmark_name,
                   fund_1.id AS fund_1_id,
                   fund_1.name AS fund_1_name,
                   fund_2.id AS fund_2_id,
                   fund_2.name AS fund_2_name 
            FROM benchmark
            LEFT OUTER JOIN (fundbenchmarklink AS fundbenchmarklink_1
                             JOIN fund AS fund_1 ON fund_1.id = fundbenchmarklink_1.fundid) ON benchmark.id = fundbenchmarklink_1.benchmarkid
            LEFT OUTER JOIN (fundbenchmarklink AS fundbenchmarklink_2
                             JOIN fund AS fund_2 ON fund_2.id = fundbenchmarklink_2.fundid) ON fundbenchmarklink_2.isactive = 1
            AND benchmark.id = fundbenchmarklink_2.benchmarkid
            AND fund.id = fundbenchmarklink_2.fundid']

上面的SQL清楚地表明,在尝试从中访问列之前,链接表没有被连接。

如果我查询数据库,特别是加入链接表:

query = session.query(Benchmark).join(FundBenchmarkLink, Fund, isouter=True)

它有效,但这意味着我现在必须确保每当我查询Benchmark表时,我总是要定义连接以添加两个额外的表。

我有什么遗漏,这是一个潜在的错误,还是只是图书馆的工作方式?

复制问题的完整工作示例代码:

import logging

logging.basicConfig(level=logging.INFO)
logging.getLogger('sqlalchemy.engine.base').setLevel(logging.INFO)

from sqlalchemy import Column, DateTime, String, Integer, Boolean, ForeignKey, create_engine
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class FundBenchmarkLink(Base):
    __tablename__ = 'fundbenchmarklink'

    fundid = Column(Integer, ForeignKey('fund.id'), primary_key=True, autoincrement=False)
    benchmarkid = Column(Integer, ForeignKey('benchmark.id'), primary_key=True, autoincrement=False)
    isactive = Column(Boolean, nullable=False, default=True)

    fund = relationship('Fund')
    benchmark = relationship('Benchmark')

    def __repr__(self):
        return "<FundBenchmarkLink(fundid='{}', benchmarkid='{}', isactive='{}')>".format(self.fundid, self.benchmarkid, self.isactive)


class Benchmark(Base):
    __tablename__ = 'benchmark'

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

    funds = relationship('Fund', secondary='fundbenchmarklink', lazy='joined')

    # activefunds has additional filtering on the secondary table, requiring a primaryjoin statement.
    activefunds = relationship('Fund', secondary='fundbenchmarklink',
                               primaryjoin='and_(FundBenchmarkLink.isactive==True,'
                                           'Benchmark.id==FundBenchmarkLink.benchmarkid,'
                                           'Fund.id==FundBenchmarkLink.fundid)',
                               lazy='joined')

    def __repr__(self):
        return "<Benchmark(id='{}', name='{}')>".format(self.id, self.name)


class Fund(Base):
    __tablename__ = 'fund'

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

    def __repr__(self):
        return "<Fund(id='{}', name='{}')>".format(self.id, self.name)


if '__main__' == __name__:
    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    maker = sessionmaker(bind=engine)

    session = maker()

    # Create some data
    for bmkname in ['foo', 'bar', 'baz']:
        bmk = Benchmark(name=bmkname)
        session.add(bmk)

    for fname in ['fund1', 'fund2', 'fund3']:
        fnd = Fund(name=fname)
        session.add(fnd)

    session.add(FundBenchmarkLink(fundid=1, benchmarkid=1))
    session.add(FundBenchmarkLink(fundid=2, benchmarkid=1))
    session.add(FundBenchmarkLink(fundid=1, benchmarkid=2))
    session.add(FundBenchmarkLink(fundid=2, benchmarkid=2, isactive=False))

    session.commit()

    # This code snippet works when activefunds doesn't exist, or doesn't use eager loading
    # query = session.query(Benchmark)
    # print(query)

    # for bmk in query:
    #     print(bmk)
    #     for fund in bmk.funds:
    #         print('\t{}'.format(fund))

    # This code snippet works for activefunds with eager loading
    query = session.query(Benchmark).join(FundBenchmarkLink, Fund, isouter=True)
    print(query)

    for bmk in query:
        print(bmk)
        for fund in bmk.activefunds:
            print('\t{}'.format(fund))

1 个答案:

答案 0 :(得分:1)

我认为您已将primary joinsecondary join混合了一下。你的主要目前似乎包含两者。删除基金的谓词,它应该有效:

activefunds = relationship(
    'Fund',
    secondary='fundbenchmarklink',
    primaryjoin='and_(FundBenchmarkLink.isactive==True,'
                'Benchmark.id==FundBenchmarkLink.benchmarkid)',
    lazy='joined')

您的显式连接似乎修复了查询的原因是它在隐式预加载连接之前引入了表基金,因此它们可以引用它。它不是真正的修复,而是隐藏错误。如果您确实希望将明确的Query.join()用于eagerloading,请使用contains_eager()通知查询。请注意您选择包含哪种关系,具体取决于相关查询;如果没有额外的过滤,您也可以将activefunds填入非活动状态。

最后,请考虑使用Query.outerjoin()代替Query.join(..., isouter=True)