Eager loading hierarchical children with an explicit self-join and contains_eager in SQLAlchemy

Time: 2013-12-16 20:36:52

Tags: sqlalchemy

Given the following relationships:

- 1 MasterProduct parent -> many MasterProduct children
- 1 MasterProduct child -> many StoreProducts
- 1 StoreProduct -> 1 Store

I have defined the following declarative models in SQLAlchemy:

class MasterProduct(Base):
    __tablename__ = 'master_products'
    id = Column(Integer, primary_key=True)
    pid = Column(Integer, ForeignKey('master_products.id'))
    children = relationship('MasterProduct', join_depth=1,
                            backref=backref('parent', remote_side=[id]))
    store_products = relationship('StoreProduct', backref='master_product')


class StoreProduct(Base):
    __tablename__ = 'store_products'
    id = Column(Integer, primary_key=True)
    mid = Column(Integer, ForeignKey('master_products.id'))
    sid = Column(Integer, ForeignKey('stores.id'))
    timestamp = Column(DateTime)
    store = relationship('Store', uselist=False)


class Store(Base):
    __tablename__ = 'stores'
    id = Column(Integer, primary_key=True)

My goal is to replicate the following query in SQLAlchemy using eager loading:

SELECT *
FROM master_products mp_parent
INNER JOIN master_products mp_child ON mp_child.pid = mp_parent.id
INNER JOIN store_products sp1 ON sp1.mid = mp_child.id
LEFT JOIN store_products sp2
  ON sp1.mid = sp2.mid AND sp1.sid = sp2.sid AND sp1.timestamp < sp2.timestamp
WHERE mp_parent.id = 6752 AND sp2.id IS NULL

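The query selects parent 6752 and all of its MasterProduct children, together with the corresponding store products reduced to the latest timestamp per group by means of a NULL self-join (greatest-n-per-group). The query returns 82 store products across 14 master product children.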
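The NULL self-join can be tried in isolation. The following is a minimal sketch using Python's standard-library sqlite3 module rather than SQLAlchemy; the table contents are made up for illustration and are not the asker's data. A row survives only if no other row in the same (mid, sid) group has a later timestamp.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not the asker's data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store_products "
    "(id INTEGER PRIMARY KEY, mid INTEGER, sid INTEGER, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO store_products VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1, 100),
        (2, 10, 1, 200),  # latest row for group (mid=10, sid=1)
        (3, 10, 2, 150),  # only row for group (mid=10, sid=2)
        (4, 11, 1, 300),  # latest row for group (mid=11, sid=1)
        (5, 11, 1, 250),
    ],
)

# sp2 matches any row in the same group with a later timestamp.
# If no such row exists, sp2.id is NULL and sp1 is the latest row.
latest_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT sp1.id
        FROM store_products sp1
        LEFT JOIN store_products sp2
          ON sp1.mid = sp2.mid AND sp1.sid = sp2.sid
         AND sp1.timestamp < sp2.timestamp
        WHERE sp2.id IS NULL
        ORDER BY sp1.id
        """
    )
]
print(latest_ids)
```

Each (mid, sid) group contributes exactly one row here, the one with the highest timestamp.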

I have tried the following, to no avail:

mp_child = aliased(MasterProduct)
sp1 = aliased(StoreProduct)
sp2 = aliased(StoreProduct)

q = db.session.query(MasterProduct).filter_by(id=6752) \
    .join(mp_child, MasterProduct.children) \
    .join(sp1, mp_child.store_products) \
    .outerjoin(sp2, and_(sp1.mid == sp2.mid, sp1.sid == sp2.sid, sp1.timestamp < sp2.timestamp)) \
    .filter(sp2.id == None) \
    .options(contains_eager(MasterProduct.children, alias=mp_child),
             contains_eager(MasterProduct.children, mp_child.store_products, alias=sp1))

>>> mp_parent = q.first()  # the query below looks ok!
SELECT <all columns from master_products, master_products_1, and store_products_1>
FROM master_products INNER JOIN master_products AS master_products_1 ON master_products.id = master_products_1.pid INNER JOIN store_products AS store_products_1 ON master_products_1.id = store_products_1.mid LEFT OUTER JOIN store_products AS store_products_2 ON store_products_1.mid = store_products_2.mid AND store_products_1.sid = store_products_2.sid AND store_products_1.timestamp < store_products_2.timestamp 
WHERE master_products.id = %s AND store_products_2.id IS NULL 
 LIMIT %s
>>> mp_parent.children  # only *one* child is eagerly loaded (expected 14)
[<app.models.MasterProduct object at 0x2463850>]
>>> mp_parent.children[0].id  # this is correct, 6762 is one of the children
6762L
>>> mp_parent.children[0].pid  # this is correct
6752L
>>> mp_parent.children[0].store_products  # only *one* store product is eagerly loaded (expected 7 for this child)
[<app.models.StoreProduct object at 0x24543d0>]

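Stepping back and simplifying the query so that it eagerly loads only the children also results in just 1 child being loaded instead of all 14: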

mp_child = aliased(MasterProduct)
q = db.session.query(MasterProduct).filter_by(id=6752) \
        .join(mp_child, MasterProduct.children) \
        .options(contains_eager(MasterProduct.children, alias=mp_child))

However, when I use joinedload, joinedload_all, or subqueryload, all 14 children are eagerly loaded, i.e.:

q = db.session.query(MasterProduct).filter_by(id=6752) \
        .options(joinedload_all('children.store_products', innerjoin=True))

So the problem seems to lie in populating MasterProduct.children from the explicit join using contains_eager.

Can anyone spot the mistake in my approach or point me in the right direction?

1 Answer:

Answer 0: (score: 3)

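OK, what you may have noticed in the SQL is that a "LIMIT 1" is being emitted. That is because you are using first(). We can compare the first two queries, contains_eager and joinedload: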

join() + contains_eager():

SELECT master_products_1.id AS master_products_1_id, master_products_1.pid AS master_products_1_pid, master_products.id AS master_products_id, master_products.pid AS master_products_pid 
FROM master_products JOIN master_products AS master_products_1 ON master_products.id = master_products_1.pid 
WHERE master_products.id = ?
 LIMIT ? OFFSET ?

joinedload():

SELECT anon_1.master_products_id AS anon_1_master_products_id, anon_1.master_products_pid AS anon_1_master_products_pid, master_products_1.id AS master_products_1_id, master_products_1.pid AS master_products_1_pid 
FROM (SELECT master_products.id AS master_products_id, master_products.pid AS master_products_pid 
FROM master_products 
WHERE master_products.id = ?
 LIMIT ? OFFSET ?) AS anon_1 JOIN master_products AS master_products_1 ON anon_1.master_products_id = master_products_1.pid

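You can see that the second query is quite different. Because first() means a LIMIT is applied, joinedload() knows to wrap the "criteria" query in a subquery, apply the limit to that, and only then apply the JOIN. In the join + contains_eager case, the LIMIT is applied to the joined rows, that is, to the collection itself, so you get the wrong number of rows.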
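The difference between the two SQL shapes can be reproduced without the ORM. This is a minimal sketch using the standard-library sqlite3 module, with a hypothetical parent (id 1) and three children. It mirrors only the structure of the two statements above, not SQLAlchemy's exact output.

```python
import sqlite3

# Hypothetical data: one parent (id=1) with three children (ids 2, 3, 4).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master_products (id INTEGER PRIMARY KEY, pid INTEGER)")
conn.executemany(
    "INSERT INTO master_products VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 1)],
)

# Shape of join() + contains_eager() with first():
# the LIMIT cuts the joined parent/child rows, so one child row survives.
limited_join = conn.execute(
    """
    SELECT child.id
    FROM master_products AS parent
    JOIN master_products AS child ON parent.id = child.pid
    WHERE parent.id = ?
    LIMIT 1
    """,
    (1,),
).fetchall()

# Shape of joinedload() with first():
# the LIMIT applies to the parent inside a subquery, then the JOIN runs.
limited_subquery = conn.execute(
    """
    SELECT child.id
    FROM (SELECT id FROM master_products WHERE id = ? LIMIT 1) AS anon_1
    JOIN master_products AS child ON anon_1.id = child.pid
    """,
    (1,),
).fetchall()

print(len(limited_join), len(limited_subquery))
```

The first statement returns a single child row, while the second returns all three, which matches the behaviour seen with first() on the contains_eager query versus the joinedload query.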

Just by changing the bottom of your script to:

for q, query_label in queries:
    mp_parent = q.all()[0]

I get the output you say you are expecting:

[explicit join with contains_eager] children=3, store_products=27
[joinedload] children=3, store_products=27
[joinedload_all] children=3, store_products=27
[subqueryload] children=3, store_products=27
[subqueryload_all] children=3, store_products=27
[explicit joins with contains_eager, filtered by left-join] children=3, store_products=9

(This is why getting a user-created example is so important.)