I have a SQLAlchemy model that represents a delivery; a delivery has a destination, a parcel ID, and a scheduled date:
class Delivery(Base):
    __tablename__ = 'delivery'  # table name assumed; it matches the SQL in the answer

    delivery_id = Column(Integer, primary_key=True, autoincrement=True)
    parcel_id = Column(ForeignKey('parcels.parcel_id'))
    scheduled_date = Column(DateTime)
    destination_id = Column(ForeignKey('location.location_id'))
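Now, the origin of a delivery is the destination of the previous delivery of the same parcel. Rather than denormalizing this information by maintaining a pointer-based linked list, I order the deliveries by scheduled date. Currently it looks like this: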
def origin(delivery):
    prior = (
        session.query(Delivery)
        .filter(
            Delivery.parcel_id == delivery.parcel_id,
            Delivery.scheduled_date < delivery.scheduled_date,
        )
        .order_by(Delivery.scheduled_date.desc())
        .first()
    )
    # The origin is the destination of the most recent earlier delivery.
    return prior.destination_id if prior else None
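To make the intended semantics concrete without a database, here is a minimal in-memory sketch. `DeliveryRow`, `origin_of`, and the sample data are hypothetical stand-ins for illustration only; they are not part of the model above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class DeliveryRow:
    # Hypothetical in-memory stand-in for one row of the Delivery model.
    delivery_id: int
    parcel_id: int
    scheduled_date: datetime
    destination_id: int


def origin_of(delivery: DeliveryRow, rows: List[DeliveryRow]) -> Optional[int]:
    """Return the destination of the latest earlier delivery of the same parcel."""
    earlier = [
        r for r in rows
        if r.parcel_id == delivery.parcel_id
        and r.scheduled_date < delivery.scheduled_date
    ]
    if not earlier:
        return None  # first delivery of the parcel: no known origin
    return max(earlier, key=lambda r: r.scheduled_date).destination_id


# Made-up sample data: parcel 100 is delivered three times, parcel 200 once.
rows = [
    DeliveryRow(1, 100, datetime(2020, 1, 1), 10),
    DeliveryRow(2, 100, datetime(2020, 1, 5), 20),
    DeliveryRow(3, 100, datetime(2020, 1, 9), 30),
    DeliveryRow(4, 200, datetime(2020, 1, 3), 40),
]
origins = {r.delivery_id: origin_of(r, rows) for r in rows}
```

With this data, delivery 3 originates at location 20 (the destination of delivery 2), and the first delivery of each parcel has no origin.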
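In plain SQL, I could turn this separate query into a simple subquery plus join that is included when I load the deliveries. I have gotten far enough that I can load all the related deliveries that happened before the current one: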
_prior_delivery = (
    select([Delivery.parcel_id, Delivery.scheduled_date, Location])
    .where(Location.location_id == remote(Delivery.destination_id))
    .order_by(Delivery.scheduled_date.desc())
    .alias("prior_delivery")
)
Delivery.origin = relationship(
    Location,
    primaryjoin=and_(
        _prior_delivery.c.parcel_id == foreign(Delivery.parcel_id),
        _prior_delivery.c.scheduled_date < foreign(Delivery.scheduled_date),
    ),
    secondary=_prior_delivery,
    secondaryjoin=_prior_delivery.c.location_id == foreign(Location.location_id),
    uselist=False,
    viewonly=True,
)
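Thanks to uselist=False, this actually works; but under the hood it returns every delivery that happened before the current one. SQLAlchemy prints a warning, and the result set is much larger than it needs to be.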
My question: how do I apply limit(1) to this read-only relationship?
Answer 0 (score: 3)
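The reason this is difficult is that the relationship needs to be joinable into the main query: to support eager loading, SQLAlchemy must be able to load the relationship in the same query. So the question becomes, how do you write a single query that loads a list of Delivery objects together with the origin of each one?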
SELECT delivery.*, location.* FROM delivery
LEFT JOIN location ON location.location_id = (
    SELECT destination_id FROM delivery prior
    WHERE delivery.parcel_id = prior.parcel_id
      AND prior.scheduled_date < delivery.scheduled_date
    ORDER BY prior.scheduled_date DESC
    LIMIT 1
);
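Effectively, the correlated subquery
SELECT destination_id FROM delivery prior
WHERE delivery.parcel_id = prior.parcel_id
  AND prior.scheduled_date < delivery.scheduled_date
ORDER BY prior.scheduled_date DESC
LIMIT 1
becomes a computed foreign key, origin_id, on which you can join the location table. Translated into SQLAlchemy, it would look something like this: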
from sqlalchemy import alias, select
from sqlalchemy.orm import relationship

delivery = Delivery.__table__
location = Location.__table__
prior = alias(delivery, "prior")
_origin_id = (
    select([prior.c.destination_id])
    .where(delivery.c.parcel_id == prior.c.parcel_id)
    .where(prior.c.scheduled_date < delivery.c.scheduled_date)
    .order_by(prior.c.scheduled_date.desc())
    .limit(1)
)
Delivery.origin = relationship(
    Location,
    primaryjoin=_origin_id == location.c.location_id,
    viewonly=True,
)
Unfortunately, this does not seem to work with any of the combinations of remote and foreign annotations that I tried.
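The SQL idea itself can still be checked outside the ORM. The following is a minimal sketch using Python's built-in sqlite3 module; the schema is a simplified assumption (dates stored as ISO strings, a made-up name column on location) and the rows are invented sample data. The subquery includes an explicit "earlier than the current delivery" filter so that it picks the prior delivery rather than the latest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema with invented sample rows.
conn.executescript("""
    CREATE TABLE location (location_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE delivery (
        delivery_id INTEGER PRIMARY KEY,
        parcel_id INTEGER,
        scheduled_date TEXT,
        destination_id INTEGER REFERENCES location (location_id)
    );
    INSERT INTO location VALUES (10, 'A'), (20, 'B'), (30, 'C');
    INSERT INTO delivery VALUES
        (1, 100, '2020-01-01', 10),
        (2, 100, '2020-01-05', 20),
        (3, 100, '2020-01-09', 30),
        (4, 200, '2020-01-03', 10);
""")

# One row per delivery: the correlated subquery acts as a computed origin_id.
rows = conn.execute("""
    SELECT delivery.delivery_id, location.name
    FROM delivery
    LEFT JOIN location ON location.location_id = (
        SELECT prior.destination_id FROM delivery prior
        WHERE prior.parcel_id = delivery.parcel_id
          AND prior.scheduled_date < delivery.scheduled_date
        ORDER BY prior.scheduled_date DESC
        LIMIT 1
    )
    ORDER BY delivery.delivery_id
""").fetchall()
```

Each delivery appears exactly once, and the first delivery of each parcel gets a NULL origin because the subquery finds no earlier row.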
SELECT with a correlated subquery as secondary
The next best solution is to use a fake secondary table:
SELECT delivery.*, location.* FROM delivery
LEFT JOIN (
    SELECT delivery.delivery_id, (
        SELECT destination_id FROM delivery prior
        WHERE delivery.parcel_id = prior.parcel_id
          AND prior.scheduled_date < delivery.scheduled_date
        ORDER BY prior.scheduled_date DESC
        LIMIT 1
    ) AS origin_id FROM delivery
) delivery_origin ON delivery.delivery_id = delivery_origin.delivery_id
LEFT JOIN location ON delivery_origin.origin_id = location.location_id;
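What makes the derived table usable as a secondary is that it has exactly one row per delivery_id, so joining through it can never multiply rows. A small sqlite3 sketch of just that derived table, under assumed simplifications (ISO-string dates, invented sample rows, and the table alias `d`, which is not in the query above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema with invented sample rows.
conn.executescript("""
    CREATE TABLE delivery (
        delivery_id INTEGER PRIMARY KEY,
        parcel_id INTEGER,
        scheduled_date TEXT,
        destination_id INTEGER
    );
    INSERT INTO delivery VALUES
        (1, 100, '2020-01-01', 10),
        (2, 100, '2020-01-05', 20),
        (3, 100, '2020-01-09', 30),
        (4, 200, '2020-01-03', 10);
""")

# The "fake secondary table": (delivery_id, origin_id) pairs.
delivery_origin = conn.execute("""
    SELECT d.delivery_id, (
        SELECT prior.destination_id FROM delivery prior
        WHERE prior.parcel_id = d.parcel_id
          AND prior.scheduled_date < d.scheduled_date
        ORDER BY prior.scheduled_date DESC
        LIMIT 1
    ) AS origin_id
    FROM delivery d
    ORDER BY d.delivery_id
""").fetchall()

delivery_count = conn.execute("SELECT count(*) FROM delivery").fetchone()[0]
```

The derived table has as many rows as delivery does, which is what lets it behave like a one-to-one association table.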
In SQLAlchemy, the fake-secondary-table query is:
from sqlalchemy.orm import foreign

delivery = Delivery.__table__
location = Location.__table__
current = alias(delivery, "current")
prior = alias(delivery, "prior")
_origin_id = (
    select([prior.c.destination_id])
    .where(current.c.parcel_id == prior.c.parcel_id)
    .where(prior.c.scheduled_date < current.c.scheduled_date)
    .order_by(prior.c.scheduled_date.desc())
    .limit(1)
    .label("origin_id")
)
delivery_origin = select([
    current.c.delivery_id,
    _origin_id,
]).select_from(current)
Delivery.origin = relationship(
    Location,
    primaryjoin=delivery.c.delivery_id == foreign(delivery_origin.c.delivery_id),
    secondaryjoin=foreign(delivery_origin.c.origin_id) == location.c.location_id,
    secondary=delivery_origin,
    viewonly=True,
    uselist=False,
)
Unfortunately, there appears to be a bug (possibly related to this issue) that causes SQLAlchemy to emit an incorrect join, so we need to apply a small hack:
from sqlalchemy.sql.elements import UnaryExpression
from sqlalchemy.sql.operators import custom_op

delivery = Delivery.__table__
location = Location.__table__
current = alias(delivery, "current")
prior = alias(delivery, "prior")
# HACK: wrap delivery_id in an empty unary operator
_delivery_id = UnaryExpression(
    current.c.delivery_id, operator=custom_op("")
).label("delivery_id")
# /HACK
_origin_id = (
    select([prior.c.destination_id])
    .where(current.c.parcel_id == prior.c.parcel_id)
    .where(prior.c.scheduled_date < current.c.scheduled_date)
    .order_by(prior.c.scheduled_date.desc())
    .limit(1)
    .label("origin_id")
)
delivery_origin = select([
    _delivery_id,
    _origin_id,
]).select_from(current)
Delivery.origin = relationship(
    Location,
    primaryjoin=delivery.c.delivery_id == foreign(delivery_origin.c.delivery_id),
    secondaryjoin=foreign(delivery_origin.c.origin_id) == location.c.location_id,
    secondary=delivery_origin,
    viewonly=True,
    uselist=False,
)
SELECT with a window function as secondary
An alternative implementation that may have better performance characteristics is to use a window function:
SELECT delivery.*, location.* FROM delivery
LEFT JOIN (
    SELECT
        delivery.delivery_id,
        lag(delivery.destination_id) OVER (
            PARTITION BY delivery.parcel_id ORDER BY delivery.scheduled_date
        ) AS origin_id
    FROM delivery
) delivery_origin ON delivery.delivery_id = delivery_origin.delivery_id
LEFT JOIN location ON delivery_origin.origin_id = location.location_id;
As before, we need to apply a similar hack to get SQLAlchemy to generate the correct SQL:
from sqlalchemy import func

delivery = Delivery.__table__
location = Location.__table__
current = alias(delivery, "current")
# HACK: wrap delivery_id in an empty unary operator
_delivery_id = UnaryExpression(
    current.c.delivery_id, operator=custom_op("")
).label("delivery_id")
# /HACK
# lag() yields the previous row's destination within each parcel's timeline.
_origin_id = (
    func.lag(current.c.destination_id)
    .over(
        partition_by=current.c.parcel_id,
        order_by=current.c.scheduled_date,
    )
    .label("origin_id")
)
delivery_origin = select([
    _delivery_id,
    _origin_id,
]).select_from(current)
Delivery.origin = relationship(
    Location,
    primaryjoin=delivery.c.delivery_id == foreign(delivery_origin.c.delivery_id),
    secondaryjoin=foreign(delivery_origin.c.origin_id) == location.c.location_id,
    secondary=delivery_origin,
    viewonly=True,
    uselist=False,
)
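As a sanity check that the window-function form computes the same origins as the correlated subquery, here is a rough sqlite3 sketch. It assumes an SQLite build with window function support (3.25 or later), a simplified schema with ISO-string dates, invented sample rows, and the table alias `d`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema with invented sample rows.
conn.executescript("""
    CREATE TABLE delivery (
        delivery_id INTEGER PRIMARY KEY,
        parcel_id INTEGER,
        scheduled_date TEXT,
        destination_id INTEGER
    );
    INSERT INTO delivery VALUES
        (1, 100, '2020-01-01', 10),
        (2, 100, '2020-01-05', 20),
        (3, 100, '2020-01-09', 30),
        (4, 200, '2020-01-03', 10);
""")

# Origin via correlated subquery with LIMIT 1.
correlated = conn.execute("""
    SELECT d.delivery_id, (
        SELECT prior.destination_id FROM delivery prior
        WHERE prior.parcel_id = d.parcel_id
          AND prior.scheduled_date < d.scheduled_date
        ORDER BY prior.scheduled_date DESC
        LIMIT 1
    ) AS origin_id
    FROM delivery d
    ORDER BY d.delivery_id
""").fetchall()

# Origin via lag(): the previous destination within the same parcel.
windowed = conn.execute("""
    SELECT d.delivery_id,
           lag(d.destination_id) OVER (
               PARTITION BY d.parcel_id ORDER BY d.scheduled_date
           ) AS origin_id
    FROM delivery d
    ORDER BY d.delivery_id
""").fetchall()
```

The two forms agree here because scheduled dates are unique within each parcel. If two deliveries of one parcel share a date, lag() picks whichever tied row happens to sort first, while the strict comparison in the subquery skips ties, so the results can differ.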