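Limiting a relationship's primary join to a single row

Date: 2017-06-30 16:27:59

Tags: python sqlalchemy

I have a SQLAlchemy model that represents a delivery; a delivery has a destination, a parcel ID, and a date: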

class Delivery(Base):
    delivery_id = Column(Integer, primary_key=True, autoincrement=True)
    parcel_id = Column(ForeignKey('parcels.parcel_id'))
    scheduled_date = Column(DateTime)
    destination_id = Column(ForeignKey('location.location_id'))

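Now, the origin of a delivery is the destination of the previous delivery of the same parcel. Rather than denormalizing this information by maintaining a pointer-based linked list, I order the deliveries by their scheduled date, which currently looks like this: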

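def origin(delivery):
    # Latest delivery of the same parcel that is scheduled before this one.
    prior = (
        session.query(Delivery)
        .filter(
            Delivery.parcel_id == delivery.parcel_id,
            Delivery.scheduled_date < delivery.scheduled_date,
        )
        .order_by(Delivery.scheduled_date.desc())
        .first()
    )
    # The origin is the destination of that prior delivery, if there is one.
    return prior.destination_id if prior else None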
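
To pin down the rule that origin() implements, here is a minimal in-memory sketch with no database involved. DeliveryRow and origin_of are hypothetical names made up for this illustration, and the sample rows are invented; only the rule itself (latest earlier delivery of the same parcel) comes from the question.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class DeliveryRow(NamedTuple):
    # Hypothetical stand-in for a delivery row; not the SQLAlchemy model.
    delivery_id: int
    parcel_id: int
    scheduled_date: datetime
    destination_id: int


def origin_of(delivery: DeliveryRow, all_rows: list) -> Optional[int]:
    """Return the destination of the latest earlier delivery of the same parcel."""
    earlier = [
        row for row in all_rows
        if row.parcel_id == delivery.parcel_id
        and row.scheduled_date < delivery.scheduled_date
    ]
    if not earlier:
        return None  # the first delivery of a parcel has no known origin
    return max(earlier, key=lambda row: row.scheduled_date).destination_id


# Invented sample data: parcel 100 is delivered three times, parcel 200 once.
rows = [
    DeliveryRow(1, 100, datetime(2017, 6, 1), 10),
    DeliveryRow(2, 100, datetime(2017, 6, 5), 20),
    DeliveryRow(3, 100, datetime(2017, 6, 9), 30),
    DeliveryRow(4, 200, datetime(2017, 6, 2), 40),
]
origins = {row.delivery_id: origin_of(row, rows) for row in rows}
```

Each delivery maps to at most one origin, which is the property the relationship should preserve.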

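In plain SQL, I could turn this separate query into a simple subquery plus join that is included when I load the deliveries. I have gotten far enough that I can load all the related deliveries that occurred before the current one: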

_prior_delivery = \
    select([Delivery.parcel_id, Delivery.scheduled_date, Location]) \
        .where(Location.location_id == remote(Delivery.destination_id)) \
        .order_by(Delivery.scheduled_date.desc()) \
        .alias("prior_delivery")

Delivery.origin = relationship(
    Location,
    primaryjoin=and_(_prior_delivery.c.parcel_id == foreign(Delivery.parcel_id),
                     _prior_delivery.c.scheduled_date < foreign(Delivery.scheduled_date)),
    secondary=_prior_delivery,
    secondaryjoin=_prior_delivery.c.location_id == foreign(Location.location_id),
    uselist=False,
    viewonly=True)

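Thanks to uselist=False, this actually works; but under the hood it returns every delivery that occurred before the current one. SQLAlchemy prints a warning, and the result set is much larger than it needs to be.

My question: how do I apply limit(1) to this read-only relationship?

1 answer:

Answer 0 (score: 3)

First attempt

The reason this is difficult is that the relationship has to be joinable into the main query. SQLAlchemy needs to be able to load the relationship in the same query in order to support eager loading. So the question is: how do you write a single query that loads a list of Delivery objects together with the origin of each?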

SELECT delivery.*, location.* FROM delivery
LEFT JOIN location ON location.location_id = (
  SELECT destination_id FROM delivery prior
  WHERE delivery.parcel_id = prior.parcel_id
    AND prior.scheduled_date < delivery.scheduled_date
  ORDER BY prior.scheduled_date DESC
  LIMIT 1
);

In effect, the correlated subquery

SELECT destination_id FROM delivery prior
WHERE delivery.parcel_id = prior.parcel_id
  AND prior.scheduled_date < delivery.scheduled_date
ORDER BY prior.scheduled_date DESC
LIMIT 1

becomes a computed foreign key, origin_id, on which you can join the location table. Translated to SQLAlchemy, it would look something like this:

delivery = Delivery.__table__
location = Location.__table__
prior = alias(delivery, "prior")
_origin_id = select([prior.c.destination_id])\
    .where(delivery.c.parcel_id == prior.c.parcel_id)\
    .where(prior.c.scheduled_date < delivery.c.scheduled_date)\
    .order_by(prior.c.scheduled_date.desc())\
    .limit(1)
Delivery.origin = relationship(
    Location,
    primaryjoin=_origin_id == location.c.location_id,
    viewonly=True)

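Unfortunately, this does not seem to work with any combination of remote and foreign annotations that I tried.

Using a SELECT with a correlated subquery as secondary

The next best solution is to use a fake secondary table: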
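
The raw SQL behind this first attempt can be checked outside the ORM. The following is a minimal sketch, assuming an in-memory SQLite database accessed through Python's sqlite3 module and a few invented sample rows. The subquery here also carries the scheduled_date filter from the question's origin() function, so that it selects the prior delivery and not simply the latest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (location_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE delivery (
        delivery_id INTEGER PRIMARY KEY,
        parcel_id INTEGER,
        scheduled_date TEXT,
        destination_id INTEGER REFERENCES location (location_id)
    );
    -- Invented sample data: one parcel delivered three times.
    INSERT INTO location VALUES (10, 'depot'), (20, 'hub'), (30, 'customer');
    INSERT INTO delivery VALUES
        (1, 100, '2017-06-01', 10),
        (2, 100, '2017-06-05', 20),
        (3, 100, '2017-06-09', 30);
""")

# The correlated subquery acts as a computed foreign key in the ON clause.
rows = conn.execute("""
    SELECT delivery.delivery_id, location.name
    FROM delivery
    LEFT JOIN location ON location.location_id = (
        SELECT prior.destination_id FROM delivery AS prior
        WHERE prior.parcel_id = delivery.parcel_id
          AND prior.scheduled_date < delivery.scheduled_date
        ORDER BY prior.scheduled_date DESC
        LIMIT 1
    )
    ORDER BY delivery.delivery_id
""").fetchall()
```

So the SQL itself is sound; the difficulty lies only in getting relationship() to accept a scalar subquery as one side of primaryjoin.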

SELECT delivery.*, location.* FROM delivery
LEFT JOIN (
  SELECT delivery.delivery_id, (
    SELECT destination_id FROM delivery prior
    WHERE delivery.parcel_id = prior.parcel_id
      AND prior.scheduled_date < delivery.scheduled_date
    ORDER BY prior.scheduled_date DESC
    LIMIT 1
  ) AS origin_id FROM delivery
) delivery_origin ON delivery.delivery_id = delivery_origin.delivery_id
LEFT JOIN location ON delivery_origin.origin_id = location.location_id;

In SQLAlchemy, this is:

delivery = Delivery.__table__
location = Location.__table__
current = alias(delivery, "current")
prior = alias(delivery, "prior")
_origin_id = select([prior.c.destination_id])\
    .where(current.c.parcel_id == prior.c.parcel_id)\
    .where(prior.c.scheduled_date < current.c.scheduled_date)\
    .order_by(prior.c.scheduled_date.desc())\
    .limit(1)\
    .label("origin_id")
delivery_origin = select([
    current.c.delivery_id,
    _origin_id,
]).select_from(current)
Delivery.origin = relationship(
    Location,
    primaryjoin=delivery.c.delivery_id == foreign(delivery_origin.c.delivery_id),
    secondaryjoin=foreign(delivery_origin.c.origin_id) == location.c.location_id,
    secondary=delivery_origin,
    viewonly=True,
    uselist=False)

Unfortunately, there seems to be a bug (possibly related to this issue) that causes SQLAlchemy to emit an incorrect join, so we need to apply a small hack:

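# Imports for the names used below. The list-style select([...]) call is the
# SQLAlchemy 1.x calling convention, which is what this answer was written for.
from sqlalchemy import alias, select
from sqlalchemy.orm import foreign, relationship
from sqlalchemy.sql.elements import UnaryExpression
from sqlalchemy.sql.operators import custom_op

delivery = Delivery.__table__
location = Location.__table__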
current = alias(delivery, "current")
prior = alias(delivery, "prior")

# HACK: wrap delivery_id in an empty unary operator
_delivery_id = UnaryExpression(current.c.delivery_id, operator=custom_op(""))\
    .label("delivery_id")
# /HACK

_origin_id = select([prior.c.destination_id])\
    .where(current.c.parcel_id == prior.c.parcel_id)\
    .where(prior.c.scheduled_date < current.c.scheduled_date)\
    .order_by(prior.c.scheduled_date.desc())\
    .limit(1)\
    .label("origin_id")
delivery_origin = select([
    _delivery_id,
    _origin_id,
]).select_from(current)
Delivery.origin = relationship(
    Location,
    primaryjoin=delivery.c.delivery_id == foreign(delivery_origin.c.delivery_id),
    secondaryjoin=foreign(delivery_origin.c.origin_id) == location.c.location_id,
    secondary=delivery_origin,
    viewonly=True,
    uselist=False)
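
To see what the fake secondary table yields per delivery, the equivalent SQL can be run directly. This is a sketch under assumptions: an in-memory SQLite database via sqlite3, invented sample rows, the alias cur standing in for current (to stay clear of SQL keywords), and the scheduled_date filter from the question included in the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (location_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE delivery (
        delivery_id INTEGER PRIMARY KEY,
        parcel_id INTEGER,
        scheduled_date TEXT,
        destination_id INTEGER REFERENCES location (location_id)
    );
    -- Invented sample data: parcel 100 delivered three times, parcel 200 once.
    INSERT INTO location VALUES (10, 'depot'), (20, 'hub'), (30, 'customer');
    INSERT INTO delivery VALUES
        (1, 100, '2017-06-01', 10),
        (2, 100, '2017-06-05', 20),
        (3, 100, '2017-06-09', 30),
        (4, 200, '2017-06-02', 20);
""")

# delivery_origin plays the role of the secondary table:
# one (delivery_id, origin_id) pair per delivery.
rows = conn.execute("""
    SELECT delivery.delivery_id, location.name
    FROM delivery
    LEFT JOIN (
        SELECT cur.delivery_id, (
            SELECT prior.destination_id FROM delivery AS prior
            WHERE prior.parcel_id = cur.parcel_id
              AND prior.scheduled_date < cur.scheduled_date
            ORDER BY prior.scheduled_date DESC
            LIMIT 1
        ) AS origin_id
        FROM delivery AS cur
    ) AS delivery_origin ON delivery.delivery_id = delivery_origin.delivery_id
    LEFT JOIN location ON delivery_origin.origin_id = location.location_id
    ORDER BY delivery.delivery_id
""").fetchall()
```

The result has exactly one row per delivery, which is what makes the derived table usable as a secondary with uselist=False and no warning about extra rows.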

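Using a SELECT with a window function as secondary

An alternative implementation, which may have better performance characteristics, is to use a window function: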

SELECT delivery.*, location.* FROM delivery
LEFT JOIN (
  SELECT
    delivery.delivery_id,
    lag(delivery.destination_id) OVER (PARTITION BY delivery.parcel_id ORDER BY delivery.scheduled_date) AS origin_id
  FROM delivery
) delivery_origin ON delivery.delivery_id = delivery_origin.delivery_id
LEFT JOIN location ON delivery_origin.origin_id = location.location_id;

As before, we need to apply a similar hack to get SQLAlchemy to generate the correct SQL:

delivery = Delivery.__table__
location = Location.__table__
current = alias(delivery, "current")
prior = alias(delivery, "prior")

# HACK: wrap delivery_id in an empty unary operator
_delivery_id = UnaryExpression(current.c.delivery_id, operator=custom_op(""))\
    .label("delivery_id")
# /HACK

_origin_id = func.lag(current.c.destination_id)\
    .over(partition_by=current.c.parcel_id,
          order_by=current.c.scheduled_date)\
    .label("origin_id")
delivery_origin = select([
    _delivery_id,
    _origin_id,
]).select_from(current)
Delivery.origin = relationship(
    Location,
    primaryjoin=delivery.c.delivery_id == foreign(delivery_origin.c.delivery_id),
    secondaryjoin=foreign(delivery_origin.c.origin_id) == location.c.location_id,
    secondary=delivery_origin,
    viewonly=True,
    uselist=False)
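
The window-function variant can be checked the same way. This is a sketch under assumptions: an in-memory SQLite database via sqlite3 (window functions need SQLite 3.25 or newer), invented sample rows, the alias cur standing in for current, and lag() applied to destination_id so that origin_id holds a location ID that can be joined to location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (location_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE delivery (
        delivery_id INTEGER PRIMARY KEY,
        parcel_id INTEGER,
        scheduled_date TEXT,
        destination_id INTEGER REFERENCES location (location_id)
    );
    -- Invented sample data: parcel 100 delivered three times, parcel 200 once.
    INSERT INTO location VALUES (10, 'depot'), (20, 'hub'), (30, 'customer');
    INSERT INTO delivery VALUES
        (1, 100, '2017-06-01', 10),
        (2, 100, '2017-06-05', 20),
        (3, 100, '2017-06-09', 30),
        (4, 200, '2017-06-02', 20);
""")

# lag() looks one row back within each parcel, ordered by scheduled date.
rows = conn.execute("""
    SELECT delivery.delivery_id, location.name
    FROM delivery
    LEFT JOIN (
        SELECT
            cur.delivery_id,
            lag(cur.destination_id) OVER (
                PARTITION BY cur.parcel_id ORDER BY cur.scheduled_date
            ) AS origin_id
        FROM delivery AS cur
    ) AS delivery_origin ON delivery.delivery_id = delivery_origin.delivery_id
    LEFT JOIN location ON delivery_origin.origin_id = location.location_id
    ORDER BY delivery.delivery_id
""").fetchall()
```

One difference to keep in mind: if two deliveries of the same parcel share a scheduled_date, lag() still picks one of them as the predecessor of the other, whereas a strict scheduled_date < comparison in a correlated subquery skips the tied row.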