Inefficient correlated subquery generated from hybrid property

Asked: 2015-10-14 20:59:56

Tags: orm sqlalchemy

I have a simple database model:

class Account(ModelBase):
    name = Column(String(127), nullable=False)

    @hybrid_property
    def balance(self):
        return sum(imap(operator.attrgetter("amount"), self.transactions))

    @balance.expression
    def balance(self):
        return select(
            [func.sum(Transaction.amount)]
        ).where(
            Transaction.account_id == self.id
        ).label("balance")


class Transaction(ModelBase):
    amount = Column(Integer, nullable=False)

    account_id = Column(Integer,
                        ForeignKey(Account.id, ondelete='CASCADE'),
                        nullable=False)
    account = relationship(Account,
                           backref=backref("transactions",
                                           cascade="all, delete-orphan"))
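
For reference, the instance-level half of the hybrid just sums the amount attribute of the loaded transactions. Below is a minimal standalone sketch of that computation in Python 3, where the built-in map replaces imap. FakeTransaction and instance_balance are hypothetical names used only for illustration, not part of the model.

```python
import operator
from collections import namedtuple

# Hypothetical stand-in for the mapped Transaction class; only `amount` matters here.
FakeTransaction = namedtuple("FakeTransaction", ["amount"])


def instance_balance(transactions):
    # Same computation as the Python side of the hybrid property.
    # On Python 3 the built-in map is already lazy, so imap is not needed.
    return sum(map(operator.attrgetter("amount"), transactions))


balance = instance_balance([FakeTransaction(100), FakeTransaction(-30)])
empty_balance = instance_balance([])
print(balance, empty_balance)
```

Note that for an account without transactions the Python side yields 0, whereas a SQL sum() over zero rows yields NULL, so the two halves of the hybrid can disagree in that case.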

If I then run the following query:

session.query(Account).filter(Account.balance < 0)

it generates the following SQL query:

SELECT account.id AS account_id, account.name AS account_name
FROM account 
WHERE (SELECT sum(transaction.amount) AS sum_1 
FROM transaction 
WHERE transaction.account_id = account.id) < 0

On PostgreSQL 9.1 this results in the following extremely inefficient query plan:

 Seq Scan on account  (cost=0.00..36631458.18 rows=3543 width=38)
   Filter: ((SubPlan 1) < 0)
   SubPlan 1
     ->  Aggregate  (cost=3446.66..3446.67 rows=1 width=4)
           ->  Seq Scan on transaction  (cost=0.00..3446.60 rows=25 width=4)
                 Filter: (account_id = account.id)

whereas I would prefer something like this:

SELECT account.id AS account_id, account.name as account_name
FROM account JOIN transaction on transaction.account_id = account.id
GROUP BY account.id
HAVING sum(transaction.amount) < 0;

This results in a much faster query plan:

 GroupAggregate  (cost=22488.79..27353.75 rows=10628 width=38)
   Filter: (sum(transaction.amount) > 0)
   ->  Merge Join  (cost=22488.79..26258.66 rows=192448 width=38)
         Merge Cond: (account.id = transaction.account_id)
         ->  Index Scan using account_pkey on account  (cost=0.00..376.68 rows=10628 width=34)
         ->  Materialize  (cost=22488.75..23450.99 rows=192448 width=8)
               ->  Sort  (cost=22488.75..22969.87 rows=192448 width=8)
                     Sort Key: transaction.account_id
                     ->  Seq Scan on transaction  (cost=0.00..2965.48 rows=192448 width=8)
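
As a sanity check that the two statements select the same accounts, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL. The sample rows and the alias t are made up for illustration, and the table name is quoted because transaction is a keyword in SQLite.

```python
import sqlite3

# Hypothetical sample data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE account (id INTEGER PRIMARY KEY, name VARCHAR(127) NOT NULL);
    CREATE TABLE "transaction" (
        id INTEGER PRIMARY KEY,
        amount INTEGER NOT NULL,
        account_id INTEGER NOT NULL REFERENCES account (id) ON DELETE CASCADE
    );
    INSERT INTO account (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    -- alice ends up at 70, bob at -30, carol has no transactions at all
    INSERT INTO "transaction" (amount, account_id) VALUES
        (100, 1), (-30, 1), (20, 2), (-50, 2);
''')

# Shape of the query generated from the hybrid property.
correlated = '''
    SELECT account.id, account.name
    FROM account
    WHERE (SELECT sum(t.amount) FROM "transaction" AS t
           WHERE t.account_id = account.id) < 0
    ORDER BY account.id
'''

# Shape of the hand-written JOIN / GROUP BY / HAVING query.
grouped = '''
    SELECT account.id, account.name
    FROM account JOIN "transaction" AS t ON t.account_id = account.id
    GROUP BY account.id
    HAVING sum(t.amount) < 0
    ORDER BY account.id
'''

correlated_rows = conn.execute(correlated).fetchall()
grouped_rows = conn.execute(grouped).fetchall()
print(correlated_rows)
print(grouped_rows)
```

Both forms skip accounts without transactions: the subquery evaluates to NULL for them, and the inner join produces no rows for them. So for this filter the two statements should be interchangeable, and only the plan differs.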

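How can I make the simple hybrid query more efficient, so that I don't have to write the more efficient query explicitly? Do I have to use transformers, or is there a simpler way?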

Self-contained example:

#!/usr/bin/env python2

import operator
from itertools import imap

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import sessionmaker, relationship, backref
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey, select, func

Base = declarative_base()

class Account(Base):
    __tablename__ = 'account'
    id = Column(Integer, primary_key=True)
    name = Column(String(127), nullable=False)

    @hybrid_property
    def balance(self):
        return sum(imap(operator.attrgetter("amount"), self.transactions))

    @balance.expression
    def balance(self):
        return select(
            [func.sum(Transaction.amount)]
        ).where(
            Transaction.account_id == self.id
        ).label("balance")


class Transaction(Base):
    __tablename__ = 'transaction'
    id = Column(Integer, primary_key=True)

    amount = Column(Integer, nullable=False)

    account_id = Column(Integer,
                        ForeignKey(Account.id, ondelete='CASCADE'),
                        nullable=False)
    account = relationship(Account,
                           backref=backref("transactions",
                                           cascade="all, delete-orphan"))


engine = create_engine('sqlite:///:memory:', echo=True)
Session = sessionmaker(bind=engine)
session = Session()

print '\n{:=^80}'.format('actual query')  
print session.query(Account).filter(Account.balance > 0)

print '\n{:=^80}'.format('desired query') 
print session.query(Account).join(Transaction).group_by(Account).having(func.sum(Transaction.amount) > 0)

0 answers:

No answers yet.