How do I nest a SELECT without a JOIN in SQLAlchemy?

Time: 2019-06-10 11:53:57

Tags: python postgresql sqlalchemy

I have a Postgres query (issued through SQLAlchemy) that selects matching rows using complex criteria:

original_query = session.query(SomeTable).filter(*complex_filters)

I don't know exactly how the query is constructed; I only have access to the resulting Query instance.

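Now I want to use this "opaque" query (a black box for the purposes of this question) to construct other queries from the same table using exactly the same criteria, but with additional logic on top of the matched original_query rows. For example, with SELECT DISTINCT(column) on top: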

another_query = session.query(SomeTable.column).distinct().?select_from_query?(original_query)

SELECT SUM(tab_value) FROM (
    SELECT tab.key AS tab_key, tab.value AS tab_value -- inner query, fixed
    FROM tab
    WHERE tab.product_id IN (1, 2)  -- simplified; the inner query is quite complex
) AS tbl
WHERE tab_key = 'length';

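or

SELECT tab_key, COUNT(*) FROM (
    SELECT tab.key AS tab_key, tab.value AS tab_value
    FROM tab
    WHERE tab.product_id IN (1, 2)
) AS tbl
GROUP BY tab_key;

How can I implement that ?select_from_query? part cleanly in SQLAlchemy? Basically, how do I do SELECT dynamic FROM (SELECT fixed) in SQLAlchemy?

Motivation: the inner Query object comes from a different part of the code. I have no control over how it is constructed, and I want to avoid duplicating its logic ad hoc for every SELECT that I have to run on top of it. I want to reuse that query, but add further logic on top (as in the examples above).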

1 Answer:

Answer 0: (score: 2)

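original_query is just a SQLAlchemy Query API object, so you can apply additional filters and criteria to it. The Query API is generative: each operation on a Query instance returns a new (immutable) instance, and your starting point (original_query) is not affected.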
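A minimal sketch of what "generative" means in practice, assuming a hypothetical Item model and an in-memory SQLite database (neither is part of the question): adding a filter yields a new Query object, while the SQL rendered for original_query stays the same.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Item(Base):  # hypothetical model, for illustration only
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    name = Column(String)


engine = create_engine("sqlite:///:memory:")
session = sessionmaker(bind=engine)()

# Stand-in for the opaque query produced elsewhere.
original_query = session.query(Item).filter(Item.id > 10)
sql_before = str(original_query)

# Each call returns a new Query; original_query itself is left untouched.
narrowed_query = original_query.filter(Item.name == "x")

sql_after = str(original_query)
narrowed_sql = str(narrowed_query)
```

Comparing sql_before with sql_after shows that the starting query was not modified, which is why it is safe to derive several different queries from the same original_query.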

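This includes using Query.distinct() to add a DISTINCT clause, Query.with_entities() to change which columns are part of the query, and Query.values() to execute the query but return only specific column values.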

Either use .distinct(<column>).with_entities(<column>) to create a new query object (which can be reused further):

another_query = original_query.distinct(SomeTable.column).with_entities(SomeTable.column)

or just use .distinct(<column>).values(<column>) to get an iterator of (column_value,) tuple results there and then:

distinct_values = original_query.distinct(SomeTable.column).values(SomeTable.column)

Note that .values() executes the query immediately, like .all() would, while .with_entities() gives you back a new Query object with just the single column (calling .all(), iterating, or slicing would then execute it and return the results).

Demo, using a contrived Foo model (executed against sqlite to make it easier to demo quickly):

>>> from sqlalchemy import *
>>> from sqlalchemy.ext.declarative import declarative_base
>>> from sqlalchemy.orm import sessionmaker
>>> Base = declarative_base()
>>> class Foo(Base):
...     __tablename__ = "foo"
...     id = Column(Integer, primary_key=True)
...     bar = Column(String)
...     spam = Column(String)
...
>>> engine = create_engine('sqlite:///:memory:', echo=True)
>>> session = sessionmaker(bind=engine)()
>>> Base.metadata.create_all(engine)
2019-06-10 13:10:43,910 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("foo")
2019-06-10 13:10:43,910 INFO sqlalchemy.engine.base.Engine ()
2019-06-10 13:10:43,911 INFO sqlalchemy.engine.base.Engine
CREATE TABLE foo (
    id INTEGER NOT NULL,
    bar VARCHAR,
    spam VARCHAR,
    PRIMARY KEY (id)
)
2019-06-10 13:10:43,911 INFO sqlalchemy.engine.base.Engine ()
2019-06-10 13:10:43,913 INFO sqlalchemy.engine.base.Engine COMMIT
>>> original_query = session.query(Foo).filter(Foo.id.between(17, 42))
>>> print(original_query)  # show what SQL would be executed for this query
SELECT foo.id AS foo_id, foo.bar AS foo_bar, foo.spam AS foo_spam
FROM foo
WHERE foo.id BETWEEN ? AND ?
>>> another_query = original_query.distinct(Foo.bar).with_entities(Foo.bar)
>>> print(another_query)  # print the SQL again, don't execute
SELECT DISTINCT foo.bar AS foo_bar
FROM foo
WHERE foo.id BETWEEN ? AND ?
>>> distinct_values = original_query.distinct(Foo.bar).values(Foo.bar)  # executes!
2019-06-10 13:10:48,470 INFO sqlalchemy.engine.base.Engine SELECT DISTINCT foo.bar AS foo_bar
FROM foo
WHERE foo.id BETWEEN ? AND ?
2019-06-10 13:10:48,470 INFO sqlalchemy.engine.base.Engine (17, 42)

In the above demo, the original query would select certain Foo instances with a BETWEEN filter, but adding .distinct(Foo.bar).values(Foo.bar) executes a query for just the DISTINCT foo.bar column, with the same BETWEEN filter in place. Similarly, by using .with_entities() we were given a new query object for just that single column, and the filter is still part of that new query.
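One caveat worth checking on your own setup: on the PostgreSQL dialect, passing a column to Query.distinct() renders DISTINCT ON (...) rather than a plain DISTINCT. The sketch below sidesteps that by calling .distinct() with no arguments after .with_entities(). The rows are made up for illustration, and SQLite stands in for Postgres, so treat it as an assumption-laden example rather than a drop-in solution.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Foo(Base):  # same shape as the demo model above
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    bar = Column(String)
    spam = Column(String)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    Foo(id=17, bar="x"),
    Foo(id=18, bar="x"),
    Foo(id=19, bar="y"),
    Foo(id=99, bar="z"),  # outside the BETWEEN filter
])
session.commit()

original_query = session.query(Foo).filter(Foo.id.between(17, 42))

# Replace the selected columns first, then ask for a plain DISTINCT.
another_query = original_query.with_entities(Foo.bar).distinct()
rendered_sql = str(another_query)

# Iterating the Query executes it; each row is a one-element tuple.
distinct_values = sorted(value for (value,) in another_query)
```

The row with id 99 never shows up in distinct_values, because the BETWEEN filter from original_query is carried over into another_query.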

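The examples you added work the same way; you don't actually need a subselect there, because the same query can be expressed as:

SELECT sum(tab.value)
FROM tab
WHERE tab.product_id IN (1, 2) AND tab.key = 'length';

This can be achieved simply by adding extra filters and then using .with_entities() to replace the selected columns with your SUM():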

summed_query = (
    original_query
    .filter(Tab.key == 'length')  # add a filter
    .with_entities(func.sum(Tab.value))  # select only the sum
)
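To check that the flattened form really matches the nested SQL from the question, here is a small sketch against in-memory SQLite. The Tab mapping and the sample rows are assumptions made up for this example; only the SQL text comes from the question. It also covers the COUNT(*) ... GROUP BY variant in the same style.

```python
from sqlalchemy import Column, Integer, String, create_engine, func, text
from sqlalchemy.orm import sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Tab(Base):  # hypothetical mapping for the question's "tab" table
    __tablename__ = "tab"
    id = Column(Integer, primary_key=True)
    product_id = Column(Integer)
    key = Column(String)
    value = Column(Integer)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    Tab(product_id=1, key="length", value=10),
    Tab(product_id=2, key="length", value=5),
    Tab(product_id=2, key="width", value=7),
    Tab(product_id=3, key="length", value=100),  # excluded by the inner filter
])
session.commit()

# Stand-in for the opaque query handed over by other code.
original_query = session.query(Tab).filter(Tab.product_id.in_([1, 2]))

# SUM example: one more filter, then swap the selected columns.
flat_sum = (
    original_query
    .filter(Tab.key == "length")
    .with_entities(func.sum(Tab.value))
    .scalar()
)

# COUNT(*) ... GROUP BY example, built the same way.
counts = {
    key: n
    for key, n in original_query
    .with_entities(Tab.key, func.count())
    .group_by(Tab.key)
    .all()
}

# The nested form from the question, run as raw SQL for comparison.
nested_sum = session.execute(text(
    "SELECT SUM(tab_value) FROM ("
    " SELECT tab.key AS tab_key, tab.value AS tab_value"
    " FROM tab WHERE tab.product_id IN (1, 2)"
    ") AS tbl WHERE tab_key = 'length'"
)).scalar()
```

Both flat_sum and nested_sum are computed over the same filtered rows, so they should agree as long as the outer logic only filters, aggregates, or groups the rows selected by the inner query.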

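In terms of the Foo demo above, the same approach gives:

>>> print(original_query.filter(Foo.spam == 42).with_entities(func.sum(Foo.bar)))
SELECT sum(foo.bar) AS sum_1
FROM foo
WHERE foo.id BETWEEN ? AND ? AND foo.spam = ?

There are use cases for subqueries (such as limiting the results from a specific table in a join), but this isn't one of them.

If you do need a subquery, the Query API has Query.from_self() (for simpler cases) and Query.subquery().

For example, if you needed to select only aggregated rows from the original query, filter on the aggregated values via HAVING, and then join the results with another table on the highest row id of each group with some further filtering, then you need a subquery:

summed_col = func.sum(SomeTable.some_column)
max_id = func.max(SomeTable.primary_key)
summed_results_by_eggs = (
    original_query
    .with_entities(max_id, summed_col)  # only select highest id and the sum
    .group_by(SomeTable.other_column)   # per group
    .having(summed_col > 10)            # where the sum is high enough
    .from_self(summed_col)              # give us the summed value as a subselect
    .join(                              # join these rows with another table
        OtherTable,
        OtherTable.foreign_key == max_id  # using the highest id
    )
    .filter(OtherTable.some_column < 1000)  # and filter some more
)

The above would only select the summed SomeTable.some_column values where the sum is greater than 10, joined on the highest SomeTable.primary_key value in each group. This query has to use a subquery, because you want to limit the eligible SomeTable rows before joining against the other table.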

To demo this, I added a second table, Eggs:

>>> from sqlalchemy.orm import relationship
>>> class Eggs(Base):
...     __tablename__ = "eggs"
...     id = Column(Integer, primary_key=True)
...     foo_id = Column(Integer, ForeignKey(Foo.id))
...     foo = relationship(Foo, backref="eggs")
...
>>> summed_col = func.sum(Foo.bar)
>>> max_id = func.max(Foo.id)
>>> print(
...     original_query
...     .with_entities(max_id, summed_col)
...     .group_by(Foo.spam)
...     .having(summed_col > 10)
...     .from_self(summed_col)
...     .join(Eggs, Eggs.foo_id==max_id)
...     .filter(Eggs.id < 1000)
... )
SELECT anon_1.sum_2 AS sum_1
FROM (SELECT max(foo.id) AS max_1, sum(foo.bar) AS sum_2
FROM foo
WHERE foo.id BETWEEN ? AND ?
GROUP BY foo.spam
HAVING sum(foo.bar) > ?) AS anon_1 JOIN eggs ON eggs.foo_id = anon_1.max_1
WHERE eggs.id < ?

The Query.from_self() method takes new entities to use in the outer query; if you omit them, all columns are pulled out. In the above I pulled out the summed column value; without that argument, the max_id column would be selected as well.
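Query.from_self() has been deprecated since SQLAlchemy 1.4 and, as far as I know, is no longer available in 2.0. A possible equivalent of the from_self() demo, sketched with Query.subquery() instead, is shown below. The labels max_id and bar_sum, the Integer type for bar, and the sample rows are my own assumptions for illustration and do not come from the answer.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Foo(Base):  # like the demo model, but bar is an Integer so SUM is meaningful
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    bar = Column(Integer)
    spam = Column(String)


class Eggs(Base):
    __tablename__ = "eggs"
    id = Column(Integer, primary_key=True)
    foo_id = Column(Integer, ForeignKey("foo.id"))


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    Foo(id=17, bar=4, spam="a"),
    Foo(id=18, bar=8, spam="a"),   # group "a": sum 12, highest id 18
    Foo(id=19, bar=3, spam="b"),   # group "b": sum 3, dropped by HAVING
    Foo(id=99, bar=50, spam="a"),  # outside the BETWEEN filter
    Eggs(id=1, foo_id=18),
])
session.commit()

original_query = session.query(Foo).filter(Foo.id.between(17, 42))

summed_col = func.sum(Foo.bar)
# Turn the aggregated query into a named subquery with labelled columns.
inner = (
    original_query
    .with_entities(func.max(Foo.id).label("max_id"), summed_col.label("bar_sum"))
    .group_by(Foo.spam)
    .having(summed_col > 10)
    .subquery()
)

# Select from the subquery and join the other table on the highest id.
outer_query = (
    session.query(inner.c.bar_sum)
    .select_from(inner)
    .join(Eggs, Eggs.foo_id == inner.c.max_id)
    .filter(Eggs.id < 1000)
)
results = [row[0] for row in outer_query.all()]
```

The explicit labels make the subquery columns addressable as inner.c.max_id and inner.c.bar_sum, which replaces the anonymous anon_1.max_1 and anon_1.sum_2 names seen in the from_self() output.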