SQLAlchemy: filter out rows that are not unique across multiple columns

Time: 2015-02-20 02:46:43

Tags: python postgresql sqlalchemy

Say I have this data in a table:

id | other_id | category | amount
--------------------------------
1  | abc      | widget   | 100
2  | abc      | widget   | 200
3  | def      | widget   | 100
4  | ghi      | gadget   | 100
5  | ghi      | gadget   | 100
6  | jkl      | gadget   | 100
7  | jkl      | gadget   | 100

I want to query this table to return

(other_id, category, sum_of_amount)

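where sum_of_amount is the sum of the amount column over all rows with the same other_id. In addition, I want to exclude tuples whose (category, sum_of_amount) combination is not unique. So the query should return the tuples: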

(abc, widget, 300)
(def, widget, 100)

and none of the gadget rows, because the combination (gadget, 200) is not unique.

So far I have this:

from sqlalchemy import func, label

with session_scope() as db_session:
  query = db_session.query(
    ModelClass.other_id,
    ModelClass.category,
    label('sum_of_amount', func.sum(ModelClass.amount))
  ).group_by(
    ModelClass.other_id,
    ModelClass.category
  )

This query does not filter anything out. I think I need to use distinct somehow, but I can't figure out how.

1 answer:

Answer 0: (score: 2)

You can use the query you wrote to generate a result set of (other_id, category, sum_of_amount):

=# SELECT other_id, category, SUM(amount) AS sum_of_amount
FROM test
GROUP BY other_id, category;

 other_id │ category │ sum_of_amount
──────────┼──────────┼───────────────
 abc      │ widget   │           300
 ghi      │ gadget   │           200
 jkl      │ gadget   │           200
 def      │ widget   │           100
(4 rows)

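Next, you have to exclude the rows whose (category, sum_of_amount) is not unique. To determine the uniqueness of each row in the result set above, you can add a new column that holds the number of rows sharing the same (category, sum_of_amount), like this: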

=# SELECT other_id, category, SUM(amount) AS sum_of_amount,
COUNT(*) OVER (PARTITION BY category, SUM(amount)) AS duplicates
FROM test
GROUP BY other_id, category;

 other_id │ category │ sum_of_amount │ duplicates
──────────┼──────────┼───────────────┼────────────
 ghi      │ gadget   │           200 │          2
 jkl      │ gadget   │           200 │          2
 def      │ widget   │           100 │          1
 abc      │ widget   │           300 │          1
(4 rows)
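
To make the counting step concrete, here is a minimal plain-Python sketch of the same two steps: group and sum, then count how many grouped rows share each (category, sum_of_amount) pair. The names `rows`, `sums`, `partition_sizes`, `with_duplicates` and `unique_rows` are made up for illustration; the data is the sample table from the question.

```python
from collections import Counter, defaultdict

# Sample data from the question: (id, other_id, category, amount).
rows = [
    (1, "abc", "widget", 100),
    (2, "abc", "widget", 200),
    (3, "def", "widget", 100),
    (4, "ghi", "gadget", 100),
    (5, "ghi", "gadget", 100),
    (6, "jkl", "gadget", 100),
    (7, "jkl", "gadget", 100),
]

# Step 1: the equivalent of GROUP BY other_id, category with SUM(amount).
sums = defaultdict(int)
for _id, other_id, category, amount in rows:
    sums[(other_id, category)] += amount

# Step 2: the equivalent of COUNT(*) OVER (PARTITION BY category, SUM(amount)),
# i.e. how many grouped rows fall into each (category, sum_of_amount) partition.
partition_sizes = Counter(
    (category, total) for (_other_id, category), total in sums.items()
)

# Attach the partition size to every grouped row as a "duplicates" column.
with_duplicates = [
    (other_id, category, total, partition_sizes[(category, total)])
    for (other_id, category), total in sorted(sums.items())
]

# Keep only rows whose (category, sum_of_amount) occurs exactly once.
unique_rows = [row[:3] for row in with_duplicates if row[3] == 1]
print(unique_rows)
```

This is only a model of the logic, useful for checking expected results in a test; in practice the database should do the work.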

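As the demo above shows, you now have the deciding factor in hand: you can add a WHERE clause on the duplicates column to produce the result set you are after. Window functions (the OVER part of the duplicates column) are not allowed in a WHERE clause, and the window function has to be evaluated after the sums have been computed, so a subquery is required.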

=# SELECT other_id, category, sum_of_amount
FROM (
  SELECT other_id, category, SUM(amount) AS sum_of_amount,
  COUNT(*) OVER (PARTITION BY category, SUM(amount)) AS duplicates
  FROM test
  GROUP BY other_id, category
) d
WHERE duplicates = 1
ORDER BY other_id, category;

 other_id │ category │ sum_of_amount
──────────┼──────────┼───────────────
 abc      │ widget   │           300
 def      │ widget   │           100
(2 rows)

A SQLAlchemy query expression for the SQL above could be:

from sqlalchemy import func, label, over

sum_of_amount = label('sum_of_amount', func.sum(ModelClass.amount))
# Number of grouped rows sharing the same (category, sum_of_amount).
duplicates = over(func.count('*'),
                  partition_by=(ModelClass.category, sum_of_amount))
query = db_session.query(
    ModelClass.other_id,
    ModelClass.category,
    sum_of_amount,
    duplicates,
).group_by(
    ModelClass.other_id,
    ModelClass.category
).from_self(
    # Wrap the grouped query in a subquery so the window result can be filtered.
    ModelClass.other_id,
    ModelClass.category,
    sum_of_amount
).filter(duplicates == 1)
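
As far as I know, Query.from_self() is treated as legacy in SQLAlchemy 1.4 and is no longer available in 2.0. The sketch below shows one way the same subquery could be written with select() and .subquery() instead. It assumes SQLAlchemy 1.4 or later, and it uses an in-memory SQLite database (a build with window function support) only so that it runs on its own. The ModelClass definition, the table name "test" and the engine URL are assumptions for illustration, not part of the original question.

```python
from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class ModelClass(Base):
    # Hypothetical mapping that mirrors the table in the question.
    __tablename__ = "test"
    id = Column(Integer, primary_key=True)
    other_id = Column(String)
    category = Column(String)
    amount = Column(Integer)


sum_of_amount = func.sum(ModelClass.amount)

# Inner query: one row per (other_id, category), plus the window count of
# grouped rows that share the same (category, sum_of_amount).
inner = (
    select(
        ModelClass.other_id,
        ModelClass.category,
        sum_of_amount.label("sum_of_amount"),
        func.count()
        .over(partition_by=[ModelClass.category, sum_of_amount])
        .label("duplicates"),
    )
    .group_by(ModelClass.other_id, ModelClass.category)
    .subquery("d")
)

# Outer query: filter on the window result, which WHERE cannot do directly.
stmt = (
    select(inner.c.other_id, inner.c.category, inner.c.sum_of_amount)
    .where(inner.c.duplicates == 1)
    .order_by(inner.c.other_id, inner.c.category)
)

# In-memory SQLite is used here purely for demonstration.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

sample = [
    ("abc", "widget", 100), ("abc", "widget", 200), ("def", "widget", 100),
    ("ghi", "gadget", 100), ("ghi", "gadget", 100),
    ("jkl", "gadget", 100), ("jkl", "gadget", 100),
]

with Session(engine) as session:
    session.add_all(
        [ModelClass(other_id=o, category=c, amount=a) for o, c, a in sample]
    )
    session.commit()
    rows = [tuple(r) for r in session.execute(stmt)]

print(rows)
```

Labelling the window expression and filtering on inner.c.duplicates keeps the outer WHERE clause tied to the subquery column rather than to the original window expression.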