How can I use PostgreSQL's DISTINCT ON clause and also return a count of the duplicates?

Time: 2018-11-29 18:40:39

Tags: sql database postgresql

Suppose I have a table like this:

+--------+--------+------+--------+---------+
|   A    |   B    |  C   |   g    |    h    |
+--------+--------+------+--------+---------+
| cat    | dog    | bird | 34.223 |  54.223 |
| cat    | pigeon | goat |  23.23 |  54.948 |
| cat    | dog    | bird | 17.386 |  26.398 |
| gopher | pigeon | bird | 23.552 |  89.223 |
+--------+--------+------+--------+---------+

but with many more fields to the right (i, j, k, ...).

I need a result table that looks like this:

+-----+--------+------+-----+-----+-----+-----+-------+
|  A  |   B    |  C   |  g  |  h  | ... |  z  | count |
+-----+--------+------+-----+-----+-----+-----+-------+
| cat | dog    | bird | xxx | xxx |     | xxx |    23 |
| cat | pigeon | goat | xxx | xxx |     | xxx |    78 |
+-----+--------+------+-----+-----+-----+-----+-------+

I would normally use GROUP BY, but I don't want to repeat all of the column names (g, h, i, ... z).

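I can currently get the result I want by combining DISTINCT ON with a per-group count taken from a CTE, but the query is very slow to run (the table has over 500k records, with a lot of duplicates):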

WITH temp AS (
    SELECT a, b, c, COUNT(*)
    FROM my_table
    GROUP BY a, b, c
)
SELECT DISTINCT ON (a, b, c) *, (
    SELECT count
    FROM temp
    WHERE 
        temp.a = t.a 
        AND temp.b = t.b 
        AND temp.c = t.c
) as count
FROM my_table as t
ORDER BY a, b, c, x, y;

Is there a more efficient way to get the number of rows that DISTINCT eliminated? Something like:

SELECT DISTINCT ON (a, b, c)
    *, COUNT(*)
FROM my_table
ORDER BY a, b, c, count;

Or am I taking the wrong approach?

1 answer:

Answer 0: (score: 2)

Use COUNT() as a window function with PARTITION BY:

SELECT DISTINCT ON (a, b, c) *, COUNT(*) OVER (PARTITION BY a, b, c)
FROM my_table

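If you care at all about the remaining fields, you should probably also add an ORDER BY to the query. Without it, which row of each (a, b, c) group supplies the values shown in those fields is not guaranteed to be consistent.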
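
As a minimal sketch of that advice, the query below adds an ORDER BY whose leading columns match the DISTINCT ON list, followed by tie-breakers that decide which row of each group is kept. The temporary table and its rows simply copy the sample data from the question. Using g and h as tie-breakers is an assumption made for illustration (the question's own query orders by x and y, which are not shown in the sample).

```sql
-- Sample data copied from the question; the column types are assumed.
CREATE TEMP TABLE my_table (a text, b text, c text, g numeric, h numeric);

INSERT INTO my_table (a, b, c, g, h) VALUES
    ('cat',    'dog',    'bird', 34.223, 54.223),
    ('cat',    'pigeon', 'goat', 23.23,  54.948),
    ('cat',    'dog',    'bird', 17.386, 26.398),
    ('gopher', 'pigeon', 'bird', 23.552, 89.223);

-- The window count is computed over every row of each (a, b, c) group
-- before DISTINCT ON discards all but the first row of that group.
SELECT DISTINCT ON (a, b, c)
       *,
       COUNT(*) OVER (PARTITION BY a, b, c) AS count
FROM my_table
-- The leading ORDER BY columns must match the DISTINCT ON expressions;
-- g and h (assumed tie-breakers) make the kept row deterministic.
ORDER BY a, b, c, g, h;
```

On this sample data the (cat, dog, bird) group keeps the row with g = 17.386 and reports a count of 2, while the other two groups each report a count of 1.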