I have a query that selects the most frequent value from my_table. The query is:
SELECT
gid,
max_height
FROM
(
SELECT gid, max_height,
ROW_NUMBER() OVER (PARTITION BY gid ORDER BY freq DESC) AS rn
FROM (
SELECT gid, max_height, COUNT(id) AS freq
FROM my_table
GROUP BY 1, 2
order by 1,2
) hgt_freq
) ranked_hgt_req
WHERE rn = 1
where my_table contains three columns, for example:
gid id max_height
3 1 19.3
3 2 19.3
3 3 20.3
3 4 20.3
3 5 19.3
3 6 19.3
3 7 21.4
3 8 21.4
3 9 21.4
3 10 21.4
3 11 21.4
3 12 21.4
22 1 23.1
22 2 23.1
22 3 23.1
22 4 23.1
22 5 23.1
22 6 23.1
22 7 22.1
22 8 22.1
22 9 22.1
22 10 22.1
22 11 22.1
22 12 22.1
29 1 24
29 2 24
29 3 24
29 4 18.9
29 5 18.9
29 6 18.9
29 7 NULL
29 8 NULL
29 9 27.1
29 10 27.1
29 11 6.5
29 12 6.5
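The problem is that the query ranks values only by frequency in descending order, which gives the wrong value for gid = 22. The output of the query is: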
gid max_height
3 21.4
22 22.1
29 24.0
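For gid = 22 there are two most frequent values, 23.1 and 22.1. In that case the query should return 23.1. Can anyone point out how to fix this, or is there a better way to do it? The process needs to be automated for a large number of records (gids).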
Answer 0 (score: 2)
Use distinct on:
select distinct on(gid) gid, max_height
from (
select gid, max_height, count(id) as freq
from my_table
group by 1, 2
) s
order by gid, freq desc
gid | max_height
-----+------------
3 | 21.4
22 | 23.1
29 | 24
(3 rows)
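From the PostgreSQL documentation: SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.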
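There are two most common values for gid=29 (24 and 18.9, three rows each). In this case you can choose which one is returned by adding another expression to order by: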
select distinct on(gid) gid, max_height
from (
select gid, max_height, count(id) as freq
from my_table
group by 1, 2
) s
order by gid, freq desc, max_height desc;
gid | max_height
-----+------------
3 | 21.4
22 | 23.1
29 | 24
(3 rows)
Sorting max_height in ascending order returns the smaller of the tied values instead:
select distinct on(gid) gid, max_height
from (
select gid, max_height, count(id) as freq
from my_table
group by 1, 2
) s
order by gid, freq desc, max_height;
gid | max_height
-----+------------
3 | 21.4
22 | 22.1
29 | 18.9
(3 rows)
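The same tie-breaking idea can also be applied to the ROW_NUMBER() query from the question. The following is only a sketch. It assumes that on a frequency tie the larger max_height should win and that a NULL height should lose a tie against any real value (hence NULLS LAST, which is an addition not present in the original query). The redundant inner ORDER BY is dropped.

```sql
-- Sketch: the question's ROW_NUMBER() query with an explicit tie-breaker.
-- Assumption: on a frequency tie the larger max_height is preferred.
SELECT gid, max_height
FROM (
    SELECT gid, max_height,
           ROW_NUMBER() OVER (
               PARTITION BY gid
               ORDER BY freq DESC,                  -- most frequent value first
                        max_height DESC NULLS LAST  -- tie: larger value first, NULL last
           ) AS rn
    FROM (
        -- one row per (gid, max_height) with its number of occurrences
        SELECT gid, max_height, COUNT(id) AS freq
        FROM my_table
        GROUP BY gid, max_height
    ) hgt_freq
) ranked_hgt_freq
WHERE rn = 1
ORDER BY gid;
```

On the sample data above this should give 21.4 for gid 3, 23.1 for gid 22 and 24 for gid 29, matching the distinct on query that orders by max_height desc. Changing the tie-breaker to max_height ASC would prefer the smaller value instead.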