How to fix a wrong most-repeated value in PostgreSQL

Asked: 2017-06-09 21:36:21

Tags: sql postgresql

I have a query that selects the most repeated value of max_height for each gid in my_table. The query is:

SELECT
    gid,
    max_height
FROM (
    SELECT gid, max_height,
           ROW_NUMBER() OVER (PARTITION BY gid ORDER BY freq DESC) AS rn
    FROM (
        SELECT gid, max_height, COUNT(id) AS freq
        FROM my_table
        GROUP BY 1, 2
        ORDER BY 1, 2
    ) hgt_freq
) ranked_hgt_req
WHERE rn = 1

where my_table contains three columns, for example:

gid id  max_height
3   1   19.3
3   2   19.3
3   3   20.3
3   4   20.3
3   5   19.3
3   6   19.3
3   7   21.4
3   8   21.4
3   9   21.4
3   10  21.4
3   11  21.4
3   12  21.4
22  1   23.1
22  2   23.1
22  3   23.1
22  4   23.1
22  5   23.1
22  6   23.1
22  7   22.1
22  8   22.1
22  9   22.1
22  10  22.1
22  11  22.1
22  12  22.1
29  1   24
29  2   24
29  3   24
29  4   18.9
29  5   18.9
29  6   18.9
29  7   NULL
29  8   NULL
29  9   27.1
29  10  27.1
29  11  6.5
29  12  6.5

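The problem is that the query ranks values only by descending frequency, so when two values are equally frequent it can return the wrong one, as it does for gid = 22. The output of the query is: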

gid    max_height
3      21.4
22     22.1
29     24.0

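For gid = 22 there are two most repeated values, 23.1 and 22.1, so the query should return the larger one, 23.1. Can anyone point out how to fix this, or is there a better way to do it? The process needs to be automated for a large number of records (gids).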

1 Answer:

Answer 0 (score: 2)

Use distinct on:

select distinct on(gid) gid, max_height
from (
    select gid, max_height, count(id) as freq
    from my_table
    group by 1, 2
    ) s
order by gid, freq desc

 gid | max_height 
-----+------------
   3 |       21.4
  22 |       23.1
  29 |         24
(3 rows)

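From the documentation:

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.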
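To see which gids are affected by this unpredictability, a sketch like the following lists every value that ties for the highest frequency within its gid. It reuses the grouping subquery from the query above and swaps in RANK(), which gives tied rows the same rank. The aliases rnk and ranked are only illustrative names, not part of the original query.

```sql
-- List all values tied for the highest frequency within each gid.
-- RANK() assigns the same rank to rows with equal freq, so every
-- tied "most repeated" value survives the rnk = 1 filter.
SELECT gid, max_height, freq
FROM (
    SELECT gid, max_height, freq,
           RANK() OVER (PARTITION BY gid ORDER BY freq DESC) AS rnk
    FROM (
        SELECT gid, max_height, COUNT(id) AS freq
        FROM my_table
        GROUP BY gid, max_height
    ) hgt_freq
) ranked
WHERE rnk = 1
ORDER BY gid, max_height DESC;
```

On the sample data this should return one row for gid = 3 and two rows each for gid = 22 and gid = 29, which are exactly the groups where a tie-break is needed.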

Both gid = 22 and gid = 29 have two most frequent values. In such cases you can choose which one is returned by adding a condition to order by:

select distinct on(gid) gid, max_height
from (
    select gid, max_height, count(id) as freq
    from my_table
    group by 1, 2
    ) s
order by gid, freq desc, max_height desc;

 gid | max_height 
-----+------------
   3 |       21.4
  22 |       23.1
  29 |         24
(3 rows)    

select distinct on(gid) gid, max_height
from (
    select gid, max_height, count(id) as freq
    from my_table
    group by 1, 2
    ) s
order by gid, freq desc, max_height;

 gid | max_height 
-----+------------
   3 |       21.4
  22 |       22.1
  29 |       18.9
(3 rows)
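The same tie-break can also be applied to the original ROW_NUMBER() query, if you prefer to keep the window function. The sketch below makes two assumptions that are not in the question: that a NULL max_height should never win a tie (PostgreSQL sorts NULL first in DESC order by default, hence NULLS LAST), and that PostgreSQL 9.4 or later is available for the second statement, which uses the mode() ordered-set aggregate. mode() ignores NULL input, but the documentation describes its choice among equally frequent values as arbitrary, so check its tie behaviour on your data before relying on it.

```sql
-- Variant 1: original query with a deterministic tie-break.
-- Ties on freq are resolved by the larger max_height; NULLS LAST
-- (an assumption) keeps a NULL group from winning a tie.
SELECT gid, max_height
FROM (
    SELECT gid, max_height,
           ROW_NUMBER() OVER (
               PARTITION BY gid
               ORDER BY freq DESC, max_height DESC NULLS LAST
           ) AS rn
    FROM (
        SELECT gid, max_height, COUNT(id) AS freq
        FROM my_table
        GROUP BY gid, max_height
    ) hgt_freq
) ranked_hgt_freq
WHERE rn = 1;

-- Variant 2: built-in mode() ordered-set aggregate (PostgreSQL 9.4+).
-- Sorting by max_height DESC is intended to favour the larger value
-- on ties, but the tie choice is documented as arbitrary.
SELECT gid,
       mode() WITHIN GROUP (ORDER BY max_height DESC) AS max_height
FROM my_table
GROUP BY gid;
```

Variant 1 makes the tie-break explicit and so is the safer choice when the result for tied groups matters; variant 2 is shorter and may be enough when any of the tied values is acceptable.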