Question

I have the following data in the table count_tbl in my PostgreSQL 11 database (on a Windows 10 x64 machine).

grp     id     value
1       1      19.7
1       2      19.7
1       3      19.7
1       4      19.7
1       5      19.7
1       6      19.7
1       7      18.8
1       8      18.8
1       9      18.8
1       10     18.8
1       11     18.8
1       12     18.8
2       1      18.6
2       2      18.6
2       3      18.6
2       4      18.6
2       5      18.6
2       6      0.0
2       7      0.0
2       8      0.0
2       9      21.4
2       10     21.4
2       11     0.0
2       12     0.0
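
To reproduce the examples, the table can be set up roughly as follows. This is only a sketch: the question does not state the column types, so integer and numeric are assumptions.

```sql
-- Column types are assumed; adjust them to match the real table.
create table count_tbl (
    grp   integer,
    id    integer,
    value numeric
);

insert into count_tbl (grp, id, value) values
    (1, 1, 19.7), (1, 2, 19.7), (1, 3, 19.7), (1, 4, 19.7),
    (1, 5, 19.7), (1, 6, 19.7), (1, 7, 18.8), (1, 8, 18.8),
    (1, 9, 18.8), (1, 10, 18.8), (1, 11, 18.8), (1, 12, 18.8),
    (2, 1, 18.6), (2, 2, 18.6), (2, 3, 18.6), (2, 4, 18.6),
    (2, 5, 18.6), (2, 6, 0.0), (2, 7, 0.0), (2, 8, 0.0),
    (2, 9, 21.4), (2, 10, 21.4), (2, 11, 0.0), (2, 12, 0.0);
```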

For each group (grp), the following query finds the most frequent value:

Select Distinct on (grp)
    grp,
    case
        when freq > 1
        then value
        else 0.0
    end as freq_val
From
(
    Select
        grp,
        value,
        count(id) as freq
    From count_tbl
    Group by grp, value
    Order by grp
) s1
Order by grp, freq desc, value desc;

The output of the above query is:

grp    freq_val
1      19.7
2      18.6

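Now I would like to associate each most frequent value with its corresponding IDs, as an integer array (for example, for grp = 1 the most frequent value 19.7 has IDs 1, 2, 3, 4, 5, 6). My expected output is: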

grp    freq_val     ids
1      19.7         {1,2,3,4,5,6}
2      18.6         {1,2,3,4,5}

Would anyone care to reflect on this or suggest how it could be achieved?

Answer 1

You can use window functions and aggregation:

select
    grp,
    value freq_val,
    array_agg(id) ids
from (
    select
        c.*,
        rank() over(partition by grp order by value desc) rn
    from count_tbl c
) t
where rn = 1
group by grp, value

The inner query ranks the records within each grp by decreasing value. The outer query then keeps the top-ranked rows of each group and aggregates their ids.

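Edit: if you want the value that occurs most often in each group (rather than the highest value) along with its associated IDs, you can aggregate in the subquery instead: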

select *
from (
    select
        grp,
        value freq_val,
        array_agg(id) ids,
        rank() over(partition by grp order by count(*) desc) rn
    from count_tbl
    group by grp, value
) t
where rn = 1
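
One caveat with rank(): values that tie on frequency all receive rank 1. In the sample data both groups have such a tie (19.7 and 18.8 occur six times each in grp 1; 18.6 and 0.0 occur five times each in grp 2), so a query that ranks only by count(*) would return two rows per group. A possible variation, assuming ties should go to the higher value as in the question's own ORDER BY, uses row_number() with a tie-breaker and orders the ids inside the array:

```sql
-- Sketch: one row per grp; ties on frequency are broken by the higher value.
select grp, freq_val, ids
from (
    select
        grp,
        value as freq_val,
        array_agg(id order by id) as ids,  -- explicit ordering inside the array
        row_number() over (
            partition by grp
            order by count(*) desc, value desc
        ) as rn
    from count_tbl
    group by grp, value
) t
where rn = 1
order by grp;
```

On the sample data this should yield 19.7 with {1,2,3,4,5,6} for grp 1 and 18.6 with {1,2,3,4,5} for grp 2, matching the expected output in the question.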

Answer 2

You can use distinct on. This is interesting because it needs no subquery:

select distinct on (grp) grp, value, count(*) as freq,
       array_agg(id) as ids
from count_tbl
group by grp, value
order by grp, count(*) desc, value desc;

Most databases would require a subquery with window functions or some other comparison. DISTINCT ON is a Postgres extension.
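
If the fallback from the question's original query should be kept (report 0.0 when the top value occurs only once), the same DISTINCT ON pattern can carry it. This is a sketch under that assumption, not part of the original answer; ordering the ids inside array_agg is an optional extra:

```sql
-- Sketch: DISTINCT ON keeps the first row per grp according to ORDER BY.
select distinct on (grp)
       grp,
       case when count(*) > 1 then value else 0.0 end as freq_val,
       array_agg(id order by id) as ids
from count_tbl
group by grp, value
order by grp, count(*) desc, value desc;
```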

How do I get the IDs associated with the most frequent value in PostgreSQL?
