SQL - Group values by range

Time: 2021-02-02 06:56:46

Tags: sql postgresql group-by grouping

I have the following query:

SELECT
  polutionmm2 AS metric,
  sum(cnt) AS value
FROM polutiondistributionstatistic AS p
  INNER JOIN crates AS c ON p.crateid = c.id
WHERE c.name = '154'
  AND to_timestamp(startts) >= '2021/01/20 00:00:00'
GROUP BY polutionmm2

This query returns the following values:

"metric","value"
50,580
100,8262
150,1548
200,6358
250,869
300,3780
350,505
400,2248
450,318
500,1674
550,312
600,7420
650,1304
700,2445
750,486
800,985
850,139
900,661
950,99
1000,550

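I need to edit the query so that the values are grouped into ranges of 100, starting at 0. Everything with a metric value between 0 and 99 should become one row, and its value should be the sum of those rows, like this: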

"metric","value"
0,580
100,9810
200,7227
300,4285
400,2566
500,1986
600,8724
700,2931
800,1124
900,760
1000,550

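The query will run over about 500,000 rows. Can this be done in a query? Is it efficient?

Edit:

There can be up to 500 ranges, so grouping them automatically would be great.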

3 answers:

Answer 0: (score: 3)

You can use generate_series() and a range type to generate the ranges you want, e.g.:

select int4range(x.start, case when x.start = 1000 then null else x.start + 100 end, '[)') as range
from generate_series(0,1000,100) as x(start)

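This generates the ranges [0,100), [100,200), and so on, up to [1000,).

You can adjust the width and number of ranges by passing different parameters to generate_series() and adjusting the expression that computes the last range.

This can be used in an outer join to aggregate the values for each range: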
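
To make that adjustment concrete, here is a minimal sketch that changes the width to 250. The width and the last start value (1000) are illustrative assumptions, not values from the question; both the step passed to generate_series() and the CASE expression have to change together.

```sql
-- Sketch: five ranges of width 250; the last one is left open-ended.
-- The width (250) and the final start value (1000) are assumptions for illustration.
select int4range(x.start,
                 case when x.start = 1000 then null else x.start + 250 end,
                 '[)') as range
from generate_series(0, 1000, 250) as x(start);
```

This should produce [0,250), [250,500), [500,750), [750,1000) and [1000,).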

with ranges as (
  select int4range(x.start, case when x.start = 1000 then null else x.start + 100 end, '[)') as range
  from generate_series(0,1000,100) as x(start)
)
select r.range as metric,
       sum(t.value) as value
from ranges r
  left join the_table t on r.range @> t.metric
group by r.range
order by r.range;

The expression r.range @> t.metric tests whether the metric value falls within the (generated) range.
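
For trying this out without the real tables, the sketch below inlines a few rows from the question's first result set as a CTE named the_table (a stand-in, as in the query above). It also reports lower(r.range) rather than the range itself, on the assumption that a numeric label such as 0, 100, 200 is wanted, and uses coalesce so that empty ranges show 0 instead of NULL.

```sql
-- Sample rows copied from the question's result set (a subset, for illustration).
with the_table(metric, value) as (
  values (50, 580), (100, 8262), (150, 1548), (200, 6358), (250, 869), (1000, 550)
),
ranges as (
  select int4range(x.start, case when x.start = 1000 then null else x.start + 100 end, '[)') as range
  from generate_series(0, 1000, 100) as x(start)
)
select lower(r.range) as metric,               -- lower bound of each range: 0, 100, 200, ...
       coalesce(sum(t.value), 0) as value      -- empty ranges report 0 instead of NULL
from ranges r
  left join the_table t on r.range @> t.metric -- true when metric lies inside the range
group by r.range
order by r.range;
```

Because of the left join, every generated range appears in the output, including those with no matching rows.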


Answer 1: (score: 2)

You can create a pseudo-table with the intervals you like and join against it. For this case I will use a recursive CTE.

WITH RECURSIVE cte AS (
    select 0 St, 99 Ed
    UNION ALL
    select St + 100, Ed + 100 from cte where St < 1000 -- last generated range is 1000..1099
)
select cte.st as metric, sum(tb.value) as value
from cte
inner join [tableName] tb -- [tableName] is a placeholder for the OP's query result
on tb.metric between cte.St and cte.Ed
group by cte.st
order by cte.st

Here is a DB<>fiddle with some dummy data.
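
As a self-contained variant of the recursive CTE approach, the sketch below replaces the table placeholder with an inline CTE named tb that holds a few rows taken from the question. The name tb and the choice of rows are assumptions for illustration only.

```sql
with recursive cte as (
    -- Anchor row: the first interval, 0..99.
    select 0 as st, 99 as ed
    union all
    -- Each step shifts the interval by 100; stops after 1000..1099.
    select st + 100, ed + 100 from cte where st < 1000
),
tb(metric, value) as (
    -- Stand-in for the OP's query result (subset of the question's data).
    values (50, 580), (100, 8262), (150, 1548), (1000, 550)
)
select cte.st as metric, sum(tb.value) as value
from cte
  inner join tb on tb.metric between cte.st and cte.ed
group by cte.st
order by cte.st;
```

With an inner join, intervals that contain no rows are simply omitted from the result; switching to a left join would keep them with a NULL sum.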

Answer 2: (score: 1)

Use conditional aggregation

SELECT
  case when polutionmm2 >= 0 and polutionmm2 < 100 then '100'
       when polutionmm2 >= 100 and polutionmm2 < 200 then '200'
       ........
       when polutionmm2 >= 900 and polutionmm2 < 1000 then '1000'
  end AS metric,
  sum(cnt) as value
FROM polutiondistributionstatistic as p inner join crates as c on p.crateid = c.id
WHERE
  c.name = '154'
  and to_timestamp(startts) >= '2021/01/20 00:00:00'
group by
  case when polutionmm2 >= 0 and polutionmm2 < 100 then '100'
       when polutionmm2 >= 100 and polutionmm2 < 200 then '200'
       ........
       when polutionmm2 >= 900 and polutionmm2 < 1000 then '1000'
  end
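
Since the CASE branches follow a regular pattern, the bucket could also be computed arithmetically instead of being listed by hand. This is a different technique from the conditional aggregation above, sketched here under two assumptions that the thread does not confirm: polutionmm2 is an integer column, and its values are non-negative (integer division truncates toward zero). The CTE named stats is a hypothetical stand-in holding a few rows from the question.

```sql
-- Hypothetical stand-in for the joined and filtered source rows.
with stats(polutionmm2, cnt) as (
  values (50, 580), (100, 8262), (150, 1548), (950, 99), (1000, 550)
)
select (polutionmm2 / 100) * 100 as metric,  -- integer division: 150 / 100 = 1, so bucket 100
       sum(cnt) as value
from stats
group by (polutionmm2 / 100) * 100
order by metric;
```

This labels each bucket by its lower bound (0, 100, 200, ...), as the question asks, and needs no change when the number of ranges grows. Unlike the generated-range approaches, it returns no row for a bucket that has no data.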