Split SQL results into groups of max size = n

Asked: 2016-03-01 11:49:25

Tags: sql postgresql plpgsql

I have a table:

 id | volume_id| ... |
----+----------+-----+
  1 |       1  | ... |
  2 |       2  | ... |
  3 |       1  | ... |
  4 |       3  | ... |
  5 |       2  | ... |
  ...

I can run a simple grouping query:

select volume_id, count(*), min(id) as min_id, max(id) as max_id
from my_table
group by volume_id;

which produces this result:

 volume_id | count | min_id | max_id    
-----------+-------+--------+--------
         1 | 67330 |  ...   | ...
         2 | 67330 |  ...   | ...
         3 | 67330 |  ...   | ...
         4 | 67330 |  ...   | ...

But I want to split the results into groups of at most 40K rows. So the result should look like this:

 volume_id | count | min_id | max_id    
-----------+-------+--------+--------
         1 | 40000 |  ...   | ...      <- first  group of IDs for volume 1
         1 | 27330 |  ...   | ...      <- second group of IDs for volume 1
         2 | 40000 |  ...   | ...
         2 | 27330 |  ...   | ...
         3 | 40000 |  ...   | ...
         3 | 27330 |  ...   | ...
         4 | 40000 |  ...   | ...
         4 | 27330 |  ...   | ...

The IDs should be split so that the max_id of the first group is less than the min_id of the second group, and so on.
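
To make the expected counts concrete, here is a minimal Python sketch of the arithmetic behind the table above. The helper name `chunk_sizes` is hypothetical and only illustrates how 67330 rows fall into groups of at most 40000; it is not part of any query.

```python
def chunk_sizes(total, n):
    """Return the sizes of consecutive groups when `total` rows
    are cut into groups of at most `n` rows (hypothetical helper)."""
    full, rest = divmod(total, n)
    # `full` complete groups, plus one smaller group if rows are left over
    return [n] * full + ([rest] if rest else [])


print(chunk_sizes(67330, 40000))  # [40000, 27330]
```

Each volume with 67330 rows therefore yields two result rows, one with 40000 IDs and one with the remaining 27330.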

If anyone knows how to write such a query (or a PL/pgSQL function, if there is no other way), I would be grateful.

I am using PostgreSQL 9.5.

1 answer:

Answer 0 (score: 4):

You can use rank() (or row_number(), if there are no duplicates) to number the rows within each volume. After that, it is simple arithmetic in the group by:

select volume_id, count(*), min(id) as min_id, max(id) as max_id
from (select t.*,
             rank() over (partition by volume_id order by id) as seqnum
      from my_table t
     ) t
group by volume_id, floor((seqnum - 1) / 40000)
order by volume_id, min(id);
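
To see why the `(seqnum - 1) / 40000` arithmetic produces the desired groups, the query's logic can be emulated outside the database. The following is only a sketch in Python under stated assumptions: the function name `split_into_groups` and the toy data are hypothetical, IDs are assumed unique (so the numbering matches row_number()), and a group size of 3 stands in for 40000.

```python
from collections import defaultdict


def split_into_groups(rows, n):
    """Emulate the query: rows is an iterable of (id, volume_id) pairs.

    Returns (volume_id, count, min_id, max_id) tuples, ordered by
    volume_id and then min_id. Hypothetical helper for illustration.
    """
    ids_by_volume = defaultdict(list)
    for row_id, volume_id in rows:
        ids_by_volume[volume_id].append(row_id)

    result = []
    for volume_id in sorted(ids_by_volume):
        # "partition by volume_id order by id": number the rows from 1
        ids = sorted(ids_by_volume[volume_id])
        buckets = defaultdict(list)
        for seqnum, row_id in enumerate(ids, start=1):
            # Same arithmetic as the group by: rows 1..n land in bucket 0,
            # rows n+1..2n in bucket 1, and so on
            buckets[(seqnum - 1) // n].append(row_id)
        for bucket in sorted(buckets):
            chunk = buckets[bucket]
            result.append((volume_id, len(chunk), min(chunk), max(chunk)))
    return result


# Toy data: ids 1..10 alternating between volumes 1 and 2
rows = [(i, (i - 1) % 2 + 1) for i in range(1, 11)]
for group in split_into_groups(rows, 3):
    print(group)
# (1, 3, 1, 5)
# (1, 2, 7, 9)
# (2, 3, 2, 6)
# (2, 2, 8, 10)
```

Because the rows are numbered in id order before bucketing, every id in one bucket is smaller than every id in the next, which is the ordering the question asks for.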