Question

I have been trying to solve this problem in PostgreSQL. I have a table test with two columns, id and content. For example:

create table test (id integer, 
                   content varchar(1024));

insert into test (id, content) values 
    (1, 'Lorem Ipsum is simply dummy text of the printing and typesetting industry.'),
    (2, 'Lorem Ipsum has been the industrys standard dummy text '),
    (3, 'ever since the 1500s, when an unknown printer took a galley of type and scrambled it to'),
    (4, 'make a type specimen book.'),
    (5, 'It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.'),
    (6, 'It was popularised in the 1960s with the release of Letraset sheets containing Lorem '),
    (7, 'Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker'),
    (8, ' including versions of Lorem Ipsum.');

If I run the following query...

select id, length(content) as characters from test order by id

...then I get:

id | characters
---+-----------
 1 |         74
 2 |         55
 3 |         87
 4 |         26
 5 |        120
 6 |         85
 7 |         87
 8 |         35

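What I want to do is group the ids, in id order, into rows whose summed content length exceeds a threshold. For example, if the threshold is 100, the desired result looks like this: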

ids | characters
----+-----------   
1,2 |        129
3,4 |        113    
5   |        120
6,7 |        172    
8   |         35

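Note (1): The query does not need to produce the characters column, only ids. It is shown here to make the point that every total is above 100, except the last row, which is 35.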

Note (2): ids can be a comma-separated string or a PostgreSQL array. The type matters less than the values.

Can I do this with a window function, or do I need something more complex, such as a lateral join?

Answer 1

This kind of problem requires a recursive CTE (or similar functionality). Here is an example:

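with recursive t as (
      -- number the rows so the recursion can walk them in id order
      select id, length(content) as len,
             row_number() over (order by id) as seqnum
      from test 
     ),
     cte(id, len, cumelen, ids, seqnum, grp) as (
      -- anchor: the first row starts group 1
      select id, len, len as cumelen, t.id::text, 1::int as seqnum, 1 as grp
      from t
      where seqnum = 1
      union all
      -- step: once the running total has reached 100, start a new group
      select t.id,
             t.len,
             (case when cte.cumelen >= 100 then t.len else cte.cumelen + t.len end) as cumelen,
             (case when cte.cumelen >= 100 then t.id::text else cte.ids || ',' || t.id::text end) as ids,
             t.seqnum::int,  -- cast so the type matches 1::int in the anchor
             (case when cte.cumelen >= 100 then cte.grp + 1 else cte.grp end) as grp
      from t join
           cte
           on cte.seqnum = t.seqnum - 1
     )
-- within a group each ids string extends the previous one, so max() picks the complete list
select grp, max(ids)
from cte
group by grp;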

Here is a small working example:

with recursive test as (
      select 1 as id, 'abcd'::text as content union all
      select 2 as id, 'abcd'::text as content union all
      select 3 as id, 'abcd'::text as content 
     ),
     t as (
      select id, length(content) as len,
             row_number() over (order by id) as seqnum
      from test 
     ),
     cte(id, len, cumelen, ids, seqnum, grp) as (
      select id, len, len as cumelen, t.id::text, 1::int as seqnum, 1 as grp
      from t
      where seqnum = 1
      union all
      select t.id,
             t.len,
             (case when cte.cumelen >= 5 then t.len else cte.cumelen + t.len end) as cumelen,
             (case when cte.cumelen >= 5 then t.id::text else cte.ids || ',' || t.id::text end) as ids,
             t.seqnum::int,
             (case when cte.cumelen >= 5 then cte.grp + 1 else cte.grp end)
      from t join
           cte
           on cte.seqnum = t.seqnum - 1
     )
select grp, max(ids)
from cte
group by grp;
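
Note (2) in the question allows ids to be a PostgreSQL array. As a minimal sketch of how the same recursion could return arrays (the column names ids and characters follow the question; the coalesce for NULL content and the distinct on step are assumptions of this sketch, not part of the answer above):

```sql
with recursive t as (
      -- number the rows so the recursion can walk them in id order
      select id,
             coalesce(length(content), 0) as len,
             row_number() over (order by id) as seqnum
      from test
     ),
     cte(seqnum, ids, cumelen, grp) as (
      -- anchor: the first row starts group 1
      select t.seqnum, array[t.id], t.len, 1
      from t
      where t.seqnum = 1
      union all
      -- step: append to the current group, or start a new one at the threshold
      select t.seqnum,
             case when cte.cumelen >= 100 then array[t.id] else cte.ids || t.id end,
             case when cte.cumelen >= 100 then t.len else cte.cumelen + t.len end,
             case when cte.cumelen >= 100 then cte.grp + 1 else cte.grp end
      from cte
           join t on t.seqnum = cte.seqnum + 1
     )
-- keep only the last (complete) row of each group
select distinct on (grp) grp, ids, cumelen as characters
from cte
order by grp, seqnum desc;
```

On the sample data from the question this should produce the same five groups as the desired result, with ids as integer arrays. Taking the last row per group with distinct on avoids relying on string comparison, which max(ids) needs in the text version.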

Answer 2

Using a stored function lets you avoid (sometimes) head-scratching queries.

create or replace function fn_foo(ids out int[], characters out int) returns setof record language plpgsql as $$
declare
  r record;
  threshold int := 100;
begin
  ids := '{}'; characters := 0;
  for r in (
    select id, coalesce(length(content),0) as lng
    from test order by id)
  loop
    characters := characters + r.lng;
    ids := ids || r.id;
    if characters > threshold then
      return next;
      ids := '{}'; characters := 0;
    end if;
  end loop;
  if ids <> '{}' then
    return next;
  end if;
end $$;

select * from fn_foo();

╔═══════╤════════════╗
║  ids  │ characters ║
╠═══════╪════════════╣
║ {1,2} │        129 ║
║ {3,4} │        113 ║
║ {5}   │        120 ║
║ {6,7} │        172 ║
║ {8}   │         35 ║
╚═══════╧════════════╝
(5 rows)
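
If the threshold needs to vary between calls, it could be passed as an argument instead of being declared inside the function. The following is only a sketch under that assumption; the name fn_group_ids and the parameter p_threshold are hypothetical and do not appear in the answer above.

```sql
-- Hypothetical variant of fn_foo: the threshold is an argument, not a constant.
create or replace function fn_group_ids(p_threshold int)
returns table (ids int[], characters int)
language plpgsql as $$
declare
  r record;
begin
  ids := '{}';
  characters := 0;
  for r in
    select t.id, coalesce(length(t.content), 0) as lng
    from test as t
    order by t.id
  loop
    characters := characters + r.lng;
    ids := ids || r.id;
    -- emit the current group as soon as its total exceeds the threshold
    if characters > p_threshold then
      return next;
      ids := '{}';
      characters := 0;
    end if;
  end loop;
  -- emit the trailing group, which may be below the threshold
  if cardinality(ids) > 0 then
    return next;
  end if;
end $$;

select * from fn_group_ids(100);
```

Called with 100 this follows the same logic as fn_foo, so it should return the same rows as shown above. Note that it compares with > where the recursive CTE in Answer 1 uses >=, so the two approaches differ when a running total lands exactly on the threshold.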

Answer 3

Here is a query that uses the LEAD() window function:

SELECT id || ',' || next_id, characters + next_characters total_characters 
FROM  (SELECT id, characters, row_num, 
              CASE 
                WHEN row_num % 2 = 0 
                     AND characters < 100 THEN Lead(id) OVER(ORDER BY id) 
                ELSE NULL 
              END next_id, 
              CASE 
                WHEN row_num % 2 = 0 
                     AND characters < 100 THEN NULL 
                ELSE Lead(characters) OVER(ORDER BY id) 
              END AS next_characters 
       -- subquery aliases are required by PostgreSQL before version 16
       FROM  (SELECT id, 
                     Length(content)  AS characters, 
                     Row_number() 
                       OVER( 
                         ORDER BY id) row_num 
              FROM   test 
              ORDER  BY id) AS numbered) AS paired 
WHERE  next_id IS NULL;

Hope this helps.

PostgreSQL Group By Sum
