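Min and max grouped by consecutive ranges

Time: 2019-04-12 20:00:03

Tags: sql postgresql gaps-and-islands

I have a table that records the error type and the line number where each error occurred. (The process that produces it is not relevant here.) I need to group by error type and show the first and last line of each consecutive range of lines for that error type. Gaps in the line numbers must start a new range.

My table and query are: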

create table errors (
    err_type varchar(10),
    line integer);

insert into errors values
('type_A', 1),('type_A', 2),('type_A', 3),
('type_A', 6),('type_A', 7),
('type_B', 9),('type_B', 10),
('type_B', 12),('type_B', 13),('type_B', 14),('type_B', 15),
('type_C', 21);

select * from errors;

My data:

err_type    line
----------------
type_A      1
type_A      2
type_A      3
type_A      6
type_A      7
type_B      9
type_B     10
type_B     12
type_B     13
type_B     14
type_B     15
type_C     21

I need a query that returns the following result:

err_type    line_start   line_end
-------------------------------
type_A      1             3
type_A      6             7
type_B      9            10
type_B     12            15
type_C     21            21

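I'm using PostgreSQL, but Oracle has similar syntax for window functions (the OVER (PARTITION BY ...) clause).

Any suggestions?

3 answers:

Answer 0: (score: 1)

You can build a query like this: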

with base as (
    select errors.*, 
           sign(line - 1 - lag(line, 1, 1) over (
                 partition by err_type 
                 order by line)) as is_start
    from   errors
), parts as (
    select base.*, 
           sum(is_start) over (
                 partition by err_type 
                 order by line) as part
    from   base
)
select   err_type, 
         min(line) as line_start,
         max(line) as line_end
from     parts
group by err_type, part
order by err_type, part;
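
To see how the two helper columns drive the grouping, the following sketch runs only the intermediate steps of the query above against the sample `errors` table from the question. It assumes, as the sample data does, that each (err_type, line) pair appears at most once.

```sql
-- Sketch: expose is_start and part for every row of the sample data.
with base as (
    select errors.*,
           -- lag(line, 1, 1) returns the previous line within the same
           -- err_type, or 1 when there is no previous row.
           -- The difference is 0 when the previous line is exactly line - 1,
           -- and positive when there is a gap, so sign() yields 0 or 1
           -- (the first row of a partition may also yield -1).
           sign(line - 1 - lag(line, 1, 1) over (
                 partition by err_type
                 order by line)) as is_start
    from   errors
)
select base.*,
       -- The running total changes only where a new range starts,
       -- so it is constant within each consecutive range.
       sum(is_start) over (
             partition by err_type
             order by line) as part
from   base
order by err_type, line;
```

On the sample data, the type_B rows 9 and 10 share one value of part and rows 12 to 15 share the next one. The value of part itself carries no meaning; it only has to differ between ranges of the same err_type, which is why a first-row flag of -1 (as for type_A, which starts at line 1) does no harm.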

Answer 1: (score: 0)

If you don't want to use window or aggregate functions:

WITH
  table_min AS
  (
    SELECT
      a.err_type, a.line
    FROM errors a
    LEFT JOIN errors b ON a.err_type = b.err_type AND a.line  = b.line +1
    WHERE b.err_type IS NULL
  ),
  table_max AS
  (
    SELECT
      a.err_type, a.line
    FROM errors a
    LEFT JOIN errors b ON a.err_type = b.err_type AND a.line + 1 = b.line
    WHERE b.err_type IS NULL
  )
-- Pair each range start with the nearest range end at or after it.
SELECT
  a.err_type, a.line AS line_start, b.line AS line_end
FROM table_min a
INNER JOIN table_max b ON a.err_type = b.err_type AND a.line <= b.line
WHERE NOT EXISTS
  (
    -- Reject b if an earlier range end c also lies at or after this start.
    SELECT 1
    FROM table_max c
    WHERE c.err_type = a.err_type AND c.line >= a.line AND c.line < b.line
  )
ORDER BY a.err_type, a.line;

Answer 2: (score: 0)

This is a gaps-and-islands problem. I think the simplest method is row_number() and group by:

select err_type, min(line) as line_start, max(line) as line_end
from (select e.*, row_number() over (partition by err_type order by line) as seqnum
      from errors e
     ) e
group by err_type, (line - seqnum)
order by err_type, min(line);

Here is a db<>fiddle.
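
As an illustration of why grouping by `line - seqnum` works, this sketch lists the intermediate values for the sample `errors` table. The column name `grp` is introduced here only for illustration, and the sketch assumes there are no duplicate (err_type, line) rows.

```sql
-- Sketch: within a run of consecutive lines, line and seqnum both grow by 1,
-- so their difference stays constant; a gap in line makes the difference jump.
select err_type,
       line,
       row_number() over (partition by err_type order by line) as seqnum,
       line - row_number() over (partition by err_type order by line) as grp
from   errors
order by err_type, line;
```

For type_A, lines 1, 2, 3 get seqnum 1, 2, 3 and therefore grp 0, while lines 6, 7 get seqnum 4, 5 and grp 2. Grouping by err_type and grp then yields one row per consecutive range. If duplicate lines were possible, replacing row_number() with dense_rank() would be one way to keep the difference constant within a range.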