How to group data based on a sequence and the values of other columns

Date: 2014-11-08 15:48:01

Tags: sql oracle gaps-and-islands

I have a table in Oracle with three columns, c1, c2 and c3, as follows:

c1  c2  c3
 1  34   2
 2  34   2
 3  34   2
 4  24   2
 5  24   2
 6  34   2
 7  34   2
 8  34   1

I need to group the rows into runs of consecutive c1 values that share the same c2 and c3, and get the minimum and maximum c1 of each run.

That is, I need the following result:

c1_min  c1_max  c2  c3
     1       3  34   2
     4       5  24   2
     6       7  34   2
     8       8  34   1

2 answers:

Answer 0 (score: 3)

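There are several ways to approach the gaps-and-islands problem. As an alternative to Sylvain's lag version (not better, just different), you can use a trick that calculates row numbers analytically, partitioned by your grouping fields. This adds a "chain" pseudocolumn to the table values. Within each c2/c3 pair, its value is unique for each run of consecutive rows: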

select c1, c2, c3,
  dense_rank() over (partition by c2, c3 order by c1)
    - dense_rank() over (partition by null order by c1) as chain
from t42
order by c1, c2, c3;
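To see why the subtraction works, the chain column can be re-created outside the database. The sketch below is a pure-Python approximation, not Oracle. It makes three assumptions: the sample rows are hard-coded from the question, there are no NULLs, and `dense_rank` and `chain_values` are hypothetical helper names.

```python
# Pure-Python re-creation of the "chain" column (illustration, not Oracle).
# Sample rows copied from the question as (c1, c2, c3); assumes no NULLs.
ROWS = [
    (1, 34, 2), (2, 34, 2), (3, 34, 2), (4, 24, 2),
    (5, 24, 2), (6, 34, 2), (7, 34, 2), (8, 34, 1),
]


def dense_rank(values):
    """Map each distinct value to its dense rank: 1, 2, 3, ..."""
    return {v: i + 1 for i, v in enumerate(sorted(set(values)))}


def chain_values(rows):
    """Hypothetical helper: return (c1, c2, c3, chain) rows ordered by c1, where
    chain = dense_rank() over (partition by c2, c3 order by c1)
          - dense_rank() over (order by c1)."""
    overall = dense_rank(c1 for c1, _, _ in rows)
    by_pair = {}
    for c1, c2, c3 in rows:
        by_pair.setdefault((c2, c3), []).append(c1)
    pair_ranks = {pair: dense_rank(c1s) for pair, c1s in by_pair.items()}
    return [
        (c1, c2, c3, pair_ranks[(c2, c3)][c1] - overall[c1])
        for c1, c2, c3 in sorted(rows)
    ]


for row in chain_values(ROWS):
    print(row)
```

For the sample, the chain column comes out as 0, 0, 0, -3, -3, -2, -2, -7. Inside a run, both ranks advance by one per row, so their difference stays constant. When rows of another c2/c3 pair intervene, only the overall rank advances, so the difference drops. Because the value is only guaranteed to be distinct within one c2/c3 pair, the aggregation groups by c2, c3 and chain together.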

(I can't take credit for this; I first saw it here.) You can then use that query as an inline view to do the aggregation:

select min(c1) as c1_min, max(c1) as c1_max, c2, c3
from (
  select c1, c2, c3,
    dense_rank() over (partition by c2, c3 order by c1)
      - dense_rank() over (partition by null order by c1) as chain
  from t42
)
group by c2, c3, chain
order by c1_min;

    C1_MIN     C1_MAX         C2         C3
---------- ---------- ---------- ----------
         1          3         34          2 
         4          5         24          2 
         6          7         34          2 
         8          8         34          1 
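The aggregation step can be sketched the same way: group by (c2, c3, chain), take MIN and MAX of c1, and order by c1_min. This is a self-contained Python approximation of the query above, not the answer's SQL. The sample rows are hard-coded, NULLs are assumed absent, and `chain_values` and `islands` are hypothetical names.

```python
# Python sketch of: select min(c1), max(c1), c2, c3 ... group by c2, c3, chain
ROWS = [
    (1, 34, 2), (2, 34, 2), (3, 34, 2), (4, 24, 2),
    (5, 24, 2), (6, 34, 2), (7, 34, 2), (8, 34, 1),
]


def dense_rank(values):
    """Map each distinct value to its dense rank: 1, 2, 3, ..."""
    return {v: i + 1 for i, v in enumerate(sorted(set(values)))}


def chain_values(rows):
    """Hypothetical helper: (c1, c2, c3, chain) rows, chain as in the SQL."""
    overall = dense_rank(c1 for c1, _, _ in rows)
    by_pair = {}
    for c1, c2, c3 in rows:
        by_pair.setdefault((c2, c3), []).append(c1)
    pair_ranks = {pair: dense_rank(c1s) for pair, c1s in by_pair.items()}
    return [
        (c1, c2, c3, pair_ranks[(c2, c3)][c1] - overall[c1])
        for c1, c2, c3 in sorted(rows)
    ]


def islands(rows):
    """Hypothetical helper: GROUP BY c2, c3, chain; return
    (c1_min, c1_max, c2, c3) tuples ordered by c1_min."""
    bounds = {}
    for c1, c2, c3, chain in chain_values(rows):
        lo, hi = bounds.get((c2, c3, chain), (c1, c1))
        bounds[(c2, c3, chain)] = (min(lo, c1), max(hi, c1))
    return sorted((lo, hi, c2, c3) for (c2, c3, _), (lo, hi) in bounds.items())


for c1_min, c1_max, c2, c3 in islands(ROWS):
    print(c1_min, c1_max, c2, c3)
```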

The SQL Fiddle also shows the intermediate stage.

You could use other analytic functions, such as row_number() instead of dense_rank(). They may give slightly different results for some data, but you get the same result with this sample.
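Here is a rough Python analogue of the row_number() variant. Both row numbers can be computed in a single pass over the rows sorted by c1. It assumes the hard-coded sample rows and no NULLs, and `row_number_chain` is a hypothetical name. Because c1 has no duplicates in the sample, row numbers and dense ranks coincide, so the chain values match the dense_rank() version. With duplicate c1 values, row_number() gives tied rows distinct numbers while dense_rank() gives them the same number, which is one place the two can diverge.

```python
# Python sketch of the row_number() variant of the chain column.
ROWS = [
    (1, 34, 2), (2, 34, 2), (3, 34, 2), (4, 24, 2),
    (5, 24, 2), (6, 34, 2), (7, 34, 2), (8, 34, 1),
]


def row_number_chain(rows):
    """Hypothetical helper: chain = row_number() over (partition by c2, c3
    order by c1) - row_number() over (order by c1), in one pass."""
    seen_per_pair = {}  # rows seen so far for each (c2, c3) pair
    out = []
    for overall_rn, (c1, c2, c3) in enumerate(sorted(rows), start=1):
        seen_per_pair[(c2, c3)] = seen_per_pair.get((c2, c3), 0) + 1
        out.append((c1, c2, c3, seen_per_pair[(c2, c3)] - overall_rn))
    return out


print([row[3] for row in row_number_chain(ROWS)])
```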

Answer 1 (score: 2)

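If I understand correctly, you want to group consecutive rows together. This is far from trivial, or at least I can't see a simple way to do it right now. To make it easier to follow, I will break the query into several steps: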

Step 1:

First, identify your "group" boundaries. The LAG analytic function can help with that:

-- CLK = 1 marks the first row of each group
SELECT
  CASE WHEN LAG("c2", 1) OVER(ORDER BY "c1") = "c2" 
        AND LAG("c3", 1) OVER(ORDER BY "c1") = "c3" 
       THEN 0 
       ELSE 1
  END CLK,
  T.* FROM T
ORDER BY "c1";
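The boundary flag from Step 1 can be approximated in Python by comparing each row with the one before it. This sketch is not the answer's code. It assumes the hard-coded sample rows, and `boundary_flags` is a hypothetical name. It also assumes no NULLs, because Python's `None == None` is true while SQL's `NULL = NULL` is not.

```python
# Python sketch of Step 1: CLK = 0 when (c2, c3) matches the previous row, else 1.
ROWS = [
    (1, 34, 2), (2, 34, 2), (3, 34, 2), (4, 24, 2),
    (5, 24, 2), (6, 34, 2), (7, 34, 2), (8, 34, 1),
]


def boundary_flags(rows):
    """Hypothetical helper: return (clk, c1, c2, c3) rows ordered by c1."""
    flags = []
    prev = None  # plays the role of LAG(...): nothing precedes the first row
    for c1, c2, c3 in sorted(rows):
        # On the first row LAG returns NULL, the SQL comparison is not true,
        # and the CASE falls through to ELSE 1; prev is None models that here.
        same = prev is not None and prev == (c2, c3)
        flags.append((0 if same else 1, c1, c2, c3))
        prev = (c2, c3)
    return flags


print([flag[0] for flag in boundary_flags(ROWS)])
```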

Step 2:

The second step is to number each group. A simple running SUM (an analytic SUM ordered by "c1") does the job. This gives:

SELECT SUM(CLK) OVER (ORDER BY "c1"
                      ROWS BETWEEN UNBOUNDED PRECEDING 
                      AND CURRENT ROW) GRP,
       V.* 
FROM (
  SELECT
    CASE WHEN LAG("c2", 1) OVER(ORDER BY "c1") = "c2" 
          AND LAG("c3", 1) OVER(ORDER BY "c1") = "c3" 
         THEN 0 
         ELSE 1
    END CLK,
    T.* FROM T
) V
ORDER BY "c1";
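The running SUM in Step 2 corresponds to a cumulative sum of the flags. In this Python sketch, `itertools.accumulate` stands in for `SUM(CLK) OVER (ORDER BY "c1" ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`. It is an approximation of the SQL, not the answer's code. The sample rows are hard-coded, NULLs are assumed absent, and `group_numbers` is a hypothetical name.

```python
from itertools import accumulate

# Python sketch of Step 2: number each group with a running total of CLK.
ROWS = [
    (1, 34, 2), (2, 34, 2), (3, 34, 2), (4, 24, 2),
    (5, 24, 2), (6, 34, 2), (7, 34, 2), (8, 34, 1),
]


def group_numbers(rows):
    """Hypothetical helper: return (grp, c1, c2, c3) rows ordered by c1."""
    ordered = sorted(rows)
    # Step 1 again: 1 on the first row and wherever (c2, c3) changes.
    clk = [
        1 if i == 0 or ordered[i - 1][1:] != row[1:] else 0
        for i, row in enumerate(ordered)
    ]
    # Step 2: the running total of CLK gives each run its own number.
    return [(grp, *row) for grp, row in zip(accumulate(clk), ordered)]


for row in group_numbers(ROWS):
    print(row)
```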

Final step:

Finally, you can wrap that in a simple GROUP BY query to get the desired output:

SELECT MIN("c1") AS c1_min, MAX("c1") AS c1_max, "c2", "c3" FROM
(
    SELECT SUM(CLK) OVER (ORDER BY "c1"
                          ROWS BETWEEN UNBOUNDED PRECEDING 
                          AND CURRENT ROW) GRP,
           V.* 
    FROM (
      SELECT
        CASE WHEN LAG("c2", 1) OVER(ORDER BY "c1") = "c2"
              AND LAG("c3", 1) OVER(ORDER BY "c1") = "c3"
             THEN 0 
             ELSE 1
        END CLK,
        T.* FROM T
    ) V
)
GROUP BY GRP, "c2", "c3"
ORDER BY GRP;

See http://sqlfiddle.com/#!4/7d57c/10
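The final GROUP BY can be sketched in Python with `itertools.groupby`. Because GRP never decreases, rows sharing a GRP value are adjacent. This is a rough analogue of the query, not the answer's code. The sample rows are hard-coded, NULLs are assumed absent, and `islands` is a hypothetical name.

```python
from itertools import accumulate, groupby

# Python sketch of the final step: MIN/MAX(c1) per group number.
ROWS = [
    (1, 34, 2), (2, 34, 2), (3, 34, 2), (4, 24, 2),
    (5, 24, 2), (6, 34, 2), (7, 34, 2), (8, 34, 1),
]


def islands(rows):
    """Hypothetical helper: return (c1_min, c1_max, c2, c3) ordered by GRP."""
    ordered = sorted(rows)
    clk = [
        1 if i == 0 or ordered[i - 1][1:] != row[1:] else 0
        for i, row in enumerate(ordered)
    ]
    result = []
    # accumulate(clk) yields GRP; groupby merges adjacent rows with equal GRP.
    for _, run in groupby(zip(accumulate(clk), ordered), key=lambda pair: pair[0]):
        members = [row for _, row in run]
        c1_values = [c1 for c1, _, _ in members]
        _, c2, c3 = members[0]  # c2 and c3 are constant within a group
        result.append((min(c1_values), max(c1_values), c2, c3))
    return result


for row in islands(ROWS):
    print(row)
```

Because CLK is 1 whenever c2 or c3 changes, each GRP value covers rows with a single c2/c3 pair. Adding "c2" and "c3" to the SQL GROUP BY therefore does not split any group; it only lets the query select those columns.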