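Grouping to find the min and max values of each group

Time: 2013-10-16 17:38:52

Tags: mysql sql oracle11g

If I only cared about a single minimum and maximum per group, this would be relatively easy. The problem is that I need to find the boundaries of every consecutive run of each group. A sample data set: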

BoundaryColumn  GroupIdentifier
1                  A
3                  A
4                  A
7                  A
8                  B
9                  B  
11                 B  
13                 A
14                 A
15                 A
16                 A

The result set I need from the SQL is:

min  max  groupid
1    7    A
8    11   B
13   16   A

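Basically, I need to find the bounds of each run of a group.

The data will be stored in either Oracle 11g or MySQL, so syntax for either platform is fine.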

3 answers:

Answer 0: (score: 2)

Disclaimer: it would be much easier to query the raw rows and handle this kind of thing in a front-end language. That said...
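To make the front-end route concrete, here is a minimal Python sketch. It assumes the rows have already been fetched as (BoundaryColumn, GroupIdentifier) tuples; the function name `run_boundaries` is made up for illustration.

```python
from itertools import groupby

# Sample rows as (BoundaryColumn, GroupIdentifier), e.g. from a plain SELECT.
rows = [
    (1, "A"), (3, "A"), (4, "A"), (7, "A"),
    (8, "B"), (9, "B"), (11, "B"),
    (13, "A"), (14, "A"), (15, "A"), (16, "A"),
]

def run_boundaries(rows):
    """Return (min, max, groupid) for each run of consecutive equal identifiers."""
    ordered = sorted(rows, key=lambda r: r[0])  # ORDER BY BoundaryColumn
    result = []
    # groupby only merges *adjacent* equal keys, which is exactly a "run".
    for group_id, run in groupby(ordered, key=lambda r: r[1]):
        values = [boundary for boundary, _ in run]
        result.append((values[0], values[-1], group_id))
    return result

print(run_boundaries(rows))  # [(1, 7, 'A'), (8, 11, 'B'), (13, 16, 'A')]
```

Because the rows are sorted first, the first and last value of each run are its minimum and maximum.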

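The following query works in Oracle (which supports analytic functions) but not in MySQL versions before 8.0 (which do not). There is a SQL Fiddle here.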

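WITH BoundX AS (
  -- Keep only the rows that sit on a boundary: the first or last row overall,
  -- or a row whose identifier differs from the previous or next row.
  SELECT * FROM (
    SELECT
     BoundaryColumn,
     GroupIdentifier,
     LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
     LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
    FROM MyTable
    ORDER BY BoundaryColumn
  )
  WHERE GIDLag IS NULL OR GroupIdentifier <> GIDLag
     OR GIDLead IS NULL OR GroupIdentifier <> GIDLead
)
-- Pair each boundary row with the next boundary value, then keep only the
-- rows that open a run (their identifier matches the following row's).
SELECT MIN, MAX, GROUPID
FROM (
  SELECT
    BoundaryColumn AS MIN,
    LEAD(BoundaryColumn) OVER (ORDER BY BoundaryColumn) AS MAX,
    GroupIdentifier AS GROUPID,
    GIDLag,
    GIDLead
  FROM BoundX
)
WHERE GROUPID = GIDLead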

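Here is the logic, step by step. It can probably be improved, because I have the feeling there is one subquery too many...

This query pulls the previous and next GroupIdentifier values into each row: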

SELECT
 BoundaryColumn,
 GroupIdentifier,
 LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
 LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
FROM MyTable
ORDER BY BoundaryColumn

The result looks like this:

BoundaryColumn  GroupIdentifier  GIDLag  GIDLead
1                  A                         A
3                  A                A        A
4                  A                A        A
7                  A                A        B
8                  B                A        B
9                  B                B        B
11                 B                B        A
13                 A                B        A
14                 A                A        A
15                 A                A        A
16                 A                A
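For readers less familiar with analytic functions, the effect of LAG and LEAD in this step can be imitated in a few lines of Python. This is only an illustration of what the two columns contain, not how Oracle computes them; `with_lag_lead` is a hypothetical helper name, and `None` stands in for SQL NULL.

```python
# Sample rows as (BoundaryColumn, GroupIdentifier).
rows = [
    (1, "A"), (3, "A"), (4, "A"), (7, "A"),
    (8, "B"), (9, "B"), (11, "B"),
    (13, "A"), (14, "A"), (15, "A"), (16, "A"),
]

def with_lag_lead(rows):
    """Attach the previous and next GroupIdentifier to each row (None at the ends)."""
    ordered = sorted(rows, key=lambda r: r[0])  # ORDER BY BoundaryColumn
    ids = [group_id for _, group_id in ordered]
    lags = [None] + ids[:-1]   # LAG: identifier of the previous row
    leads = ids[1:] + [None]   # LEAD: identifier of the next row
    return [(b, g, lag, lead) for (b, g), lag, lead in zip(ordered, lags, leads)]

for row in with_lag_lead(rows):
    print(row)
```

The first row has no predecessor and the last row has no successor, which is why the table above shows blanks in those two cells.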

If you add logic to discard every row where GIDLag = GIDLead = GroupIdentifier, you are left with the boundary rows:

WITH BoundX AS (
  SELECT * FROM (
    SELECT
     BoundaryColumn,
     GroupIdentifier,
     LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
     LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
    FROM MyTable
    ORDER BY BoundaryColumn
  )
  WHERE GIDLag IS NULL OR GroupIdentifier <> GIDLag
     OR GIDLead IS NULL OR GroupIdentifier <> GIDLead
)
SELECT
  BoundaryColumn AS MIN,
  LEAD(BoundaryColumn) OVER (ORDER BY BoundaryColumn) AS MAX,
  GroupIdentifier AS GROUPID,
  GIDLag,
  GIDLead
FROM BoundX

With this addition, the result is:

MIN MAX GROUPID GIDLAG GIDLEAD
--- --- ------- ------ -------
  1   7 A              A
  7   8 A       A      B
  8  11 B       A      B
 11  13 B       B      A
 13  16 A       B      A
 16     A       A

Finally, keep only the rows where GROUPID = GIDLead. That is the query at the top of this answer. The result is:

MIN MAX GROUPID
--- --- -------
  1   7 A
  8  11 B
 13  16 A
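The three steps can be traced outside the database with a rough Python emulation, sketched below. The function name `boundaries_via_lag_lead` is invented for this example and `None` plays the role of NULL. One thing the emulation makes visible: a run consisting of a single row never satisfies GROUPID = GIDLead, so this approach appears to drop such runs. The sample data has none, so the result matches.

```python
# Sample rows as (BoundaryColumn, GroupIdentifier).
rows = [
    (1, "A"), (3, "A"), (4, "A"), (7, "A"),
    (8, "B"), (9, "B"), (11, "B"),
    (13, "A"), (14, "A"), (15, "A"), (16, "A"),
]

def boundaries_via_lag_lead(rows):
    """Emulate the LAG/LEAD query: returns (min, max, groupid) per run."""
    ordered = sorted(rows, key=lambda r: r[0])
    ids = [g for _, g in ordered]
    lags = [None] + ids[:-1]
    leads = ids[1:] + [None]

    # Step 2: keep only boundary rows (the BoundX CTE).
    bound_x = [
        (b, g, lag, lead)
        for (b, g), lag, lead in zip(ordered, lags, leads)
        if lag is None or g != lag or lead is None or g != lead
    ]

    # Step 3: pair each boundary with the next boundary value (LEAD over
    # BoundX), then keep the rows that open a run (GROUPID = GIDLead).
    next_values = [row[0] for row in bound_x[1:]] + [None]
    return [
        (b, nxt, g)
        for (b, g, _lag, lead), nxt in zip(bound_x, next_values)
        if g == lead
    ]

print(boundaries_via_lag_lead(rows))
# A single-row run ("B" here) is lost by this approach:
print(boundaries_via_lag_lead([(1, "A"), (2, "A"), (3, "B"), (4, "A"), (5, "A")]))
```

If single-row runs can occur in the real data, the running-sum approach in the next answer handles them.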

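Answer 1: (score: 1)

Another approach (Oracle). Here we simply divide the result set returned by a query against table t1 (your table) into logical groups (grp). A new group starts each time the value of GroupIdentifier changes: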

select min(q.BoundaryColumn)  as MinB
     , max(q.BoundaryColumn)  as MaxB
     , max(q.GroupIdentifier) as groupid
  from ( select s.BoundaryColumn
              , s.GroupIdentifier
              , sum(grp) over(order by s.BoundaryColumn) as grp
           from ( select BoundaryColumn
                       , GroupIdentifier
                       , case 
                           when GroupIdentifier <> lag(GroupIdentifier) 
                                                   over(order by BoundaryColumn) 
                           then 1
                         end as grp
                    from t1) s
       ) q
 group by q.grp

Result:

      MINB       MAXB  GROUPID
---------- ----------  -------
         1          7  A       
         8         11  B       
        13         16  A  

SQLfiddle Demo
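The same idea (flag each change of identifier, take a running sum of the flags, then group by that sum) can be sketched in Python as follows. This is an illustration rather than a translation of the query: the name `boundaries_via_running_sum` is made up, and the first row gets a flag of 0 here, whereas the SQL leaves it NULL.

```python
from itertools import accumulate

# Sample rows as (BoundaryColumn, GroupIdentifier).
rows = [
    (1, "A"), (3, "A"), (4, "A"), (7, "A"),
    (8, "B"), (9, "B"), (11, "B"),
    (13, "A"), (14, "A"), (15, "A"), (16, "A"),
]

def boundaries_via_running_sum(rows):
    """Number the runs with a running sum of change flags, then aggregate."""
    ordered = sorted(rows, key=lambda r: r[0])
    ids = [g for _, g in ordered]
    # 1 where the identifier differs from the previous row, else 0.
    flags = [0] + [int(cur != prev) for prev, cur in zip(ids, ids[1:])]
    run_numbers = accumulate(flags)  # like SUM(grp) OVER (ORDER BY ...)

    runs = {}  # run number -> (min, max, groupid)
    for (b, g), n in zip(ordered, run_numbers):
        lo, hi, _ = runs.get(n, (b, b, g))
        runs[n] = (min(lo, b), max(hi, b), g)
    return [runs[n] for n in sorted(runs)]

print(boundaries_via_running_sum(rows))  # [(1, 7, 'A'), (8, 11, 'B'), (13, 16, 'A')]
```

Because every row receives a run number, a run made of a single row still produces its own output row.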

Answer 2: (score: 0)

Take a look at this article about "runs" in data: http://www.sqlteam.com/article/detecting-runs-or-streaks-in-your-data

Based on what that article explains, you can write a query like this:

-- "Table" is a placeholder (and a reserved word); substitute your real table name.
SELECT BoundaryColumn,
       GroupIdentifier,
       (
         SELECT COUNT(*)
         FROM Table T
         WHERE T.GroupIdentifier <> TR.GroupIdentifier
           AND T.BoundaryColumn <= TR.BoundaryColumn
       ) AS RunGroup
FROM Table TR

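With this information you can group by "RunGroup" and select the GroupIdentifier and the min/max BoundaryColumn. Group by GroupIdentifier as well, because RunGroup values computed for different identifiers can coincide.

Edit: I felt the peer pressure, so here is a SQLFiddle with my version of the answer: http://www.sqlfiddle.com/#!8/9a24c/4/0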
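As a sketch of that final grouping step, the snippet below runs the correlated-subquery approach against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is only a stand-in for MySQL or Oracle here, the table name `MyTable` is borrowed from the first answer, and the wrapper `run_boundaries_sql` is a hypothetical name. The outer query groups by GroupIdentifier together with RunGroup, since RunGroup alone can take the same value for runs of different identifiers.

```python
import sqlite3

# Sample rows as (BoundaryColumn, GroupIdentifier).
rows = [
    (1, "A"), (3, "A"), (4, "A"), (7, "A"),
    (8, "B"), (9, "B"), (11, "B"),
    (13, "A"), (14, "A"), (15, "A"), (16, "A"),
]

QUERY = """
SELECT MIN(BoundaryColumn) AS MinB,
       MAX(BoundaryColumn) AS MaxB,
       GroupIdentifier
FROM (
    SELECT TR.BoundaryColumn,
           TR.GroupIdentifier,
           -- rows of *other* identifiers seen up to this row
           (SELECT COUNT(*)
              FROM MyTable T
             WHERE T.GroupIdentifier <> TR.GroupIdentifier
               AND T.BoundaryColumn <= TR.BoundaryColumn) AS RunGroup
    FROM MyTable TR
) AS runs
GROUP BY GroupIdentifier, RunGroup
ORDER BY MinB
"""

def run_boundaries_sql(rows):
    """Load rows into a throwaway SQLite table and return (min, max, groupid)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE MyTable (BoundaryColumn INTEGER, GroupIdentifier TEXT)"
        )
        conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

print(run_boundaries_sql(rows))  # [(1, 7, 'A'), (8, 11, 'B'), (13, 16, 'A')]
```

This form needs no analytic functions, at the cost of a correlated subquery that is evaluated once per row.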