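If I only cared about a single minimum and maximum per group, this would be relatively easy. The problem is that I need to find every boundary. A sample data set: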
BoundaryColumn  GroupIdentifier
1               A
3               A
4               A
7               A
8               B
9               B
11              B
13              A
14              A
15              A
16              A
What I need from SQL is the following result set:
min max groupid
1 7 A
8 11 B
13 16 A
Basically, find the boundaries of each run of a group.
The data will be stored in either Oracle 11g or MySQL, so syntax for either platform is fine.
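Answer 0 (score: 2)
Disclaimer: querying partial results and handling this sort of thing in a front-end language would be much easier. That said...
The following query works for Oracle (which supports analytic functions) but not for MySQL (which did not support them before version 8.0). There is a SQL Fiddle here.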
WITH BoundX AS (
SELECT * FROM (
SELECT
BoundaryColumn,
GroupIdentifier,
LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
FROM MyTable
ORDER BY BoundaryColumn
)
WHERE GIDLag IS NULL OR GroupIdentifier <> GIDLag
OR GIDLead IS NULL OR GroupIdentifier <> GIDLead
)
SELECT MIN, MAX, GROUPID
FROM (
SELECT
BoundaryColumn AS MIN,
LEAD(BoundaryColumn) OVER (ORDER BY BoundaryColumn) AS MAX,
GroupIdentifier AS GROUPID,
GIDLag,
GIDLead
FROM BoundX
)
WHERE GROUPID = GIDLead
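Here's the logic, step by step. You can probably improve on this, because I feel there's one subquery too many here...
This query pulls the previous and following GroupIdentifier values into every row: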
SELECT
BoundaryColumn,
GroupIdentifier,
LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
FROM MyTable
ORDER BY BoundaryColumn
The result looks like this:
BoundaryColumn  GroupIdentifier  GIDLag  GIDLead
1               A                        A
3               A                A       A
4               A                A       A
7               A                A       B
8               B                A       B
9               B                B       B
11              B                B       A
13              A                B       A
14              A                A       A
15              A                A       A
16              A                A
If you add logic to get rid of all rows where GIDLag = GIDLead = GroupIdentifier, you end up with the boundaries:
WITH BoundX AS (
SELECT * FROM (
SELECT
BoundaryColumn,
GroupIdentifier,
LAG(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLag,
LEAD(GroupIdentifier) OVER (ORDER BY BoundaryColumn) AS GIDLead
FROM MyTable
ORDER BY BoundaryColumn
)
WHERE GIDLag IS NULL OR GroupIdentifier <> GIDLag
OR GIDLead IS NULL OR GroupIdentifier <> GIDLead
)
SELECT
BoundaryColumn AS MIN,
LEAD(BoundaryColumn) OVER (ORDER BY BoundaryColumn) AS MAX,
GroupIdentifier AS GROUPID,
GIDLag,
GIDLead
FROM BoundX
With this addition, the result is:
MIN  MAX  GROUPID  GIDLAG  GIDLEAD
---  ---  -------  ------  -------
1    7    A                A
7    8    A        A       B
8    11   B        A       B
11   13   B        B       A
13   16   A        B       A
16        A        A
Finally, keep only the rows where GROUPID = GIDLead. That is the query at the top of this answer. The result is:
MIN MAX GROUPID
--- --- -------
1 7 A
8 11 B
13 16 A
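As the disclaimer at the top of this answer suggests, the same result can be produced on the client side once the rows have been fetched in BoundaryColumn order. The following is only a minimal Python sketch of that idea, not part of the original answer; the `rows` list and the `find_runs` helper are hypothetical names, and the data is the sample from the question.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question, as (BoundaryColumn, GroupIdentifier) pairs.
rows = [(1, "A"), (3, "A"), (4, "A"), (7, "A"), (8, "B"), (9, "B"),
        (11, "B"), (13, "A"), (14, "A"), (15, "A"), (16, "A")]

def find_runs(rows):
    """Return (min, max, groupid) for each consecutive run of one group."""
    ordered = sorted(rows, key=itemgetter(0))
    runs = []
    # groupby only merges adjacent items, so a group that reappears
    # later in the ordering starts a new run.
    for group_id, run in groupby(ordered, key=itemgetter(1)):
        values = [boundary for boundary, _ in run]
        runs.append((values[0], values[-1], group_id))
    return runs

for low, high, group_id in find_runs(rows):
    print(low, high, group_id)
```

Because `groupby` treats every change of GroupIdentifier as the start of a new run, this sketch also handles runs that consist of a single row.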
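Answer 1 (score: 1)
Another approach (Oracle). Here we simply divide the result set returned by a query against table t1 (your table) into logical groups (grp). A new group starts each time the value of GroupIdentifier changes: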
select min(q.BoundaryColumn) as MinB
, max(q.BoundaryColumn) as MaxB
, max(q.GroupIdentifier) as groupid
from ( select s.BoundaryColumn
, s.GroupIdentifier
, sum(grp) over(order by s.BoundaryColumn) as grp
from ( select BoundaryColumn
, GroupIdentifier
, case
when GroupIdentifier <> lag(GroupIdentifier)
over(order by BoundaryColumn)
then 1
end as grp
from t1) s
) q
group by q.grp
Result:
MINB MAXB GROUPID
---------- ---------- -------
1 7 A
8 11 B
13 16 A
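To make the running-sum idea easier to follow, here is a rough Python sketch of the same three steps: flag each row whose group differs from the previous row, take a running total of the flags, then aggregate per total. It is an illustration only; `runs_by_running_sum` is a hypothetical helper, and it assumes the BoundaryColumn values are unique. One small difference from the SQL: the first run gets the running total 0 here, whereas in the query its grp is NULL.

```python
from itertools import accumulate

# Sample rows from the question, as (BoundaryColumn, GroupIdentifier) pairs.
rows = [(1, "A"), (3, "A"), (4, "A"), (7, "A"), (8, "B"), (9, "B"),
        (11, "B"), (13, "A"), (14, "A"), (15, "A"), (16, "A")]

def runs_by_running_sum(rows):
    ordered = sorted(rows)  # order by BoundaryColumn
    # 1 where the group differs from the previous row (the LAG comparison), else 0.
    flags = [0] + [int(cur[1] != prev[1])
                   for prev, cur in zip(ordered, ordered[1:])]
    # Running total of the flags, like SUM(grp) OVER (ORDER BY BoundaryColumn).
    grp_ids = accumulate(flags)
    grouped = {}
    for (boundary, group_id), grp in zip(ordered, grp_ids):
        grouped.setdefault(grp, []).append((boundary, group_id))
    # Equivalent of GROUP BY grp with MIN, MAX and the group identifier.
    return [(min(b for b, _ in members),
             max(b for b, _ in members),
             members[0][1])
            for _, members in sorted(grouped.items())]

print(runs_by_running_sum(rows))
```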
Answer 2 (score: 0)
Take a look at this article about "runs" in data: http://www.sqlteam.com/article/detecting-runs-or-streaks-in-your-data
Using what that link explains, you can write a query like this:
SELECT BoundaryColumn,
GroupIdentifier,
(
SELECT COUNT(*)
FROM MyTable T
WHERE T.GroupIdentifier <> TR.GroupIdentifier
AND T.BoundaryColumn <= TR.BoundaryColumn
) as RunGroup
FROM MyTable TR
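With this information you can group by GroupIdentifier and "RunGroup" and then select the min/max BoundaryColumn for each group. Grouping by "RunGroup" alone is not enough in general, because runs of different groups can share the same count.
Edit: I gave in to peer pressure; here is a SQLFiddle with my version of the answer: http://www.sqlfiddle.com/#!8/9a24c/4/0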
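The grouping step is only described in prose, so here is one way it might look, run against an in-memory SQLite database from Python so that it can be tried without Oracle or MySQL. Treat it as a sketch under assumptions: the table name MyTable, the aliases min_b and max_b, and the derived-table alias runs are made up for illustration, and the outer query groups by both GroupIdentifier and RunGroup so that runs of different groups with the same count stay separate.

```python
import sqlite3

# Sample rows from the question, as (BoundaryColumn, GroupIdentifier) pairs.
rows = [(1, "A"), (3, "A"), (4, "A"), (7, "A"), (8, "B"), (9, "B"),
        (11, "B"), (13, "A"), (14, "A"), (15, "A"), (16, "A")]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (BoundaryColumn INTEGER, GroupIdentifier TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)

# RunGroup counts the rows of any other group at or before the current row;
# it stays constant within a run and grows between runs of the same group.
query = """
SELECT MIN(BoundaryColumn) AS min_b,
       MAX(BoundaryColumn) AS max_b,
       GroupIdentifier
FROM (
    SELECT BoundaryColumn,
           GroupIdentifier,
           (SELECT COUNT(*)
            FROM MyTable T
            WHERE T.GroupIdentifier <> TR.GroupIdentifier
              AND T.BoundaryColumn <= TR.BoundaryColumn) AS RunGroup
    FROM MyTable TR
) runs
GROUP BY GroupIdentifier, RunGroup
ORDER BY min_b
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The query uses only a correlated subquery and GROUP BY, with no analytic functions, which is why it is a candidate for MySQL as well as Oracle. The correlated subquery is evaluated per row, so it may be slow on large tables.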