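I can't seem to figure this query out. I need to combine rows of time-consecutive states into a single state.
This question is similar to the one found here, except that I'm working in Oracle 10, not SQL Server: Combine rows when the end time of one is the start time of another
Example data: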
name start_inst end_inst code subcode
Person1 9/12/2011 10:55 9/12/2011 11:49 161 50
Person1 9/12/2011 11:49 9/12/2011 11:55 107 28
Person1 9/12/2011 11:55 9/12/2011 12:07 161 50
Person1 9/12/2011 12:07 9/12/2011 12:26 161 50
Person1 9/12/2011 12:26 9/12/2011 12:57 161 71
Person1 9/12/2011 12:57 9/12/2011 13:07 161 71
Person1 9/12/2011 13:07 9/12/2011 13:20 52 50
I'd like to get the following output:
name start_inst end_inst code subcode
Person1 9/12/2011 10:55 9/12/2011 11:49 161 50
Person1 9/12/2011 11:49 9/12/2011 11:55 107 28
Person1 9/12/2011 11:55 9/12/2011 12:26 161 50
Person1 9/12/2011 12:26 9/12/2011 13:07 161 71
Person1 9/12/2011 13:07 9/12/2011 13:20 52 50
Here is the example SQL:
CREATE TABLE Data (
name varchar2(132 BYTE) not null,
start_inst DATE not null,
end_inst DATE not null,
code number(3) not null,
subcode number(3) not null
);
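-- Dates are Sept 12, 2011; TO_DATE avoids relying on the session's NLS_DATE_FORMAT.
INSERT INTO Data(name, start_inst, end_inst, code, subcode) VALUES('Person1', TO_DATE('9/12/2011 10:55', 'MM/DD/YYYY HH24:MI'), TO_DATE('9/12/2011 11:49', 'MM/DD/YYYY HH24:MI'), 161, 50);
INSERT INTO Data(name, start_inst, end_inst, code, subcode) VALUES('Person1', TO_DATE('9/12/2011 11:49', 'MM/DD/YYYY HH24:MI'), TO_DATE('9/12/2011 11:55', 'MM/DD/YYYY HH24:MI'), 107, 28);
INSERT INTO Data(name, start_inst, end_inst, code, subcode) VALUES('Person1', TO_DATE('9/12/2011 11:55', 'MM/DD/YYYY HH24:MI'), TO_DATE('9/12/2011 12:07', 'MM/DD/YYYY HH24:MI'), 161, 50);
INSERT INTO Data(name, start_inst, end_inst, code, subcode) VALUES('Person1', TO_DATE('9/12/2011 12:07', 'MM/DD/YYYY HH24:MI'), TO_DATE('9/12/2011 12:26', 'MM/DD/YYYY HH24:MI'), 161, 50);
INSERT INTO Data(name, start_inst, end_inst, code, subcode) VALUES('Person1', TO_DATE('9/12/2011 12:26', 'MM/DD/YYYY HH24:MI'), TO_DATE('9/12/2011 12:57', 'MM/DD/YYYY HH24:MI'), 161, 71);
INSERT INTO Data(name, start_inst, end_inst, code, subcode) VALUES('Person1', TO_DATE('9/12/2011 12:57', 'MM/DD/YYYY HH24:MI'), TO_DATE('9/12/2011 13:07', 'MM/DD/YYYY HH24:MI'), 161, 71);
INSERT INTO Data(name, start_inst, end_inst, code, subcode) VALUES('Person1', TO_DATE('9/12/2011 13:07', 'MM/DD/YYYY HH24:MI'), TO_DATE('9/12/2011 13:20', 'MM/DD/YYYY HH24:MI'), 52, 50);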
Thanks in advance!
Answer 0 (score: 3)
Maybe this? (I don't have a SQL machine to run it on.)
WITH
sequenced_data AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY name ORDER BY start_inst) NameSequenceID,
ROW_NUMBER() OVER (PARTITION BY name, code, subcode ORDER BY start_inst) NameStateSequenceID,
d.*
FROM
data d
)
SELECT
name,
MIN(start_inst) start_inst,
MAX(end_inst) end_inst,
code,
subcode
FROM
sequenced_data
GROUP BY
name,
code,
subcode,
NameSequenceID - NameStateSequenceID
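To see why grouping on NameSequenceID - NameStateSequenceID works, here is a rough Python trace of the same idea on the sample data. It is only a sketch of the logic, not Oracle code: the function name merge_by_row_number_difference and the ts helper are made up for illustration, and the two counters stand in for the two ROW_NUMBER() calls.

```python
from collections import defaultdict
from datetime import datetime

def ts(hhmm):
    # Hypothetical helper: every sample row falls on Sept 12, 2011.
    return datetime.strptime("2011-09-12 " + hhmm, "%Y-%m-%d %H:%M")

# (name, start_inst, end_inst, code, subcode), copied from the question.
ROWS = [
    ("Person1", ts("10:55"), ts("11:49"), 161, 50),
    ("Person1", ts("11:49"), ts("11:55"), 107, 28),
    ("Person1", ts("11:55"), ts("12:07"), 161, 50),
    ("Person1", ts("12:07"), ts("12:26"), 161, 50),
    ("Person1", ts("12:26"), ts("12:57"), 161, 71),
    ("Person1", ts("12:57"), ts("13:07"), 161, 71),
    ("Person1", ts("13:07"), ts("13:20"), 52, 50),
]

def merge_by_row_number_difference(rows):
    name_seq = defaultdict(int)   # stands in for NameSequenceID
    state_seq = defaultdict(int)  # stands in for NameStateSequenceID
    groups = {}
    # Visiting rows by (name, start_inst) numbers them in the same order
    # as ORDER BY start_inst inside each partition.
    for name, start, end, code, subcode in sorted(rows, key=lambda r: (r[0], r[1])):
        state = (name, code, subcode)
        name_seq[name] += 1
        state_seq[state] += 1
        # Consecutive rows in the same state advance both counters together,
        # so their difference stays constant within a run.
        key = state + (name_seq[name] - state_seq[state],)
        lo, hi = groups.get(key, (start, end))
        groups[key] = (min(lo, start), max(hi, end))
    merged = [(k[0], lo, hi, k[1], k[2]) for k, (lo, hi) in groups.items()]
    return sorted(merged, key=lambda r: (r[0], r[1]))

for row in merge_by_row_number_difference(ROWS):
    print(row)
```

Note that the grouping key depends only on row order, not on whether one row's end_inst actually equals the next row's start_inst.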
Answer 1 (score: 3)
Here is a solution that uses a recursive (hierarchical) query instead of analytic functions, as suggested by @wildplasser:
SELECT name, code, subcode, MIN(start_inst) AS start_inst, MAX(end_inst) AS end_inst
FROM (SELECT name,
start_inst,
end_inst,
code,
subcode,
MIN(CONNECT_BY_ROOT (start_inst)) AS root_start
FROM data d
CONNECT BY PRIOR name = name
AND PRIOR end_inst = start_inst
AND PRIOR code = code
AND PRIOR subcode = subcode
GROUP BY name, start_inst, end_inst, code, subcode)
GROUP BY name, code, subcode, root_start;
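The CONNECT BY clause in the innermost query causes the data to be returned hierarchically. CONNECT_BY_ROOT gives us the value at the root of each branch. Because we have no good candidate for a START WITH clause, every child row (one whose start_inst equals another row's end_inst, with all other columns identical) comes back multiple times: once as a root and once (or more) as part of a branch. Taking the MIN of the root eliminates these extra rows and also gives us a value to group on in the outer query.
In the outer query, we perform another GROUP BY to consolidate the rows. The difference is that this time we also have root_start to identify which rows are contiguous and therefore need to be merged.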
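The root-finding idea can be traced outside the database too. The following Python sketch is an illustration only, under two assumptions that the SQL does not need: every row has end_inst later than start_inst, and no two rows with the same name, code and subcode share an end_inst. The names merge_by_chain_root and ts are hypothetical.

```python
from datetime import datetime

def ts(hhmm):
    # Hypothetical helper: every sample row falls on Sept 12, 2011.
    return datetime.strptime("2011-09-12 " + hhmm, "%Y-%m-%d %H:%M")

# (name, start_inst, end_inst, code, subcode), copied from the question.
ROWS = [
    ("Person1", ts("10:55"), ts("11:49"), 161, 50),
    ("Person1", ts("11:49"), ts("11:55"), 107, 28),
    ("Person1", ts("11:55"), ts("12:07"), 161, 50),
    ("Person1", ts("12:07"), ts("12:26"), 161, 50),
    ("Person1", ts("12:26"), ts("12:57"), 161, 71),
    ("Person1", ts("12:57"), ts("13:07"), 161, 71),
    ("Person1", ts("13:07"), ts("13:20"), 52, 50),
]

def merge_by_chain_root(rows):
    # Index each row's start by (name, code, subcode, end_inst), so the
    # predecessor matching "PRIOR end_inst = start_inst" is a single lookup.
    start_by_end = {(n, c, sc, e): s for n, s, e, c, sc in rows}
    merged = {}
    for name, start, end, code, subcode in rows:
        root = start
        # Walk back to the earliest start in the chain. For this row, that
        # plays the role of MIN(CONNECT_BY_ROOT(start_inst)), i.e. root_start.
        # Assumes end_inst > start_inst, otherwise this loop would not end.
        while (name, code, subcode, root) in start_by_end:
            root = start_by_end[(name, code, subcode, root)]
        key = (name, code, subcode, root)
        lo, hi = merged.get(key, (start, end))
        merged[key] = (min(lo, start), max(hi, end))
    result = [(k[0], lo, hi, k[1], k[2]) for k, (lo, hi) in merged.items()]
    return sorted(result, key=lambda r: (r[0], r[1]))

for row in merge_by_chain_root(ROWS):
    print(row)
```

Because rows are linked only when an end_inst exactly matches a start_inst, a gap between two rows in the same state starts a new chain.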
Answer 2 (score: 2)
Here's another approach:
SELECT
name,
min(start_inst) AS start_inst,
max(end_inst) AS end_inst,
code,
subcode
FROM
(
SELECT
A.*,
COUNT
(
CASE WHEN start_inst = previous_end_inst THEN NULL
ELSE 1
END
)
OVER
(
ORDER BY
start_inst,
name,
code,
subcode
) AS group_number
FROM
(
SELECT
name,
start_inst,
end_inst,
LAG
(
end_inst
)
OVER
(
PARTITION BY
name,
code,
subcode
ORDER BY
start_inst
) AS previous_end_inst,
code,
subcode
FROM
data
) A
) B
GROUP BY
name,
code,
subcode,
group_number
ORDER BY
group_number
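Basically:
1. For each row, subquery A finds the previous end time for the given name, code, and subcode.
2. For each row, subquery B calculates the "group number": a running count of the rows so far, the current one included (in order of start_inst, name, code, and subcode), whose previous end time from step 1 is not equal to their start time.
3. The outer query aggregates by group number.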
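The three steps can be traced in Python on the sample data. This is a rough sketch of the logic rather than a translation of the SQL: the names merge_by_lag_and_running_count and ts are invented for illustration, and the prev_end dictionary stands in for LAG(end_inst) partitioned by name, code, and subcode.

```python
from datetime import datetime

def ts(hhmm):
    # Hypothetical helper: every sample row falls on Sept 12, 2011.
    return datetime.strptime("2011-09-12 " + hhmm, "%Y-%m-%d %H:%M")

# (name, start_inst, end_inst, code, subcode), copied from the question.
ROWS = [
    ("Person1", ts("10:55"), ts("11:49"), 161, 50),
    ("Person1", ts("11:49"), ts("11:55"), 107, 28),
    ("Person1", ts("11:55"), ts("12:07"), 161, 50),
    ("Person1", ts("12:07"), ts("12:26"), 161, 50),
    ("Person1", ts("12:26"), ts("12:57"), 161, 71),
    ("Person1", ts("12:57"), ts("13:07"), 161, 71),
    ("Person1", ts("13:07"), ts("13:20"), 52, 50),
]

def merge_by_lag_and_running_count(rows):
    # Same ordering as the COUNT(...) OVER (ORDER BY ...) window.
    ordered = sorted(rows, key=lambda r: (r[1], r[0], r[3], r[4]))
    prev_end = {}     # step 1: last end_inst seen per (name, code, subcode)
    group_number = 0  # step 2: running count of rows that start a new group
    groups = {}
    for name, start, end, code, subcode in ordered:
        state = (name, code, subcode)
        # A row opens a new group unless it starts exactly where the
        # previous row in the same state ended.
        if prev_end.get(state) != start:
            group_number += 1
        prev_end[state] = end
        key = state + (group_number,)
        lo, hi = groups.get(key, (start, end))
        groups[key] = (min(lo, start), max(hi, end))
    # Step 3: one output row per group, in group-number order
    # (dicts keep insertion order in Python 3.7+).
    return [(k[0], lo, hi, k[1], k[2]) for k, (lo, hi) in groups.items()]

for row in merge_by_lag_and_running_count(ROWS):
    print(row)
```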
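For better or worse, this approach differs from @stevo's in that it creates a new "group" whenever there is a "gap" between the end time of one record and the start time of the next. For example, if you were to create a gap between 12:57 and 13:00 like this...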
UPDATE data
SET start_inst = TO_DATE('9/12/2011 13:00', 'MM/DD/YYYY HH24:MI')
WHERE start_inst = TO_DATE('9/12/2011 12:57', 'MM/DD/YYYY HH24:MI');
...the query above would return two rows like this...
NAME START_INST END_INST CODE SUBCODE
-------------------- ---------------- ---------------- ---------- ----------
.
.
.
Person1 09/12/2011 12:26 09/12/2011 12:57 161 71
Person1 09/12/2011 13:00 09/12/2011 13:07 161 71
.
.
.
...whereas @stevo's query would return a single row like this...
NAME START_INST END_INST CODE SUBCODE
-------------------- ---------------- ---------------- ---------- ----------
.
.
.
Person1 12/09/2011 12:26 12/09/2011 13:07 161 71
.
.
.
Hope this helps.
Answer 3 (score: 1)
Tweaking desm's query, I think this should work:
WITH
sequenced_data AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY name ORDER BY start_inst) NameSequenceID,
ROW_NUMBER() OVER (PARTITION BY name, code, subcode ORDER BY start_inst) NameStateSequenceID,
d.*
FROM
data d
)
SELECT
name,
to_char(MIN(start_inst),'DD/MM/YYYY HH24:MI') start_inst,
to_char(MAX(end_inst),'DD/MM/YYYY HH24:MI') end_inst,
code,
subcode
FROM
sequenced_data
GROUP BY
name,
code,
subcode,
NameSequenceID - NameStateSequenceID
ORDER BY name, MIN(start_inst) -- sort on the DATE itself, not on the formatted string aliased as start_inst
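Answer 4 (score: 0)
You can do this with a recursive query (in Oracle, IIRC, that is the CONNECT BY / PRIOR stuff). I did the same thing for Postgres in this thread: Get total time interval from multiple rows if sequence not broken
It will probably need some tweaking to fit Oracle syntax.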