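I have a series of records containing some information (product type) with temporal validity.
I would like to merge adjacent validity intervals, provided that the grouping information (the product type) stays the same. I cannot use a simple GROUP BY with MIN and MAX, because some product types (A in the example) can "go away" and "come back".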
Using Oracle 11g.
A similar question for MySQL is: How can I do a contiguous group by in MySQL?

Input data:

| PRODUCT | START_DATE | END_DATE |
|---------|----------------------------------|----------------------------------|
| A | July, 01 2013 00:00:00+0000 | July, 31 2013 00:00:00+0000 |
| A | August, 01 2013 00:00:00+0000 | August, 31 2013 00:00:00+0000 |
| A | September, 01 2013 00:00:00+0000 | September, 30 2013 00:00:00+0000 |
| B | October, 01 2013 00:00:00+0000 | October, 31 2013 00:00:00+0000 |
| B | November, 01 2013 00:00:00+0000 | November, 30 2013 00:00:00+0000 |
| A | December, 01 2013 00:00:00+0000 | December, 31 2013 00:00:00+0000 |
| A | January, 01 2014 00:00:00+0000 | January, 31 2014 00:00:00+0000 |
| A | February, 01 2014 00:00:00+0000 | February, 28 2014 00:00:00+0000 |
| A | March, 01 2014 00:00:00+0000 | March, 31 2014 00:00:00+0000 |

Expected results:

| PRODUCT | START_DATE | END_DATE |
|---------|---------------------------------|----------------------------------|
| A | July, 01 2013 00:00:00+0000 | September, 30 2013 00:00:00+0000 |
| B | October, 01 2013 00:00:00+0000 | November, 30 2013 00:00:00+0000 |
| A | December, 01 2013 00:00:00+0000 | March, 31 2014 00:00:00+0000 |
See the complete SQL Fiddle.
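To make the GROUP BY limitation concrete, here is a minimal Python sketch (not part of the original question; `naive_group` is a hypothetical helper name) that mimics grouping by product with MIN and MAX over the sample rows. Under that assumption it produces a single A span that swallows the whole B period, which is exactly the result the question wants to avoid.

```python
from datetime import date

# Sample rows from the question: (product, start_date, end_date).
rows = [
    ("A", date(2013, 7, 1), date(2013, 7, 31)),
    ("A", date(2013, 8, 1), date(2013, 8, 31)),
    ("A", date(2013, 9, 1), date(2013, 9, 30)),
    ("B", date(2013, 10, 1), date(2013, 10, 31)),
    ("B", date(2013, 11, 1), date(2013, 11, 30)),
    ("A", date(2013, 12, 1), date(2013, 12, 31)),
    ("A", date(2014, 1, 1), date(2014, 1, 31)),
    ("A", date(2014, 2, 1), date(2014, 2, 28)),
    ("A", date(2014, 3, 1), date(2014, 3, 31)),
]


def naive_group(rows):
    """Equivalent of GROUP BY product with MIN(start_date), MAX(end_date)."""
    spans = {}
    for product, start, end in rows:
        lo, hi = spans.get(product, (start, end))
        spans[product] = (min(lo, start), max(hi, end))
    return spans


result = naive_group(rows)
for product, (start, end) in sorted(result.items()):
    print(product, start, end)
```

The A row covers July 2013 through March 2014 even though A was not valid in October and November 2013.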
Answer 0 (score: 6)
This is a gaps-and-islands problem. There are various ways to approach it; this one uses the lead and lag analytic functions:
select distinct product,
case when start_date is null then lag(start_date)
over (partition by product order by rn) else start_date end as start_date,
case when end_date is null then lead(end_date)
over (partition by product order by rn) else end_date end as end_date
from (
select product, start_date, end_date, rn
from (
select t.product,
case when lag(end_date)
over (partition by product order by start_date) is null
or lag(end_date)
over (partition by product order by start_date) != start_date - 1
then start_date end as start_date,
case when lead(start_date)
over (partition by product order by start_date) is null
or lead(start_date)
over (partition by product order by start_date) != end_date + 1
then end_date end as end_date,
row_number() over (partition by product order by start_date) as rn
from t
)
where start_date is not null or end_date is not null
)
order by start_date, product;
PRODUCT START_DATE END_DATE
------- ---------- ---------
A 01-JUL-13 30-SEP-13
B 01-OCT-13 30-NOV-13
A 01-DEC-13 31-MAR-14
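The innermost query looks at the preceding and following records for the product, and only keeps the start and/or end date if the records are not contiguous: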
select t.product,
case when lag(end_date)
over (partition by product order by start_date) is null
or lag(end_date)
over (partition by product order by start_date) != start_date - 1
then start_date end as start_date,
case when lead(start_date)
over (partition by product order by start_date) is null
or lead(start_date)
over (partition by product order by start_date) != end_date + 1
then end_date end as end_date
from t;
PRODUCT START_DATE END_DATE
------- ---------- ---------
A       01-JUL-13
A
A                  30-SEP-13
A       01-DEC-13
A
A
A                  31-MAR-14
B       01-OCT-13
B                  30-NOV-13
The next level of select removes the rows that are mid-period, where both dates were blanked out by the inner query, which gives:
PRODUCT START_DATE END_DATE
------- ---------- ---------
A       01-JUL-13
A                  30-SEP-13
A       01-DEC-13
A                  31-MAR-14
B       01-OCT-13
B                  30-NOV-13
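The outer query then collapses those adjacent pairs. I've taken the easy route of creating duplicates and then eliminating them with distinct, but you could do it other ways, such as putting both values into one row of each pair and leaving both values null in the other, and then eliminating the null rows with another layer of select. I think distinct is OK here.
If your real-world use case has times, not just dates, then you'll need to adjust the comparison in the inner query; rather than +/- 1, use an interval of 1 second perhaps, or 1/86400 if you prefer, depending on the precision of your values.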
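The same gaps-and-islands logic can be sketched outside SQL. The Python below is only an illustration, not a translation of the query above: `merge_adjacent` and its `step` parameter are hypothetical names, and it assumes intervals of one product never overlap. A row extends the current island when its start equals the previous end plus `step`, which plays the role of the `+ 1` comparison; passing a one-second step covers the case with times mentioned above.

```python
from datetime import date, datetime, timedelta


def merge_adjacent(rows, step=timedelta(days=1)):
    """Merge (product, start, end) rows whose start == previous end + step."""
    merged = []
    # Sorting tuples orders by product first, then by start.
    for product, start, end in sorted(rows):
        prev = merged[-1] if merged else None
        if prev and prev[0] == product and prev[2] + step == start:
            merged[-1] = (product, prev[1], end)  # extend the open island
        else:
            merged.append((product, start, end))  # start a new island
    # Same ordering as the query: by start date, then product.
    return sorted(merged, key=lambda r: (r[1], r[0]))


rows = [
    ("A", date(2013, 7, 1), date(2013, 7, 31)),
    ("A", date(2013, 8, 1), date(2013, 8, 31)),
    ("A", date(2013, 9, 1), date(2013, 9, 30)),
    ("B", date(2013, 10, 1), date(2013, 10, 31)),
    ("B", date(2013, 11, 1), date(2013, 11, 30)),
    ("A", date(2013, 12, 1), date(2013, 12, 31)),
    ("A", date(2014, 1, 1), date(2014, 1, 31)),
    ("A", date(2014, 2, 1), date(2014, 2, 28)),
    ("A", date(2014, 3, 1), date(2014, 3, 31)),
]
islands = merge_adjacent(rows)

# With times, two half-day rows are adjacent at one-second precision.
timed = [
    ("A", datetime(2013, 7, 1, 0, 0, 0), datetime(2013, 7, 1, 11, 59, 59)),
    ("A", datetime(2013, 7, 1, 12, 0, 0), datetime(2013, 7, 1, 23, 59, 59)),
]
timed_islands = merge_adjacent(timed, step=timedelta(seconds=1))
```

With the default one-day step the two `timed` rows would stay separate, which is why the comparison has to match the precision of the stored values.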
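Answer 1 (score: 2)
It seems like there should be an easier way, but a combination of an analytic query (to find the different gaps) and a hierarchical query (to connect the rows that are contiguous) works: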
with data as (
select 'A' product, to_date('7/1/2013', 'MM/DD/YYYY') start_date, to_date('7/31/2013', 'MM/DD/YYYY') end_date from dual union all
select 'A' product, to_date('8/1/2013', 'MM/DD/YYYY') start_date, to_date('8/31/2013', 'MM/DD/YYYY') end_date from dual union all
select 'A' product, to_date('9/1/2013', 'MM/DD/YYYY') start_date, to_date('9/30/2013', 'MM/DD/YYYY') end_date from dual union all
select 'B' product, to_date('10/1/2013', 'MM/DD/YYYY') start_date, to_date('10/31/2013', 'MM/DD/YYYY') end_date from dual union all
select 'B' product, to_date('11/1/2013', 'MM/DD/YYYY') start_date, to_date('11/30/2013', 'MM/DD/YYYY') end_date from dual union all
select 'A' product, to_date('12/1/2013', 'MM/DD/YYYY') start_date, to_date('12/31/2013', 'MM/DD/YYYY') end_date from dual union all
select 'A' product, to_date('1/1/2014', 'MM/DD/YYYY') start_date, to_date('1/31/2014', 'MM/DD/YYYY') end_date from dual union all
select 'A' product, to_date('2/1/2014', 'MM/DD/YYYY') start_date, to_date('2/28/2014', 'MM/DD/YYYY') end_date from dual union all
select 'A' product, to_date('3/1/2014', 'MM/DD/YYYY') start_date, to_date('3/31/2014', 'MM/DD/YYYY') end_date from dual
),
start_points as
(
select product, start_date, end_date, prior_end+1, case when prior_end + 1 = start_date then null else 'Y' end start_point
from (
select product, start_date, end_date, lag(end_date,1) over (partition by product order by end_date) prior_end
from data
)
)
select product, min(start_date) start_date, max(end_date) end_date
from (
select product, start_date, end_date, level, connect_by_root(start_date) root_start
from start_points
start with start_point = 'Y'
connect by prior end_date = start_date - 1
and prior product = product
)
group by product, root_start;
PRODUCT START_DATE END_DATE
------- ---------- ---------
A 01-JUL-13 30-SEP-13
A 01-DEC-13 31-MAR-14
B 01-OCT-13 30-NOV-13
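For readers less familiar with CONNECT BY, the two stages can be mimicked in Python. This is a rough sketch under stated assumptions rather than a faithful port: `merge_by_chaining` is a hypothetical name, each (product, start date) pair is assumed to be unique, and every interval is assumed to end on or after its start. The first loop stands in for the lag() step that flags start points; the second walks forward from each start point the way the hierarchical query follows `prior end_date = start_date - 1`.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def merge_by_chaining(rows):
    # Stage 1: flag start points, i.e. rows whose previous row for the
    # same product (ordered by end date) does not end the day before.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    start_points = []
    for i, (product, start, end) in enumerate(ordered):
        same_product = i > 0 and ordered[i - 1][0] == product
        if not same_product or ordered[i - 1][2] + ONE_DAY != start:
            start_points.append((product, start, end))

    # Stage 2: from each start point, keep following the row that
    # starts the day after the current end, remembering the root start.
    end_by_start = {(p, s): e for p, s, e in rows}
    result = []
    for product, root_start, end in start_points:
        while (product, end + ONE_DAY) in end_by_start:
            end = end_by_start[(product, end + ONE_DAY)]
        result.append((product, root_start, end))
    return sorted(result)


rows = [
    ("A", date(2013, 7, 1), date(2013, 7, 31)),
    ("A", date(2013, 8, 1), date(2013, 8, 31)),
    ("A", date(2013, 9, 1), date(2013, 9, 30)),
    ("B", date(2013, 10, 1), date(2013, 10, 31)),
    ("B", date(2013, 11, 1), date(2013, 11, 30)),
    ("A", date(2013, 12, 1), date(2013, 12, 31)),
    ("A", date(2014, 1, 1), date(2014, 1, 31)),
    ("A", date(2014, 2, 1), date(2014, 2, 28)),
    ("A", date(2014, 3, 1), date(2014, 3, 31)),
]
chained = merge_by_chaining(rows)
```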
Answer 2 (score: 0)
It's a fairly convoluted set of steps, but this is how I resolved a similar problem:
-- Sample Data
CREATE TABLE AdjacentValidity
(
RowID INT IDENTITY(1,1) NOT NULL,
Product VARCHAR(1) NOT NULL,
Start_Date DATETIME NOT NULL,
End_Date DATETIME NOT NULL
)
INSERT INTO AdjacentValidity (Product, Start_Date, End_Date)
SELECT 'A', '7/1/2013', '7/31/2013' UNION
SELECT 'A', '8/1/2013', '8/31/2013' UNION
SELECT 'A', '9/1/2013', '9/30/2013' UNION
SELECT 'B', '10/1/2013', '10/31/2013' UNION
SELECT 'B', '11/1/2013', '11/30/2013' UNION
SELECT 'A', '12/1/2013', '12/31/2013' UNION
SELECT 'A', '1/1/2014', '1/31/2014' UNION
SELECT 'A', '2/1/2014', '2/28/2014' UNION
SELECT 'A', '3/1/2014', '3/31/2014'
-- Modify the sample data to include necessary tags
CREATE TABLE #RawData
(
RawData_ID INT IDENTITY(1,1) NOT NULL,
Product VARCHAR(1) NOT NULL,
Start_Date DATETIME NOT NULL,
End_Date DATETIME NOT NULL,
isFirstOccurrence BIT NULL,
isLastOccurrence BIT NULL,
isFirstInstance BIT NULL,
isLastInstance BIT NULL
)
-- Load and flag first occurrences of a natural key
INSERT INTO #RawData
(
Product,
Start_Date,
End_Date,
isFirstOccurrence
)
SELECT
Product,
Start_Date,
End_Date,
CASE WHEN ROW_NUMBER() OVER
(
--PARTITION BY <NaturalKey>
ORDER BY Start_date
) = 1 THEN 1 ELSE 0 END AS isFirstOccurrence
FROM AdjacentValidity
-- update to flag the last sequential instance of a particular data set, and the last occurrence of a natural key
UPDATE a
SET
a.isLastInstance =
CASE
WHEN
a.Product <> b.Product OR
DATEADD(m, 1, a.Start_Date) <> b.Start_Date OR
b.RawData_ID IS NULL
THEN 1
ELSE 0
END,
a.isLastOccurrence =
CASE
WHEN
b.RawData_ID IS NULL
THEN 1
ELSE 0
END
FROM
#RawData a
LEFT JOIN
#RawData b ON
b.RawData_ID = a.RawData_ID + 1 --AND
--b.<NaturalKey> = a.<NaturalKey>
-- flag first sequential instance of a particular data set
UPDATE b
SET
b.isFirstInstance =
CASE
WHEN
a.isLastInstance = 1
THEN 1
ELSE 0
END
FROM
#RawData a
LEFT JOIN
#RawData b ON
b.RawData_ID = a.RawData_ID + 1 --AND
--b.<NaturalKey> = a.<NaturalKey>
-- reduce the records to only those that are the first or last occurrence of a particular data set
CREATE TABLE #UniqueData
(
[UniqueData_ID] [int] IDENTITY(1,1) NOT NULL,
Start_Date DATETIME NOT NULL,
End_Date DATETIME NOT NULL,
Product VARCHAR(1) NULL,
isFirstOccurrence BIT NULL,
isLastOccurrence BIT NULL,
isFirstInstance BIT NULL,
isLastInstance BIT NULL
)
INSERT INTO #UniqueData
(
Start_Date,
End_Date,
Product,
isFirstOccurrence,
isLastOccurrence,
isFirstInstance,
isLastInstance
)
SELECT
Start_Date,
End_Date,
Product,
isFirstOccurrence,
isLastOccurrence,
isFirstInstance,
isLastInstance
FROM
#RawData
WHERE
isFirstOccurrence = 1 OR
isFirstInstance = 1 OR
isLastInstance = 1
ORDER BY RawData_ID, Start_Date
-- combine the first and last occurrences in any given sequence into a single row
SELECT
a.Start_Date,
ISNULL(b.End_Date, a.End_Date) End_Date, -- the island ends where its last-instance row ends
a.Product
FROM
#UniqueData a
LEFT JOIN
#UniqueData b ON
b.UniqueData_ID = a.UniqueData_ID + 1 AND
--b.<NaturalKey> = a.<NaturalKey> AND
a.isLastInstance <> 1
WHERE a.isFirstInstance = 1 or a.isFirstOccurrence = 1
ORDER BY a.UniqueData_ID
-- clean up
/*
DROP TABLE AdjacentValidity
DROP TABLE #RawData
DROP TABLE #UniqueData
*/
Answer 3 (score: -1)
Try something like:
with dat as (
select 'A' as product, sysdate-3 as start_dte, sysdate-2 as end_dte from dual
union all
select 'A' as product, sysdate-2 as start_dte, sysdate-1 as end_dte from dual
union all
select 'B' as product, sysdate-5 as start_dte, sysdate-4 as end_dte from dual
)
SELECT product,
MIN(start_dte) KEEP (DENSE_RANK FIRST ORDER BY start_dte) "Start",
MAX(end_dte) KEEP (DENSE_RANK LAST ORDER BY end_dte) "End"
FROM dat
GROUP BY product
ORDER BY product;
Output
PRODUCT Start End
A 2/24/2014 10:25:53 AM 2/26/2014 10:25:53 AM
B 2/22/2014 10:25:53 AM 2/23/2014 10:25:53 AM