Merging contiguous date validity intervals

Time: 2014-02-27 15:05:04

Tags: sql oracle oracle11g gaps-and-islands

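I have a series of records containing some information (a product type) with temporal validity.

I would like to merge adjacent validity intervals together, provided the grouping information (the product type) stays the same. I cannot use a simple GROUP BY with MIN and MAX, because some product types (A, in the example) can "go away" and "come back".

Using Oracle 11g.

A similar question for MySQL is: How can I do a contiguous group by in MySQL?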

Input data

| PRODUCT |                       START_DATE |                         END_DATE |
|---------|----------------------------------|----------------------------------|
|       A |      July, 01 2013 00:00:00+0000 |      July, 31 2013 00:00:00+0000 |
|       A |    August, 01 2013 00:00:00+0000 |    August, 31 2013 00:00:00+0000 |
|       A | September, 01 2013 00:00:00+0000 | September, 30 2013 00:00:00+0000 |
|       B |   October, 01 2013 00:00:00+0000 |   October, 31 2013 00:00:00+0000 |
|       B |  November, 01 2013 00:00:00+0000 |  November, 30 2013 00:00:00+0000 |
|       A |  December, 01 2013 00:00:00+0000 |  December, 31 2013 00:00:00+0000 |
|       A |   January, 01 2014 00:00:00+0000 |   January, 31 2014 00:00:00+0000 |
|       A |  February, 01 2014 00:00:00+0000 |  February, 28 2014 00:00:00+0000 |
|       A |     March, 01 2014 00:00:00+0000 |     March, 31 2014 00:00:00+0000 |

Expected results

| PRODUCT |                      START_DATE |                         END_DATE |
|---------|---------------------------------|----------------------------------|
|       A |     July, 01 2013 00:00:00+0000 | September, 30 2013 00:00:00+0000 |
|       B |  October, 01 2013 00:00:00+0000 |  November, 30 2013 00:00:00+0000 |
|       A | December, 01 2013 00:00:00+0000 |     March, 31 2014 00:00:00+0000 |

See the complete SQL Fiddle.
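To make the problem concrete, here is a minimal sketch of what the plain GROUP BY with MIN and MAX returns for the sample data. It is not Oracle: it runs the same kind of query against an in-memory SQLite database from Python, and the table name `t` and the ISO date strings are assumptions made only for this illustration.

```python
import sqlite3

# Sample rows from the question, with dates as ISO strings so they sort correctly.
ROWS = [
    ("A", "2013-07-01", "2013-07-31"),
    ("A", "2013-08-01", "2013-08-31"),
    ("A", "2013-09-01", "2013-09-30"),
    ("B", "2013-10-01", "2013-10-31"),
    ("B", "2013-11-01", "2013-11-30"),
    ("A", "2013-12-01", "2013-12-31"),
    ("A", "2014-01-01", "2014-01-31"),
    ("A", "2014-02-01", "2014-02-28"),
    ("A", "2014-03-01", "2014-03-31"),
]


def naive_group_by(rows):
    """Run the plain GROUP BY with MIN/MAX that the question rules out."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (product TEXT, start_date TEXT, end_date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT product, MIN(start_date), MAX(end_date) "
        "FROM t GROUP BY product ORDER BY product"
    ).fetchall()
    conn.close()
    return result


naive = naive_group_by(ROWS)
for row in naive:
    print(row)
# ('A', '2013-07-01', '2014-03-31')
# ('B', '2013-10-01', '2013-11-30')
```

Product A comes back as a single interval from July 2013 to March 2014, which swallows the two months in which only B was valid. That is why the two separate A periods have to be detected before aggregating.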

4 Answers:

Answer 0 (score: 6)

This is a gaps-and-islands problem. There are various ways to approach it; this one uses the lead and lag analytic functions:

select distinct product,
  case when start_date is null then lag(start_date)
    over (partition by product order by rn) else start_date end as start_date,
  case when end_date is null then lead(end_date)
    over (partition by product order by rn) else end_date end as end_date
from (
  select product, start_date, end_date, rn
  from (
    select t.product,
      case when lag(end_date)
          over (partition by product order by start_date) is null
        or lag(end_date)
          over (partition by product order by start_date) != start_date - 1
        then start_date end as start_date,
      case when lead(start_date)
          over (partition by product order by start_date) is null
        or lead(start_date)
          over (partition by product order by start_date) != end_date + 1
        then end_date end as end_date,
      row_number() over (partition by product order by start_date) as rn
    from t
  )
  where start_date is not null or end_date is not null
)
order by start_date, product;

PRODUCT START_DATE END_DATE
------- ---------- ---------
A       01-JUL-13  30-SEP-13 
B       01-OCT-13  30-NOV-13 
A       01-DEC-13  31-MAR-14 

SQL Fiddle

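The innermost query looks at the preceding and following records for the product, and only keeps the start and/or end date if the records are not contiguous: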

select t.product,
  case when lag(end_date)
      over (partition by product order by start_date) is null
    or lag(end_date)
      over (partition by product order by start_date) != start_date - 1
    then start_date end as start_date,
  case when lead(start_date)
      over (partition by product order by start_date) is null
    or lead(start_date)
      over (partition by product order by start_date) != end_date + 1
    then end_date end as end_date
from t;

PRODUCT START_DATE END_DATE
------- ---------- ---------
A       01-JUL-13            
A                            
A                  30-SEP-13 
A       01-DEC-13            
A                            
A                            
A                  31-MAR-14 
B       01-OCT-13            
B                  30-NOV-13 

The next level of select removes the rows that are mid-period, where both dates were blanked out by the inner query, which gives:

PRODUCT START_DATE END_DATE
------- ---------- ---------
A       01-JUL-13            
A                  30-SEP-13 
A       01-DEC-13            
A                  31-MAR-14 
B       01-OCT-13            
B                  30-NOV-13 

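The outer query then collapses those adjacent pairs. I've taken the easy route of creating duplicates and then eliminating them with distinct. You could do it other ways, for example by putting both values into one row of each pair, leaving both values null in the other row, and then eliminating the null rows with another layer of select, but I think distinct is OK here.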
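For readers who find the nested query hard to follow, the three levels can be mirrored outside the database. The following is only a sketch in Python, not a replacement for the SQL; the function name `merge_by_boundaries` is made up for this illustration, and the last step pairs each kept start with the next kept end directly instead of creating duplicates and removing them with distinct.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def merge_by_boundaries(rows):
    """rows: (product, start_date, end_date) tuples; returns merged tuples."""
    by_product = {}
    for product, start, end in rows:
        by_product.setdefault(product, []).append((start, end))

    merged = []
    for product, periods in by_product.items():
        periods.sort()  # same role as "partition by product order by start_date"
        marked = []
        for i, (start, end) in enumerate(periods):
            prev_end = periods[i - 1][1] if i > 0 else None  # lag(end_date)
            next_start = periods[i + 1][0] if i + 1 < len(periods) else None  # lead(start_date)
            # Level 1: keep a date only where the neighbour is missing or not contiguous.
            kept_start = start if prev_end is None or prev_end != start - ONE_DAY else None
            kept_end = end if next_start is None or next_start != end + ONE_DAY else None
            # Level 2: drop mid-period rows, where both dates were blanked.
            if kept_start is not None or kept_end is not None:
                marked.append((kept_start, kept_end))
        # Level 3: pair each surviving start with the next surviving end.
        island_start = None
        for kept_start, kept_end in marked:
            if kept_start is not None:
                island_start = kept_start
            if kept_end is not None:
                merged.append((product, island_start, kept_end))
    # Same ordering as "order by start_date, product".
    return sorted(merged, key=lambda r: (r[1], r[0]))


rows = [
    ("A", date(2013, 7, 1), date(2013, 7, 31)),
    ("A", date(2013, 8, 1), date(2013, 8, 31)),
    ("A", date(2013, 9, 1), date(2013, 9, 30)),
    ("B", date(2013, 10, 1), date(2013, 10, 31)),
    ("B", date(2013, 11, 1), date(2013, 11, 30)),
    ("A", date(2013, 12, 1), date(2013, 12, 31)),
    ("A", date(2014, 1, 1), date(2014, 1, 31)),
    ("A", date(2014, 2, 1), date(2014, 2, 28)),
    ("A", date(2014, 3, 1), date(2014, 3, 31)),
]

merged = merge_by_boundaries(rows)
for product, start, end in merged:
    print(product, start, end)
# A 2013-07-01 2013-09-30
# B 2013-10-01 2013-11-30
# A 2013-12-01 2014-03-31
```

The sketch assumes, as the query does, that the periods of one product do not overlap.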

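If your real use case has times and not just dates, you will need to adjust the comparison in the inner query: instead of +/- 1, use an interval of 1 second perhaps, or 1/86400 if you prefer, depending on the precision of your values.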
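The effect of that step size is easy to check in isolation. This is a small Python sketch rather than Oracle syntax; `is_contiguous` and its `step` parameter are hypothetical names used only to illustrate the comparison.

```python
from datetime import datetime, timedelta


def is_contiguous(prev_end, next_start, step=timedelta(days=1)):
    """Return True when next_start falls exactly one step after prev_end."""
    return prev_end + step == next_start


# Date-only values: a period ending on 31 July is followed by one starting on 1 August.
daily = is_contiguous(datetime(2013, 7, 31), datetime(2013, 8, 1))

# Second-precision values: the period ends at 23:59:59, so the step must be one second.
july_end = datetime(2013, 7, 31, 23, 59, 59)
august_start = datetime(2013, 8, 1, 0, 0, 0)
with_day_step = is_contiguous(july_end, august_start)
with_second_step = is_contiguous(july_end, august_start, step=timedelta(seconds=1))

# 1/86400 of a day is the same one-second step.
fraction_of_day = timedelta(days=1) / 86400

print(daily, with_day_step, with_second_step, fraction_of_day == timedelta(seconds=1))
# True False True True
```

With the wrong step every row looks like the start of a new period, so nothing is merged.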

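Answer 1 (score: 2)

It seems like there should be an easier way, but a combination of an analytic query (to find the gaps) and a hierarchical query (to connect the contiguous rows) works: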

with data as (
    select 'A' product, to_date('7/1/2013', 'MM/DD/YYYY') start_date, to_date('7/31/2013', 'MM/DD/YYYY') end_date from dual union all
    select 'A' product, to_date('8/1/2013', 'MM/DD/YYYY') start_date, to_date('8/31/2013', 'MM/DD/YYYY') end_date from dual union all
    select 'A' product, to_date('9/1/2013', 'MM/DD/YYYY') start_date, to_date('9/30/2013', 'MM/DD/YYYY') end_date from dual union all
    select 'B' product, to_date('10/1/2013', 'MM/DD/YYYY') start_date, to_date('10/31/2013', 'MM/DD/YYYY') end_date from dual union all
    select 'B' product, to_date('11/1/2013', 'MM/DD/YYYY') start_date, to_date('11/30/2013', 'MM/DD/YYYY') end_date from dual union all
    select 'A' product, to_date('12/1/2013', 'MM/DD/YYYY') start_date, to_date('12/31/2013', 'MM/DD/YYYY') end_date from dual union all
    select 'A' product, to_date('1/1/2014', 'MM/DD/YYYY') start_date, to_date('1/31/2014', 'MM/DD/YYYY') end_date from dual union all
    select 'A' product, to_date('2/1/2014', 'MM/DD/YYYY') start_date, to_date('2/28/2014', 'MM/DD/YYYY') end_date from dual union all
    select 'A' product, to_date('3/1/2014', 'MM/DD/YYYY') start_date, to_date('3/31/2014', 'MM/DD/YYYY') end_date from dual
),
start_points as
(
    select product, start_date, end_date, prior_end+1, case when prior_end + 1 = start_date then null else 'Y' end start_point 
    from (
        select product, start_date, end_date, lag(end_date,1) over (partition by product order by end_date) prior_end
        from data
    )
)
select product, min(start_date) start_date, max(end_date) end_date
from (
    select product, start_date, end_date, level, connect_by_root(start_date) root_start
    from start_points
    start with start_point = 'Y'
    connect by prior end_date = start_date - 1
    and prior product = product
)
group by product, root_start;



PRODUCT START_DATE END_DATE 
------- ---------- ---------
A       01-JUL-13  30-SEP-13
A       01-DEC-13  31-MAR-14
B       01-OCT-13  30-NOV-13
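The two stages of the query above, finding the start points and then walking from each one along the contiguous rows, can be mimicked in a few lines of Python. This is a sketch under assumptions: `merge_by_chaining` is a made-up name, periods of one product are assumed not to overlap, and each (product, start_date) pair is assumed to be unique.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def merge_by_chaining(rows):
    """rows: (product, start_date, end_date) tuples; returns merged tuples."""
    # Index rows by (product, start_date) so following the chain is a lookup.
    by_key = {(product, start): end for product, start, end in rows}
    ends = {(product, end) for product, _, end in rows}

    merged = []
    for (product, start), end in by_key.items():
        # Start point: no row of the same product ends the day before.
        if (product, start - ONE_DAY) in ends:
            continue
        root_start = start  # plays the role of connect_by_root(start_date)
        # Follow the chain: prior end_date = start_date - 1 and prior product = product.
        while (product, end + ONE_DAY) in by_key:
            end = by_key[(product, end + ONE_DAY)]
        merged.append((product, root_start, end))
    return sorted(merged)


rows = [
    ("A", date(2013, 7, 1), date(2013, 7, 31)),
    ("A", date(2013, 8, 1), date(2013, 8, 31)),
    ("A", date(2013, 9, 1), date(2013, 9, 30)),
    ("B", date(2013, 10, 1), date(2013, 10, 31)),
    ("B", date(2013, 11, 1), date(2013, 11, 30)),
    ("A", date(2013, 12, 1), date(2013, 12, 31)),
    ("A", date(2014, 1, 1), date(2014, 1, 31)),
    ("A", date(2014, 2, 1), date(2014, 2, 28)),
    ("A", date(2014, 3, 1), date(2014, 3, 31)),
]

chained = merge_by_chaining(rows)
for product, start, end in chained:
    print(product, start, end)
# A 2013-07-01 2013-09-30
# A 2013-12-01 2014-03-31
# B 2013-10-01 2013-11-30
```

Because the walk only ever continues within the same product, the B period in between does not join the two A periods.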

Answer 2 (score: 0)

This is fairly convoluted, but here is how I solved a similar problem (the code is T-SQL for SQL Server, not Oracle):

-- Sample Data
CREATE TABLE AdjacentValidity 
  (
RowID INT IDENTITY(1,1) NOT NULL,
Product VARCHAR(1) NOT NULL,
Start_Date DATETIME NOT NULL,
End_Date DATETIME NOT NULL
  )

INSERT INTO AdjacentValidity (Product, Start_Date, End_Date)

SELECT 'A', '7/1/2013', '7/31/2013' UNION
SELECT 'A', '8/1/2013', '8/31/2013' UNION
SELECT 'A', '9/1/2013', '9/30/2013' UNION
SELECT 'B', '10/1/2013', '10/31/2013' UNION
SELECT 'B', '11/1/2013', '11/30/2013' UNION
SELECT 'A', '12/1/2013', '12/31/2013' UNION
SELECT 'A', '1/1/2014', '1/31/2014' UNION
SELECT 'A', '2/1/2014', '2/28/2014' UNION
SELECT 'A', '3/1/2014', '3/31/2014'


-- Modify the sample data to include necessary tags
CREATE TABLE #RawData
  (
    RawData_ID INT IDENTITY(1,1) NOT NULL,
    Product VARCHAR(1) NOT NULL,
    Start_Date DATETIME NOT NULL,
    End_Date DATETIME NOT NULL,
    isFirstOccurrence BIT NULL,
    isLastOccurrence BIT NULL,
    isFirstInstance BIT NULL,
    isLastInstance BIT NULL
  )

-- Load and flag first occurrences of a natural key
INSERT INTO #RawData
  (
    Product,
    Start_Date,
    End_Date,
    isFirstOccurrence
  )
SELECT 
    Product,
    Start_Date,
    End_Date,
    CASE WHEN ROW_NUMBER() OVER
      (
        --PARTITION BY <NaturalKey>
        ORDER BY Start_date
      ) = 1 THEN 1 ELSE 0 END AS isFirstOccurrence
FROM AdjacentValidity
ORDER BY Start_Date -- assign RawData_ID in date order; the joins below rely on consecutive IDs

-- update to flag the last sequential instance of a particular data set, and the last occurrence of a natural key
UPDATE a
SET 
    a.isLastInstance = 
     CASE 
      WHEN 
        a.Product <> b.Product OR 
        DATEADD(m, 1, a.Start_Date) <> b.Start_Date  OR
        b.RawData_ID IS NULL
      THEN 1 
      ELSE 0 
     END,
    a.isLastOccurrence = 
     CASE
      WHEN 
        b.RawData_ID IS NULL
      THEN 1 
      ELSE 0 
     END 
FROM 
    #RawData a
     LEFT JOIN
    #RawData b ON 
        b.RawData_ID = a.RawData_ID + 1 --AND 
        --b.<NaturalKey> = a.<NaturalKey>

--  flag first sequential instance of a particular data set
UPDATE b
SET 
    b.isFirstInstance = 
     CASE 
      WHEN 
        a.isLastInstance = 1
      THEN 1 
      ELSE 0 
     END 
FROM 
    #RawData a
     LEFT JOIN
    #RawData b ON 
        b.RawData_ID = a.RawData_ID + 1 --AND 
        --b.<NaturalKey> = a.<NaturalKey>


-- reduce the records to only those that are the first or last occurrence of a particular data set
CREATE TABLE #UniqueData 
  (
    [UniqueData_ID] [int] IDENTITY(1,1) NOT NULL,
    Start_Date DATETIME NOT NULL,
    End_Date DATETIME NOT NULL,
    Product VARCHAR(1) NULL,
    isFirstOccurrence BIT NULL,
    isLastOccurrence BIT NULL,
    isFirstInstance BIT NULL,
    isLastInstance BIT NULL
  ) 

INSERT INTO #UniqueData
  (
    Start_Date,
    End_Date,
    Product,
    isFirstOccurrence,
    isLastOccurrence,
    isFirstInstance,
    isLastInstance
  )

SELECT 
    Start_Date,
    End_Date,
    Product,
    isFirstOccurrence,
    isLastOccurrence,
    isFirstInstance,
    isLastInstance
FROM 
    #RawData 
WHERE 
    isFirstOccurrence = 1 OR
    isFirstInstance = 1 OR
    isLastInstance = 1
ORDER BY RawData_ID, Start_Date




-- combine the first and last occurrences in any given sequence into a single row
SELECT 
    a.Start_Date,
    ISNULL(b.End_Date, a.End_Date) End_Date, -- end of the last row in the run, or own end for a single-row run
    a.Product
FROM 
    #UniqueData a 
     LEFT JOIN
    #UniqueData b ON 
        b.UniqueData_ID = a.UniqueData_ID + 1 AND
        --b.<NaturalKey> = a.<NaturalKey> AND
        a.isLastInstance <> 1
WHERE a.isFirstInstance = 1 or a.isFirstOccurrence = 1
ORDER BY a.UniqueData_ID



-- clean up
/*
DROP TABLE AdjacentValidity
DROP TABLE #RawData
DROP TABLE #UniqueData
*/

Answer 3 (score: -1)

Try something like:

with dat as (
select 'A' as product, sysdate-3 as start_dte, sysdate-2 as end_dte from dual
union all
select 'A' as product, sysdate-2 as start_dte, sysdate-1 as end_dte from dual
union all
select 'B' as product, sysdate-5 as start_dte, sysdate-4 as end_dte from dual
)
SELECT product,
       MIN(start_dte) KEEP (DENSE_RANK FIRST ORDER BY start_dte) "Start",
       MAX(end_dte) KEEP (DENSE_RANK LAST ORDER BY end_dte) "End"
  FROM dat
  GROUP BY product
  ORDER BY product;

Output

PRODUCT Start                 End
------- --------------------- ---------------------
A       2/24/2014 10:25:53 AM 2/26/2014 10:25:53 AM
B       2/22/2014 10:25:53 AM 2/23/2014 10:25:53 AM