Grouping a set of records by date in Teradata

Posted: 2014-01-08 03:43:59

Tags: sql date grouping teradata

Currently I have data in a table like this:

date        id   value
1-Jan-13    1    100
2-Jan-13    1    100
3-Jan-13    1    100
4-Jan-13    1    200
5-Jan-13    1    200
6-Jan-13    1    100
7-Jan-13    1    100

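I'm trying to group the records by id and value and version them with a start date and an end date, so that each run of consecutive days with the same value becomes one row. Note that value 100 appears in two separate runs, which must stay separate.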

Desired output:

start date  end date    id   value
1-Jan-13    3-Jan-13    1    100
4-Jan-13    5-Jan-13    1    200
6-Jan-13    7-Jan-13    1    100
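To try the answers below, the sample rows can be loaded into a test table. This is a minimal sketch: the table name `table1` and the column names `dt` and `val` are assumptions taken from the second answer (the first answer calls the columns `date` and `value`), and one `INSERT` per row is used to keep the script portable across Teradata releases.

```sql
-- Hypothetical test table; names follow the second answer (table1, dt, id, val)
CREATE TABLE table1 (
  dt  DATE,
  id  INTEGER,
  val INTEGER
);

-- The question's sample data, one row per day for id 1
INSERT INTO table1 VALUES (DATE '2013-01-01', 1, 100);
INSERT INTO table1 VALUES (DATE '2013-01-02', 1, 100);
INSERT INTO table1 VALUES (DATE '2013-01-03', 1, 100);
INSERT INTO table1 VALUES (DATE '2013-01-04', 1, 200);
INSERT INTO table1 VALUES (DATE '2013-01-05', 1, 200);
INSERT INTO table1 VALUES (DATE '2013-01-06', 1, 100);
INSERT INTO table1 VALUES (DATE '2013-01-07', 1, 100);
```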

2 answers:

Answer 0 (score: 0)

I'm not a Teradata expert, but since it supports window functions (ROW_NUMBER in particular), you can most likely do something like this:

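SELECT MIN("date") start_date, MAX("date") end_date, id, "value"
  FROM
(
  -- DATE is a reserved word in Teradata, so the column names are quoted
  SELECT "date", id, "value",
         -- constant within each run of consecutive rows that share the same value
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY "date") -
         ROW_NUMBER() OVER (PARTITION BY id, "value" ORDER BY "date") island
    FROM table1
) q
 GROUP BY id, "value", island
 ORDER BY start_date, end_date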

Sample output:

| START_DATE |   END_DATE | ID | VALUE |
|------------|------------|----|-------|
| 2013-01-01 | 2013-01-03 |  1 |   100 |
| 2013-01-04 | 2013-01-05 |  1 |   200 |
| 2013-01-06 | 2013-01-07 |  1 |   100 |

Here is the SQLFiddle demo (it is a SQL Server demo, but it should work as expected in Teradata).
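Why the difference of the two ROW_NUMBER values identifies each run: the first numbering counts every row of an id in date order, the second counts only the rows with the same value. Inside an unbroken run both grow by 1 per row, so their difference stays constant; whenever rows with another value intervene, the first numbering has advanced further and the difference jumps. Running only the inner query makes this visible. This is a sketch assuming `table1` holds the question's sample data under the question's column names; the `rn_all` and `rn_val` columns are added only for illustration.

```sql
-- Inner query of the answer, with both row numbers exposed for inspection.
-- "date" and "value" are quoted because DATE is a reserved word in Teradata.
SELECT "date", id, "value",
       ROW_NUMBER() OVER (PARTITION BY id ORDER BY "date")          AS rn_all,
       ROW_NUMBER() OVER (PARTITION BY id, "value" ORDER BY "date") AS rn_val,
       ROW_NUMBER() OVER (PARTITION BY id ORDER BY "date") -
       ROW_NUMBER() OVER (PARTITION BY id, "value" ORDER BY "date") AS island
FROM table1
ORDER BY id, "date";

-- Expected for the sample data:
-- date        id  value  rn_all  rn_val  island
-- 2013-01-01   1    100       1       1       0
-- 2013-01-02   1    100       2       2       0
-- 2013-01-03   1    100       3       3       0
-- 2013-01-04   1    200       4       1       3
-- 2013-01-05   1    200       5       2       3
-- 2013-01-06   1    100       6       4       2
-- 2013-01-07   1    100       7       5       2
```

Different values can end up with the same island number, which is why value has to stay in the outer GROUP BY.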

Answer 1 (score: 0)

The ROW_NUMBER version can be simplified further (see the modified SQL Fiddle).

For Teradata:

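SELECT
   id, val, MIN(dt) AS start_dt, MAX(dt) AS end_dt
FROM
 (
   SELECT
      id, val, dt,
      -- DATE minus INTEGER yields a DATE; it stays constant while
      -- consecutive days keep the same val
      dt - ROW_NUMBER() OVER (PARTITION BY id ORDER BY val, dt) AS dummy
   FROM table1
 ) AS d
GROUP BY 1, 2, dummy
ORDER BY start_dt, end_dt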
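The simplified version replaces the second ROW_NUMBER with the date itself: subtracting the row number from a DATE gives a DATE that stays constant while consecutive days keep the same val. Below, the inner query's intermediate columns are shown for the sample data; the `rn` column is added only for illustration, and `table1` with columns `id`, `val`, `dt` is assumed as in the answer.

```sql
-- Inner query of the simplified version, with the row number shown separately
SELECT id, val, dt,
       ROW_NUMBER() OVER (PARTITION BY id ORDER BY val, dt)      AS rn,
       dt - ROW_NUMBER() OVER (PARTITION BY id ORDER BY val, dt) AS dummy
FROM table1
ORDER BY id, dt;

-- Expected for the sample data (dummy is a DATE):
-- id  val  dt          rn  dummy
--  1  100  2013-01-01   1  2012-12-31
--  1  100  2013-01-02   2  2012-12-31
--  1  100  2013-01-03   3  2012-12-31
--  1  200  2013-01-04   6  2012-12-29
--  1  200  2013-01-05   7  2012-12-29
--  1  100  2013-01-06   4  2013-01-02
--  1  100  2013-01-07   5  2013-01-02
```

This variant assumes exactly one row per id per calendar day with no missing days. A missing day starts a new group even if val does not change, and duplicate rows on the same day also split a group; the two-ROW_NUMBER version in the previous answer depends only on row order, not on date gaps.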

TD13.10 also has some little-known functions for handling time-series (period) data:

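WITH cte(id,val,pd) AS 
 (
   -- one-day period per row: PERIOD(dt, dt+1) is closed-open, so it covers only dt
   SELECT id, val, PERIOD(dt, dt+1) AS pd
   FROM table1
 )
SELECT
   id, val,
   BEGIN(pd) AS start_dt,
   LAST(pd) AS end_dt   -- last date inside the period, i.e. END(pd) minus one day
FROM
   -- periods that meet (one ends where the next begins) and share
   -- the same (id, val) are merged into one period
   TABLE (TD_NORMALIZE_MEET
           (NEW VARIANT_TYPE(cte.id,cte.val)
           ,cte.pd)
   RETURNS (id INTEGER
           ,val INTEGER
           ,pd PERIOD(DATE)
           ,Nrm_Count INTEGER)
   HASH BY id
   LOCAL ORDER BY id, val, pd
   ) A 
ORDER BY start_dt, end_dt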
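The one-day periods in this query rely on Teradata's closed-open PERIOD semantics: `PERIOD(dt, dt+1)` contains only `dt`, `BEGIN` returns the start, `END` returns the exclusive end bound, and `LAST` returns the last value actually inside the period. A minimal check for a single date, assuming Teradata 13.10 or later with PERIOD support; the literal date is just an example:

```sql
-- One-day period for 2013-01-01, built the same way as PERIOD(dt, dt+1)
SELECT BEGIN(PERIOD(DATE '2013-01-01', DATE '2013-01-02')) AS begin_dt,  -- 2013-01-01
       END(PERIOD(DATE '2013-01-01', DATE '2013-01-02'))   AS end_dt,    -- 2013-01-02 (exclusive)
       LAST(PERIOD(DATE '2013-01-01', DATE '2013-01-02'))  AS last_dt;   -- 2013-01-01
```

This is why the query reports `LAST(pd)` rather than `END(pd)` as the end date: after normalization, `END` would point one day past the last day of each run.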