I currently have data in a table that looks like this:
date id value
1-Jan-13 1 100
2-Jan-13 1 100
3-Jan-13 1 100
4-Jan-13 1 200
5-Jan-13 1 200
6-Jan-13 1 100
7-Jan-13 1 100
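I am trying to group the records by id and value into versioned records, each with a start date and an end date, so that every unbroken run of the same value becomes one row.
Expected output: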
start_date end_date id value
1-Jan-13 3-Jan-13 1 100
4-Jan-13 5-Jan-13 1 200
6-Jan-13 7-Jan-13 1 100
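Answer 0 (score: 0)
I'm not a Teradata expert, but since it supports window functions (ROW_NUMBER in particular), you should most likely be able to do something like this: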
SELECT MIN(date) AS start_date, MAX(date) AS end_date, id, value
  FROM
(
  SELECT date, id, value,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) -
         ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY date) AS island
    FROM table1
) q
 GROUP BY id, value, island
 ORDER BY start_date, end_date
Sample output:
| START_DATE |   END_DATE | ID | VALUE |
|------------|------------|----|-------|
| 2013-01-01 | 2013-01-03 |  1 |   100 |
| 2013-01-04 | 2013-01-05 |  1 |   200 |
| 2013-01-06 | 2013-01-07 |  1 |   100 |
Here is a SQLFiddle demo (it is a SQL Server demo, but the query should work as expected in Teradata).
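To see why subtracting the two row numbers isolates each run, the following Python sketch mimics the subquery on the sample data. It is only an illustration, not Teradata code: the names `rows`, `rn_by_id`, `rn_by_id_value` and `islands` are made up for this example, and dates are written as day-of-month integers to keep it short.

```python
from collections import defaultdict

# Sample rows from the question as (day of January 2013, id, value).
rows = [(1, 1, 100), (2, 1, 100), (3, 1, 100), (4, 1, 200),
        (5, 1, 200), (6, 1, 100), (7, 1, 100)]

rn_by_id = defaultdict(int)        # ROW_NUMBER() OVER (PARTITION BY id ORDER BY date)
rn_by_id_value = defaultdict(int)  # ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY date)
islands = defaultdict(list)        # (id, value, island) -> days belonging to that run

for day, id_, value in sorted(rows):
    rn_by_id[id_] += 1
    rn_by_id_value[(id_, value)] += 1
    # The difference stays constant while the value repeats and changes
    # as soon as rows with another value have been counted in between.
    island = rn_by_id[id_] - rn_by_id_value[(id_, value)]
    islands[(id_, value, island)].append(day)

# Equivalent of GROUP BY id, value, island with MIN(date) / MAX(date).
result = sorted((min(days), max(days), id_, value)
                for (id_, value, _), days in islands.items())
for row in result:
    print(row)
```

On the sample data the island numbers come out as 0 for days 1 to 3, 3 for days 4 and 5, and 2 for days 6 and 7, so the two runs of value 100 end up in different groups.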
Answer 1 (score: 0)
The ROW_NUMBER version can be simplified further: modified SQL Fiddle
For Teradata:
SELECT
   id,val,MIN(dt),MAX(dt)
FROM
 (
   SELECT
      id,val,dt,
      dt - ROW_NUMBER() OVER (PARTITION BY id ORDER BY val, dt) AS dummy
   FROM table1
 ) AS dt
GROUP BY 1,2,dummy
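The simplified query works because, within one run of consecutive days, the date and the row number both advance by one, so their difference is constant. A minimal Python sketch of that idea on the sample data follows; `simplified_islands` and `rows` are hypothetical names for this illustration, and the code only imitates the SQL rather than running it.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question as (id, val, dt).
rows = ([(1, 100, date(2013, 1, d)) for d in (1, 2, 3, 6, 7)] +
        [(1, 200, date(2013, 1, d)) for d in (4, 5)])

def simplified_islands(rows):
    out = []
    # PARTITION BY id ORDER BY val, dt (tuples sort by id, then val, then dt)
    for id_, part in groupby(sorted(rows), key=lambda r: r[0]):
        keyed = []
        for rn, (_, val, dt) in enumerate(part, start=1):
            dummy = dt - timedelta(days=rn)  # dt - ROW_NUMBER()
            keyed.append((val, dummy, dt))
        # GROUP BY id, val, dummy; equal keys are adjacent in this ordering
        for (val, _), grp in groupby(keyed, key=lambda k: (k[0], k[1])):
            dts = [k[2] for k in grp]
            out.append((id_, val, min(dts), max(dts)))
    return out

for row in simplified_islands(rows):
    print(row)
```

Note the assumption baked into this variant: it relies on one row per calendar day, so a missing day inside a run of equal values would start a new group.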
TD13.10 includes some little-known functions for processing time-series data:
WITH cte(id,val,pd) AS
 (
   SELECT id, val, PERIOD(dt, dt+1) AS pd
   FROM table1
 )
SELECT
   id, val,
   BEGIN(pd) AS start_dt,
   LAST(pd) AS end_dt
FROM
   TABLE (TD_NORMALIZE_MEET
            (NEW VARIANT_TYPE(cte.id,cte.val)
            ,cte.pd)
          RETURNS (id INTEGER
                  ,val INTEGER
                  ,pd PERIOD(DATE)
                  ,Nrm_Count INTEGER)
          HASH BY id
          LOCAL ORDER BY id, val, pd
         ) A
ORDER BY start_dt, end_dt
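As a rough mental model of what the normalization does on this data (a sketch of the idea, not Teradata's implementation): each row becomes a half-open period [dt, dt+1), and periods for the same id and val are merged whenever one ends exactly where the next begins. The function `normalize_meet` and the variable `rows` below are hypothetical names used only for this illustration.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

# Sample rows from the question as (id, val, dt).
rows = ([(1, 100, date(2013, 1, d)) for d in (1, 2, 3, 6, 7)] +
        [(1, 200, date(2013, 1, d)) for d in (4, 5)])

def normalize_meet(rows):
    """Merge per-(id, val) periods [dt, dt + 1 day) that meet end-to-start."""
    merged = []  # entries are [id, val, begin, end] with an exclusive end
    for id_, val, dt in sorted(rows):
        last = merged[-1] if merged else None
        if last and last[0] == id_ and last[1] == val and last[3] == dt:
            last[3] = dt + ONE_DAY  # the periods meet: extend the current one
        else:
            merged.append([id_, val, dt, dt + ONE_DAY])
    # BEGIN(pd) is the start; LAST(pd) is the final day inside the period,
    # which is the exclusive end minus one day.
    return sorted((b, e - ONE_DAY, i, v) for i, v, b, e in merged)

for row in normalize_meet(rows):
    print(row)
```

Building the period as `PERIOD(dt, dt+1)` is what makes adjacent days "meet"; with the exclusive end, 3-Jan-13 ends on 4-Jan-13, which is not where the next period for value 100 (6-Jan-13) begins, so the two runs stay separate.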