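I have a table like this:
+----------+------------+-----------+-----------------------+
| KeyField | DeltaField | SomeField | Row_Ins_Ts            |
+----------+------------+-----------+-----------------------+
| 1        | a          | 1         | '2016-01-01 00:00:00' |
| 1        | a          | 2         | '2016-01-02 00:00:00' |
| 1        | b          | 3         | '2016-01-03 00:00:00' |
| 1        | a          | 4         | '2016-01-04 00:00:00' |
| 2        | d          | 5         | '2016-01-01 00:00:00' |
| 2        | d          | 6         | '2016-01-02 00:00:00' |
| 2        | e          | 7         | '2016-01-03 00:00:00' |
| 2        | e          | 8         | '2016-01-04 00:00:00' |
+----------+------------+-----------+-----------------------+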
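For a given KeyField, I need the interval during which each value of DeltaField was current.
For the data set above, the result set would be:
+----------+------------+-----------------------+-----------------------+
| KeyField | DeltaField | Rec_Strt_Ts           | Rec_End_Ts            |
+----------+------------+-----------------------+-----------------------+
| 1        | a          | '2016-01-01 00:00:00' | '2016-01-02 23:59:59' |
| 1        | b          | '2016-01-03 00:00:00' | '2016-01-03 23:59:59' |
| 1        | a          | '2016-01-04 00:00:00' | '9999-12-31 23:59:59' |
| 2        | d          | '2016-01-01 00:00:00' | '2016-01-02 23:59:59' |
| 2        | e          | '2016-01-03 00:00:00' | '9999-12-31 23:59:59' |
+----------+------------+-----------------------+-----------------------+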
Answer 0 (score: 1)
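I'm trying this with Vertica rather than Teradata, but fortunately both support the ANSI 99 standard: window functions (also called OLAP or analytic functions) and the WITH clause with multiple interdependent common table expressions.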
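The first common table expression just generates the sample data. The second is part of the actual query: you generally can't put OLAP functions in a WHERE clause, so you have to compute them in a subselect first. The LAG() OLAP function is what lets you filter out rows that have the same deltafield as their predecessor. So in the final query, I can filter on deltafield <> prev_deltafield and use the LEAD() OLAP function, subtracting one second from its value to get the end timestamp. I use the IFNULL() function to handle the cases where there is no LEAD() or LAG() value. Synonyms for IFNULL() are NVL() or VALUE(). COALESCE() would also work, but it is slower because it takes a variable number of arguments. See here: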
WITH foo(keyfield,deltafield,somefield,row_ins_ts) AS (
SELECT 1,'a',1, TIMESTAMP '2016-01-01 00:00:00'
UNION ALL SELECT 1,'a',2, TIMESTAMP '2016-01-02 00:00:00'
UNION ALL SELECT 1,'b',3, TIMESTAMP '2016-01-03 00:00:00'
UNION ALL SELECT 1,'a',4, TIMESTAMP '2016-01-04 00:00:00'
UNION ALL SELECT 2,'d',5, TIMESTAMP '2016-01-01 00:00:00'
UNION ALL SELECT 2,'d',6, TIMESTAMP '2016-01-02 00:00:00'
UNION ALL SELECT 2,'e',7, TIMESTAMP '2016-01-03 00:00:00'
UNION ALL SELECT 2,'e',8, TIMESTAMP '2016-01-04 00:00:00'
)
-- attach each row's previous deltafield per keyfield; LAG() can't be used in WHERE directly
, add_previous_deltafield AS (
SELECT
keyfield
, deltafield
, LAG(deltafield) OVER(PARTITION BY keyfield ORDER BY row_ins_ts) AS prev_deltafield
, row_ins_ts
FROM foo
)
SELECT
keyfield
, deltafield
, row_ins_ts AS rec_start_ts
, IFNULL(
-- LEAD() is evaluated after the WHERE filter, so it sees the start of the next change
LEAD(row_ins_ts) OVER(
PARTITION BY keyfield ORDER BY row_ins_ts
) - INTERVAL '1' SECOND
, TIMESTAMP '9999-12-31 23:59:59'
) AS rec_end_ts
FROM add_previous_deltafield
WHERE deltafield <> IFNULL(prev_deltafield,'')
ORDER BY keyfield, rec_start_ts;
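To check the LAG()/LEAD() logic outside a database, here is a minimal Python sketch of the same two steps, assuming the rows fit in memory as (keyfield, deltafield, row_ins_ts) tuples. The function name `current_intervals` and the `OPEN_END` constant are hypothetical; the sentinel simply mirrors the '9999-12-31 23:59:59' default used in the query.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Hypothetical sentinel end for the interval that is still current (mirrors the SQL default).
OPEN_END = datetime(9999, 12, 31, 23, 59, 59)

# Sample data from the question: (keyfield, deltafield, row_ins_ts).
SAMPLE_ROWS = [
    (1, "a", datetime(2016, 1, 1)),
    (1, "a", datetime(2016, 1, 2)),
    (1, "b", datetime(2016, 1, 3)),
    (1, "a", datetime(2016, 1, 4)),
    (2, "d", datetime(2016, 1, 1)),
    (2, "d", datetime(2016, 1, 2)),
    (2, "e", datetime(2016, 1, 3)),
    (2, "e", datetime(2016, 1, 4)),
]


def current_intervals(rows):
    """Return (keyfield, deltafield, rec_start_ts, rec_end_ts) tuples."""
    result = []
    # Equivalent of PARTITION BY keyfield ORDER BY row_ins_ts.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for key, partition in groupby(ordered, key=lambda r: r[0]):
        # LAG() step: keep a row only if its deltafield differs from the
        # previous row's (the first row of a partition is always kept).
        changes = []
        prev = None
        for _, delta, ts in partition:
            if delta != prev:
                changes.append((delta, ts))
            prev = delta
        # LEAD() step: an interval ends one second before the next change starts.
        for i, (delta, start) in enumerate(changes):
            if i + 1 < len(changes):
                end = changes[i + 1][1] - timedelta(seconds=1)
            else:
                end = OPEN_END
            result.append((key, delta, start, end))
    return result


if __name__ == "__main__":
    for row in current_intervals(SAMPLE_ROWS):
        print(row)
```

As in the SQL, the LEAD() step runs on the already-filtered change rows, which is why each end timestamp is one second before the next *change* rather than the next raw row.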
Happy playing - Marco the Sane
Answer 1 (score: 1)
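As an alternative to window functions (which are nice because they are portable), you can use Teradata's built-in Period logic together with a few built-in functions to solve this quickly.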
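Basically, this breaks down into three parts. The first is to get each row into the form
Period(<begindate>, <enddate>)
using either dates or timestamps. In your example: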
WITH subtbl(keyfield, deltafield, durations) AS
(
SELECT
keyfield,
deltafield,
PERIOD(row_ins_ts, row_ins_ts + INTERVAL '1' DAY ) AS durations
FROM
<yourtable>
)
SELECT keyfield, deltafield, BEGIN(durations)
FROM TABLE
(
TD_NORMALIZE_OVERLAP_MEET (NEW VARIANT_TYPE(subtbl.keyfield, subtbl.deltafield), subtbl.durations)
RETURNS (keyfield INTEGER, deltafield CHAR(1), durations PERIOD(TIMESTAMP(0)), numRecords INTEGER)
HASH BY keyfield, deltafield
LOCAL ORDER BY keyfield, deltafield, durations
) AS dt(keyfield, deltafield, durations, numRecords)
ORDER BY 1, 2;
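For readers without a Teradata system, the following Python sketch approximates what the normalize step is used for here: each row becomes a half-open one-day period, and periods for the same (keyfield, deltafield) that overlap or meet are merged. This is an illustration under that assumption, not Teradata's implementation; the name `normalize_overlap_meet` and its `length` parameter are hypothetical, and the count in the last position is assumed to play the role of numRecords.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample data from the question: (keyfield, deltafield, row_ins_ts).
SAMPLE_ROWS = [
    (1, "a", datetime(2016, 1, 1)),
    (1, "a", datetime(2016, 1, 2)),
    (1, "b", datetime(2016, 1, 3)),
    (1, "a", datetime(2016, 1, 4)),
    (2, "d", datetime(2016, 1, 1)),
    (2, "d", datetime(2016, 1, 2)),
    (2, "e", datetime(2016, 1, 3)),
    (2, "e", datetime(2016, 1, 4)),
]


def normalize_overlap_meet(rows, length=timedelta(days=1)):
    """Hypothetical helper: merge per-row periods that overlap or meet.

    Each row becomes the half-open period [row_ins_ts, row_ins_ts + length),
    like PERIOD(row_ins_ts, row_ins_ts + INTERVAL '1' DAY) in the query.
    Returns (keyfield, deltafield, begin, end, num_records) tuples.
    """
    result = []
    # Group by (keyfield, deltafield) and order each group by begin time.
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for (key, delta), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        begin = end = None
        count = 0
        for _, _, ts in group:
            p_begin, p_end = ts, ts + length
            if begin is not None and p_begin <= end:
                # Overlaps or meets the current period: extend it.
                end = max(end, p_end)
                count += 1
            else:
                # First row or a gap: flush the previous period, start a new one.
                if begin is not None:
                    result.append((key, delta, begin, end, count))
                begin, end, count = p_begin, p_end, 1
        result.append((key, delta, begin, end, count))
    return result


if __name__ == "__main__":
    for key, delta, begin, end, n in normalize_overlap_meet(SAMPLE_ROWS):
        print(key, delta, begin, end, n)
```

Because each row covers only one day in this sketch, rows with the same value more than a day apart stay separate, and the last interval ends one day after its final row rather than at an open-ended date.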
Which outputs:
+----------+------------+------------------+
| keyfield | deltafield | BEGIN(durations) |
+----------+------------+------------------+
| 1 | a | 1/4/2016 0:00 |
| 1 | a | 1/1/2016 0:00 |
| 1 | b | 1/3/2016 0:00 |
| 2 | d | 1/1/2016 0:00 |
| 2 | e | 1/3/2016 0:00 |
+----------+------------+------------------+