我有6亿行,如下表1所示。在Teradata SQL中,如何将“按日期”转换为“日期范围”?
+-----------+-------+------------+----------+
| ProductID | Store | Trans_Date | Cost_Amt |
+-----------+-------+------------+----------+
| 20202 | 2320 | 2018-01-02 | $9.23 |
| 20202 | 2320 | 2018-01-03 | $9.23 |
| 20202 | 2320 | 2018-01-04 | $9.23 |
| 20202 | 2320 | 2018-01-05 | $9.38 |
| 20202 | 2320 | 2018-01-06 | $9.38 |
| 20202 | 2320 | 2018-01-07 | $9.38 |
| 20202 | 2320 | 2018-01-08 | $9.23 |
| 20202 | 2320 | 2018-01-09 | $9.23 |
| 20202 | 2320 | 2018-01-10 | $9.23 |
+-----------+-------+------------+----------+
所需的输出:
+-----------+-------+------------+------------+----------+
| ProductID | Store | Start Date | End Date | Cost_Amt |
+-----------+-------+------------+------------+----------+
| 20202 | 2320 | 2018-01-02 | 2018-01-04 | $9.23 |
| 20202 | 2320 | 2018-01-05 | 2018-01-07 | $9.38 |
| 20202 | 2320 | 2018-01-08 | 2018-01-10 | $9.23 |
+-----------+-------+------------+------------+----------+
答案 0 :(得分:4)
如果每个ProductID,Store,Trans_Date
有一行(没有丢失的日期),则可以应用专有的Teradata语法来规范重叠的日期范围:
SELECT
ProductID
,Store
-- split the period back into seperate dates
,Begin(pd) AS StartDate
,Last(pd) AS EndDate -- must be LAST not END to match your logic
,Cost_Amt
FROM
(
SELECT NORMALIZE
ProductID
,Store
-- NORMALIZE only works with periods, so create it on the fly
,PERIOD(Trans_Date,Trans_Date+1) AS pd
,Cost_Amt
FROM bigtable
) AS dt
normalize
还有一个create table
选项。
答案 1 :(得分:2)
我将示例扩展为:
CREATE TABLE bigtable(
ProductID INTEGER
,Store INTEGER
,Trans_Date DATE
,Cost_Amt VARCHAR(10)
);
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-02','$9.23');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-03','$9.23');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-04','$9.23');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-05','$9.38');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-06','$9.38');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-07','$9.38');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-08','$9.23');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-09','$9.23');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-10','$9.23');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-11','$9.38');
,此查询用于显示派生表:
select
*
, row_number() over(partition by ProductID,Store order by Trans_Date) rn1
, row_number() over(partition by ProductID,Store,Cost_Amt order by Trans_Date) rn2
, row_number() over(partition by ProductID,Store order by Trans_Date)
- row_number() over(partition by ProductID,Store,Cost_Amt order by Trans_Date) grp
from bigtable
order by ProductID,Store,Trans_Date
;
计算稍后需要的“ grp”值:
| | ProductID | Store | Trans_Date | Cost_Amt | rn1 | rn2 | grp |
|----|-----------|-------|---------------------|----------|-----|-----|-----|
| 1 | 20202 | 2320 | 02.01.2018 00:00:00 | $9.23 | 1 | 1 | 0 |
| 2 | 20202 | 2320 | 03.01.2018 00:00:00 | $9.23 | 2 | 2 | 0 |
| 3 | 20202 | 2320 | 04.01.2018 00:00:00 | $9.23 | 3 | 3 | 0 |
| 4 | 20202 | 2320 | 05.01.2018 00:00:00 | $9.38 | 4 | 1 | 3 |
| 5 | 20202 | 2320 | 06.01.2018 00:00:00 | $9.38 | 5 | 2 | 3 |
| 6 | 20202 | 2320 | 07.01.2018 00:00:00 | $9.38 | 6 | 3 | 3 |
| 7 | 20202 | 2320 | 08.01.2018 00:00:00 | $9.23 | 7 | 4 | 3 |
| 8 | 20202 | 2320 | 09.01.2018 00:00:00 | $9.23 | 8 | 5 | 3 |
| 9 | 20202 | 2320 | 10.01.2018 00:00:00 | $9.23 | 9 | 6 | 3 |
| 10 | 20202 | 2320 | 11.01.2018 00:00:00 | $9.38 | 10 | 4 | 6 |
现在可以计算日期范围:
select
ProductID
, Store
, Cost_Amt
, grp
, min(Trans_Date) start_date
, max(Trans_Date) end_date
from (
select
*
, row_number() over(partition by ProductID,Store order by Trans_Date)
- row_number() over(partition by ProductID,Store,Cost_Amt order by Trans_Date) grp
from bigtable
) d
group by
ProductID
, Store
, Cost_Amt
, grp
;
其结果是:
| | ProductID | Store | Cost_Amt | grp | (No column name) | (No column name) |
|----|-----------|-------|----------|-----|---------------------|---------------------|
| 1 | 20202 | 2320 | $9.23 | 0 | 02.01.2018 00:00:00 | 04.01.2018 00:00:00 |
| 2 | 20202 | 2320 | $9.23 | 3 | 08.01.2018 00:00:00 | 10.01.2018 00:00:00 |
| 3 | 20202 | 2320 | $9.38 | 3 | 05.01.2018 00:00:00 | 07.01.2018 00:00:00 |
| 4 | 20202 | 2320 | $9.38 | 6 | 11.01.2018 00:00:00 | 11.01.2018 00:00:00 |
答案 2 :(得分:0)
这不是很优雅,但这似乎是正确的:
select A.ProductID , A.Store , A.Trans_Date as Start_Date ,
(select max(Trans_Date) from Table1 as C
where A.ProductID = C.ProductID
and A.Store = C.Store
and A.Cost_Amt = C.Cost_Amt
and C.Trans_Date < (
select coalesce(min(Trans_Date),'9999-01-01')
from Table1 as D
where A.ProductID = D.ProductID
and A.Store = D.Store
and A.Cost_Amt <> D.Cost_Amt
and A.Trans_Date < D.Trans_Date
)
) as End_Date , A.Cost_Amt
from Table1 as A
where A.Trans_Date = (select min(Trans_Date)
from Table1 as B
where A.ProductID = B.ProductID
and A.Store = B.Store
and A.Cost_Amt = B.Cost_Amt
and B.Trans_Date > (
select coalesce(max(Trans_Date),'0001-01-01')
from Table1 as E
where A.ProductID = E.ProductID
and A.Store = E.Store
and A.Cost_Amt <> E.Cost_Amt
and A.Trans_Date > E.Trans_Date
)
)