I have a table with the following columns and data. The data describes customer activity periods:
cust_id s_date e_date
11111 01.03.2014 31.03.2014
11111 10.04.2014 30.04.2014
11111 01.05.2014 10.05.2014
11111 15.06.2014 31.07.2014
22222 01.04.2014 31.05.2014
22222 01.06.2014 30.06.2014
22222 01.07.2014 15.07.2014
I want to write a query that gives this result:
cust_id s_date e_date
11111 01.03.2014 10.05.2014
11111 15.06.2014 31.07.2014
22222 01.04.2014 15.07.2014
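The goal is to "merge" rows into a single row when the gap between a customer's activity periods is less than 15 days. I can handle the "1 row preceding" case, but it does not work when 3 or more rows need to be merged, and I have no idea how to write this query.
My "half-finished" query using 1 row preceding: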
SELECT cust_id
, start_date as current_period_start_date
, end_date as current_period_end_date
, end_date+15 as current_period_expire_date
, coalesce(
min(current_period_expire_date)
over(partition by cust_id
order by start_date
rows between 1 preceding and 1 preceding)
, cast('1900-01-01' as date)) as previous_period_expire_date
, case
when current_period_start_date <= previous_period_expire_date
then min(current_period_start_date)
over(partition by cust_id
order by start_date
rows between 1 preceding and current row)
else current_period_start_date
end as new_current_period_start_date
FROM MY_DB.my_table
. . .
Also, is it possible to make the number of preceding rows dynamic, for example:
... over(partition by ... order by ... rows between X preceding and current row)
Answer 0 (score: 2)
Gordon's answer can be modified, because the basic LAG syntax is easy to rewrite:
LAG(col, n) OVER (ORDER BY c)
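is the same as
MIN(col) OVER (ORDER BY c ROWS BETWEEN n PRECEDING AND n PRECEDING)
The optional default value (LAG's third argument) can be emulated with COALESCE(LAG ..., default value); only the IGNORE NULLS option is really hard to rewrite.
This results in: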
SELECT cust_id, MIN(s_date) AS s_date, MAX(e_date) AS e_date
FROM (SELECT t.*, SUM(GroupStartFlag) OVER (PARTITION BY cust_id ORDER BY s_date ROWS UNBOUNDED PRECEDING) AS grpid
FROM (SELECT cust_id, s_date, e_date,
(CASE WHEN s_date <= MIN(e_date)
OVER (PARTITION BY cust_id
ORDER BY s_date
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) + 15
THEN 0
ELSE 1
END) AS GroupStartFlag
FROM vt
) t
) t
GROUP BY cust_id, grpid;
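The LAG rewrite used in this query can be checked outside the database. The following is only a minimal Python sketch of the two window definitions, not Teradata code; the helper names `lag` and `min_over_n_preceding` are made up for illustration, and the sample values are the e_date column for cust_id 11111 from the question.

```python
from datetime import date

def lag(values, n=1):
    """Model of LAG(col, n): the value n rows earlier, or None (NULL) if there is none."""
    return [values[i - n] if i >= n else None for i in range(len(values))]

def min_over_n_preceding(values, n=1):
    """Model of MIN(col) OVER (... ROWS BETWEEN n PRECEDING AND n PRECEDING)."""
    result = []
    for i in range(len(values)):
        # The frame holds at most one row: the one exactly n rows back.
        frame = values[i - n:i - n + 1] if i >= n else []
        result.append(min(frame) if frame else None)
    return result

# e_date values for cust_id 11111, already ordered by s_date
e_dates = [date(2014, 3, 31), date(2014, 4, 30), date(2014, 5, 10), date(2014, 7, 31)]

same_for_n1 = lag(e_dates, 1) == min_over_n_preceding(e_dates, 1)
same_for_n2 = lag(e_dates, 2) == min_over_n_preceding(e_dates, 2)
print(same_for_n1, same_for_n2)
```

Because the frame contains a single row, MIN (or MAX) simply returns that row's value, which is why the aggregate behaves like LAG here.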
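If you don't need any additional columns (only cust_id and the dates), you can also use a table function available since TD 13.10 to normalize periods. To account for the 15-day gap, simply subtract 15 days before normalizing and add them back afterwards: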
WITH cte (cust_id, pd)
AS
( SELECT cust_id, PERIOD(s_date-15, e_date) AS pd
FROM vt
)
SELECT cust_id,
BEGIN(pd)+15,
END(pd),
cnt
FROM TABLE (TD_NORMALIZE_OVERLAP_MEET
(NEW VARIANT_TYPE(cte.cust_id)
,cte.pd)
RETURNS (cust_id INTEGER
,pd PERIOD(DATE)
,cnt INTEGER) --optional: number of rows normalized in one result row
HASH BY cust_id
LOCAL ORDER BY cust_id, pd
) AS t;
In TD 14.10 there is also a very nice NORMALIZE syntax:
SELECT cust_id, BEGIN (pd)+15, END(pd)
FROM
(
SELECT NORMALIZE
cust_id, PERIOD(s_date-15, e_date) AS pd
FROM vt
) AS dt
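By the way, a period is defined as including its begin but excluding its end (i.e. when there is no gap, the end of the previous period and the begin of the next one have the same value), so you might have to change the 15 to 16 to get the desired result.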
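The effect of that boundary can be seen in a small model. This is only a Python sketch under the assumption that normalization merges half-open periods which overlap or meet; it is not how Teradata implements it, and `normalize` and `merge_with_offset` are hypothetical helper names.

```python
from datetime import date, timedelta

def normalize(periods):
    """Merge half-open [begin, end) periods that overlap or meet (assumed semantics)."""
    merged = []
    for begin, end in sorted(periods):
        if merged and begin <= merged[-1][1]:
            # begin == previous end means the periods meet, so they are merged too
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([begin, end])
    return [tuple(p) for p in merged]

def merge_with_offset(rows, offset_days):
    """Shift each start back, normalize, then shift the resulting starts forward again."""
    offset = timedelta(days=offset_days)
    shifted = [(s - offset, e) for s, e in rows]
    return [(begin + offset, end) for begin, end in normalize(shifted)]

# Hypothetical rows: the second period starts exactly 16 days after the first one ends.
rows = [(date(2014, 3, 1), date(2014, 3, 31)), (date(2014, 4, 16), date(2014, 4, 30))]

with_15 = merge_with_offset(rows, 15)  # periods neither overlap nor meet
with_16 = merge_with_offset(rows, 16)  # shifted start equals previous end: they meet
print(len(with_15), len(with_16))
```

In this model an offset of 15 merges rows whose s_date is at most 15 days after the previous e_date, and an offset of 16 extends that by one day, so which value is right depends on how the 15-day rule is meant to treat the boundary.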
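Answer 1 (score: 0)
I would approach this with the lag() function. It can be used to identify each row that starts a new period. When that flag is then cumulatively summed, it provides a group identifier. Here is what the code looks like: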
select cust_id, min(s_date) as s_date, max(e_date) as e_date
from (select t.*, sum(GroupStartFlag) over (partition by cust_id order by s_date rows unbounded preceding) as grpid
from (select cust_id, s_date, e_date,
(case when s_date <= lag(e_date) over (partition by cust_id order by s_date) + 15
then 0
else 1
end) as GroupStartFlag
from MY_DB.my_table
) t
) t
group by cust_id, grpid;
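Note: Teradata supports window functions, but it sometimes has strange requirements for them. I think the query above works as written, but I don't have a system to test it on.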
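The flag-and-running-sum idea can be traced by hand on the sample data. The following is a minimal Python sketch of the same logic, not a replacement for the SQL; the function name `merge_periods` and the `max_gap_days` parameter are assumptions made for illustration.

```python
from datetime import date, timedelta

ROWS = [
    (11111, date(2014, 3, 1), date(2014, 3, 31)),
    (11111, date(2014, 4, 10), date(2014, 4, 30)),
    (11111, date(2014, 5, 1), date(2014, 5, 10)),
    (11111, date(2014, 6, 15), date(2014, 7, 31)),
    (22222, date(2014, 4, 1), date(2014, 5, 31)),
    (22222, date(2014, 6, 1), date(2014, 6, 30)),
    (22222, date(2014, 7, 1), date(2014, 7, 15)),
]

def merge_periods(rows, max_gap_days=15):
    gap = timedelta(days=max_gap_days)
    groups = {}  # (cust_id, grpid) -> (min s_date, max e_date)
    prev_cust, prev_end, grpid = None, None, 0
    for cust, s_date, e_date in sorted(rows):  # partition by cust_id, order by s_date
        if cust != prev_cust:
            grpid, prev_end = 0, None  # the running sum restarts in each partition
        # GroupStartFlag: 0 if this row continues the previous one, else 1
        flag = 0 if prev_end is not None and s_date <= prev_end + gap else 1
        grpid += flag  # cumulative sum of the flag
        key = (cust, grpid)
        if key in groups:
            lo, hi = groups[key]
            groups[key] = (min(lo, s_date), max(hi, e_date))
        else:
            groups[key] = (s_date, e_date)
        prev_cust, prev_end = cust, e_date  # plays the role of lag(e_date)
    return [(cust, s, e) for (cust, _), (s, e) in sorted(groups.items())]

result = merge_periods(ROWS)
for row in result:
    print(row)
```

On the sample data this yields three rows, matching the result requested in the question. Because each row is compared only with its immediate predecessor and the flags are summed, chains of any length merge without a fixed number of preceding rows.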
EDIT:
I'm not sure whether Teradata supports the lag() function. You can do the equivalent with a correlated subquery:
select cust_id, min(s_date) as s_date, max(e_date) as e_date
from (select t.*,
sum(case when s_date <= prev_edate + 15 then 0 else 1 end) over
(partition by cust_id order by s_date rows unbounded preceding) as grpid
from (select cust_id, s_date, e_date,
(select max(e_date)
from MY_DB.my_table t2
where t2.cust_id = t.cust_id and
t2.s_date < t.s_date
) as prev_edate
from MY_DB.my_table t
) t
) t
group by cust_id, grpid;