Teradata PARTITION BY query ... following rows dynamically

Date: 2014-08-06 18:38:56

Tags: sql teradata

I have a table with the following columns and data. The data describes periods of customer activity:

cust_id    s_date       e_date
11111    01.03.2014   31.03.2014
11111    10.04.2014   30.04.2014
11111    01.05.2014   10.05.2014
11111    15.06.2014   31.07.2014
22222    01.04.2014   31.05.2014
22222    01.06.2014   30.06.2014
22222    01.07.2014   15.07.2014

I want to write a query that gives this result:

cust_id    s_date       e_date
11111    01.03.2014   10.05.2014
11111    15.06.2014   31.07.2014
22222    01.04.2014   15.07.2014

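The goal is to "merge" rows into a single row when the gap between a customer's activity periods is less than 15 days. I can handle the "1 preceding" row case, but it stops working when 3 or more rows need to be merged. I have no idea how to write this query.

My "half-working" query, which looks at 1 preceding row: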

SELECT cust_id
     , start_date     as current_period_start_date
     , end_date       as current_period_end_date
     , end_date+15    as current_period_expire_date
     , coalesce(
            min(current_period_expire_date)
           over(partition by cust_id
                    order by start_date
                     rows between 1 preceding and 1 preceding)
               , cast('1900-01-01' as date)) as previous_period_expire_date
     , case 
         when current_period_start_date <= previous_period_expire_date
         then min(current_period_start_date)
             over(partition by cust_id
                      order by start_date
                       rows between 1 preceding and current row)
         else current_period_start_date
       end as new_current_period_start_date

  FROM MY_DB.my_table
     . . .

Also, is it possible to make the number of preceding rows (X below) dynamic?

... over(partition by ... order by ... rows between X preceding and current row)

2 answers:

Answer 0: (score: 2)

Gordon's answer can be adapted, because the basic LAG syntax is easy to rewrite:

LAG(col, n) OVER (ORDER BY c) 

is the same as
MIN(col) OVER (ORDER BY c ROWS BETWEEN n PRECEDING AND n PRECEDING)
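
To see why the two forms agree, here is a minimal Python sketch (not Teradata code) that emulates both on a plain list. The helper names `lag` and `min_over_n_preceding` and the sample values are made up for illustration, and `None` stands in for NULL.

```python
def lag(values, n):
    """Emulate LAG(col, n): the value n rows earlier, or None if there is none."""
    return [values[i - n] if i >= n else None for i in range(len(values))]


def min_over_n_preceding(values, n):
    """Emulate MIN(col) OVER (... ROWS BETWEEN n PRECEDING AND n PRECEDING)."""
    result = []
    for i in range(len(values)):
        # The frame holds at most one row: the one exactly n rows back.
        frame = values[i - n:i - n + 1] if i >= n else []
        # MIN over an empty frame is NULL, just like LAG past the first row.
        result.append(min(frame) if frame else None)
    return result


ordered_values = [31, 30, 10, 31]  # hypothetical column values, already sorted by c

print(lag(ordered_values, 1))
print(min_over_n_preceding(ordered_values, 1))
```

Because the frame never contains more than one row, MIN (or MAX) simply returns that row's value, which is what makes the rewrite safe.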

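A default value, normally passed as the third parameter, can be emulated with COALESCE(LAG ..., default value). Only the IGNORE NULLS option is really hard to emulate.

This results in: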

SELECT cust_id, MIN(s_date) AS s_date, MAX(e_date) AS e_date
FROM (SELECT t.*, SUM(GroupStartFlag) OVER (PARTITION BY cust_id ORDER BY s_date ROWS UNBOUNDED PRECEDING) AS grpid
      FROM (SELECT cust_id, s_date, e_date,
                   (CASE WHEN s_date <= MIN(e_date) 
                                        OVER (PARTITION BY cust_id 
                                              ORDER BY s_date
                                              ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) + 15
                         THEN 0
                         ELSE 1
                    END) AS GroupStartFlag
            FROM  vt
           ) t
     ) t
GROUP BY cust_id, grpid;

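If you don't need any additional columns (only cust_id and the dates), you can also use a table function, available since TD 13.10, to normalize periods. To account for the 15-day gap, simply subtract 15 days before normalizing and add them back afterwards: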

WITH cte (cust_id, pd)
AS 
 ( SELECT cust_id, PERIOD(s_date-15, e_date) AS pd
   FROM vt
 )
SELECT cust_id,
   BEGIN(pd)+15,
   END(pd),
   cnt
FROM TABLE (TD_NORMALIZE_OVERLAP_MEET
            (NEW VARIANT_TYPE(cte.cust_id)
                ,cte.pd)
        RETURNS (cust_id INTEGER
                ,pd PERIOD(DATE)
                ,cnt INTEGER) --optional: number of rows normalized in one result row
        HASH BY cust_id
        LOCAL ORDER BY cust_id, pd
        ) AS t;

In TD 14.10 there is also a very nice NORMALIZE syntax:

SELECT cust_id, BEGIN (pd)+15, END(pd) 
FROM
 (
   SELECT NORMALIZE
      cust_id, PERIOD(s_date-15, e_date) AS pd
   FROM vt
 ) AS dt

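By the way, a period is defined as including its start but excluding its end (i.e. for gapless periods, the end of the previous period and the start of the next one have the same value), so you might have to change 15 to 16 to get the desired result.

Answer 1: (score: 0)

I would approach this using the lag() function. It can be used to identify each row that starts a new period. When that flag is then cumulatively summed, it provides a group identifier. Here is what the code looks like: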
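
The effect of the shift and of the exclusive end can be checked with a small Python sketch. It is not Teradata code: it assumes that normalization merges half-open [begin, end) periods that overlap or meet, as the function name TD_NORMALIZE_OVERLAP_MEET suggests, and the helper names `normalize_overlap_meet` and `merge_with_shift` are hypothetical.

```python
from datetime import date, timedelta


def normalize_overlap_meet(periods):
    """Merge half-open [begin, end) periods that overlap or meet (assumed semantics)."""
    merged = []
    for begin, end in sorted(periods):
        if merged and begin <= merged[-1][1]:  # overlap (<) or meet (==)
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([begin, end])
    return [tuple(p) for p in merged]


def merge_with_shift(rows, shift_days):
    """Mimic PERIOD(s_date - shift, e_date), normalize, then BEGIN(pd) + shift."""
    shift = timedelta(days=shift_days)
    shifted = [(s_date - shift, e_date) for s_date, e_date in rows]
    return [(begin + shift, end) for begin, end in normalize_overlap_meet(shifted)]


# Periods of customer 11111 from the question.
cust_11111 = [
    (date(2014, 3, 1), date(2014, 3, 31)),
    (date(2014, 4, 10), date(2014, 4, 30)),
    (date(2014, 5, 1), date(2014, 5, 10)),
    (date(2014, 6, 15), date(2014, 7, 31)),
]
print(merge_with_shift(cust_11111, 15))

# Boundary case: the second period starts exactly 16 days after the first one ends.
boundary = [
    (date(2014, 3, 1), date(2014, 3, 31)),
    (date(2014, 4, 16), date(2014, 4, 30)),
]
print(len(merge_with_shift(boundary, 15)))  # stays as two periods
print(len(merge_with_shift(boundary, 16)))  # the periods now meet and are merged
```

Under these assumptions a shift of 15 merges a row whenever s_date <= previous e_date + 15. Whether 15 or 16 is the right value depends on how the gap between e_date and the next s_date is meant to be counted.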

select cust_id, min(s_date) as s_date, max(e_date) as e_date
from (select t.*, sum(GroupStartFlag) over (partition by cust_id order by s_date rows unbounded preceding) as grpid
      from (select cust_id, s_date, e_date,
                   (case when s_date <= lag(e_date) over (partition by cust_id order by s_date) + 15
                         then 0
                         else 1
                    end) as GroupStartFlag
            from  MY_DB.my_table
           ) t
     ) t
group by cust_id, grpid;
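
The flag-then-running-sum idea can be traced outside the database with a minimal Python sketch. This is only an illustration of the logic, not Teradata code: the function name `merge_periods` and the `max_gap_days` parameter are made up here, and the rows are the sample data from the question.

```python
from datetime import date, timedelta

rows = [
    (11111, date(2014, 3, 1), date(2014, 3, 31)),
    (11111, date(2014, 4, 10), date(2014, 4, 30)),
    (11111, date(2014, 5, 1), date(2014, 5, 10)),
    (11111, date(2014, 6, 15), date(2014, 7, 31)),
    (22222, date(2014, 4, 1), date(2014, 5, 31)),
    (22222, date(2014, 6, 1), date(2014, 6, 30)),
    (22222, date(2014, 7, 1), date(2014, 7, 15)),
]


def merge_periods(rows, max_gap_days=15):
    groups = {}  # (cust_id, grpid) -> [min s_date, max e_date]
    prev_cust = prev_end = None
    grpid = 0
    for cust_id, s_date, e_date in sorted(rows):  # ORDER BY cust_id, s_date
        if cust_id != prev_cust:
            grpid = 0        # PARTITION BY cust_id restarts the running sum
            prev_end = None  # lag() is NULL on the first row of a partition
        # GroupStartFlag: 0 if the row starts within max_gap_days of the
        # previous row's e_date, otherwise 1 (including the NULL case).
        if prev_end is not None and s_date <= prev_end + timedelta(days=max_gap_days):
            flag = 0
        else:
            flag = 1
        grpid += flag  # running sum of the flag is the group identifier
        key = (cust_id, grpid)
        if key in groups:
            groups[key][0] = min(groups[key][0], s_date)
            groups[key][1] = max(groups[key][1], e_date)
        else:
            groups[key] = [s_date, e_date]
        prev_cust, prev_end = cust_id, e_date
    # GROUP BY cust_id, grpid with MIN(s_date) and MAX(e_date)
    return [(cust, s, e) for (cust, _), (s, e) in sorted(groups.items())]


for merged_row in merge_periods(rows):
    print(merged_row)
```

On the sample data this yields the three rows requested in the question. Because each row is compared only with the row directly before it, chains of any length are merged without knowing in advance how many rows belong together.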

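Note: Teradata supports window functions, but it sometimes has strange requirements for them. I think the above will work as written, but I don't have a system to test it on.

EDIT:

I'm not sure whether Teradata supports the lag() function. You can do the equivalent with a correlated subquery: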

select cust_id, min(s_date) as s_date, max(e_date) as e_date
from (select t.*,
             sum(case when s_date <= prev_edate + 15 then 0 else 1 end) over
                 (partition by cust_id order by s_date rows unbounded preceding) as grpid
      from (select cust_id, s_date, e_date,
                   (select max(e_date) 
                    from MY_DB.my_table t2
                    where t2.cust_id = t.cust_id and
                          t2.s_date < t.s_date
                   ) as prev_edate
            from  MY_DB.my_table t
           ) t
     ) t
group by cust_id, grpid;