Teradata: partitioning sequential events

Date: 2016-03-17 19:32:26

Tags: teradata partition

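Teradata: below are two rail cars arriving at a station. Car 773 is emptied (RI / RE), then loaded (RI / RL), and then departs (TD). Car 819 is only emptied and then departs. I want to produce the result shown below the table. I have tried grouping by car with CASE statements, but the same car passes through the station multiple times, so MAX and MIN give me unpredictable results. I have read about partitioning, but I have a hard time visualizing it. Any help is appreciated.

*Note: restricting the initial search to EVT_CD = 'TA' and STN = DEST eliminates any cars that simply pass through the station. But I can't restrict the whole record set that way, because the TD rows have a different destination.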

CAR_NUMB  EVT_DT      EVT_TM    EVT_CD  EVST_CD  WB_ID   STN      DEST
773       03/08/2016  19.05.00  TA               582016  BOSTON   BOSTON
773       03/12/2016  04.04.00  AP      PU       582016  BOSTON   BOSTON
773       03/12/2016  14.35.00  RI      RE       412016  BOSTON   BOSTON
773       03/12/2016  14.37.00  AP      PL       412016  BOSTON   BOSTON
773       03/12/2016  14.45.00  RI      RL       812016  BOSTON   HOUSTON
773       03/14/2016  12.22.00  TD               812016  BOSTON   HOUSTON
819       03/04/2016  17.50.00  TA               362016  STLOUIS  STLOUIS
819       03/06/2016  13.50.00  AP      PU       362016  STLOUIS  STLOUIS
819       03/06/2016  17.27.55  RI      RE       042016  STLOUIS  STLOUIS
819       03/07/2016  00.37.00  RI      PR       042016  STLOUIS  PORTLAND
819       03/11/2016  01.47.00  TD               042016  STLOUIS  PORTLAND

Desired output:
CAR_NUMB  TA                   AP          RIRE        RIRL        TD
773       03/08/2016 19.05.00  03/12/20..  03/12/20..  03/12/20..  03/14/2016 12.22.00
819       03/04/2016 17.50.00  03/06/20..  03/06/20..  null        03/11/2016 01.47.00

The [..] above marks where I cut off the timestamps for formatting.

1 answer:

Answer 0 (score: 2)

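You can use windowed aggregate functions with a CASE expression to find the minimum date/time for a given EVT_CD within the next 5 rows.

I combined the date and time into a timestamp, because that is easier to work with: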

SELECT tab.*
  ,CAST(EVT_DT AS TIMESTAMP(0)) + (EVT_TM - TIME '00:00:00' HOUR TO SECOND) AS TA
  ,MIN(CASE WHEN EVT_CD = 'AP'
            THEN CAST(EVT_DT AS TIMESTAMP(0)) + (EVT_TM - TIME '00:00:00' HOUR TO SECOND)
       END) 
   OVER (PARTITION BY CAR_NUMB
         ORDER BY EVT_DT, EVT_TM
         ROWS BETWEEN CURRENT ROW AND 5 FOLLOWING) AS AP
  ,MIN(CASE WHEN EVT_CD = 'RI' AND EVST_CD = 'RE'
            THEN CAST(EVT_DT AS TIMESTAMP(0)) + (EVT_TM - TIME '00:00:00' HOUR TO SECOND)
       END) 
   OVER (PARTITION BY CAR_NUMB 
         ORDER BY EVT_DT, EVT_TM
         ROWS BETWEEN CURRENT ROW AND 5 FOLLOWING) AS RIRE
  ,MIN(CASE WHEN EVT_CD = 'RI' AND EVST_CD = 'RL'
            THEN CAST(EVT_DT AS TIMESTAMP(0)) + (EVT_TM - TIME '00:00:00' HOUR TO SECOND)
       END) 
   OVER (PARTITION BY CAR_NUMB 
         ORDER BY EVT_DT, EVT_TM
         ROWS BETWEEN CURRENT ROW AND 5 FOLLOWING) AS RIRL
  ,MIN(CASE WHEN EVT_CD = 'TD'
            THEN CAST(EVT_DT AS TIMESTAMP(0)) + (EVT_TM - TIME '00:00:00' HOUR TO SECOND)
       END) 
   OVER (PARTITION BY CAR_NUMB 
         ORDER BY EVT_DT, EVT_TM
         ROWS BETWEEN CURRENT ROW AND 5 FOLLOWING) AS TD
FROM tab
QUALIFY -- finally return only the starting row
   EVT_CD = 'TA'
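
For readers who find the window frame hard to picture, here is a minimal Python sketch of what the query above computes. It is not Teradata code, only an illustration using the sample rows from the question; the helper names `to_ts`, `first_in_frame` and `trips` are made up for this example. For each `TA` row it looks at that row plus the next five rows of the same car, in date/time order, and takes the earliest timestamp whose event codes match.

```python
from datetime import datetime

# Sample rows from the question: (CAR_NUMB, EVT_DT, EVT_TM, EVT_CD, EVST_CD)
ROWS = [
    (773, "03/08/2016", "19.05.00", "TA", ""),
    (773, "03/12/2016", "04.04.00", "AP", "PU"),
    (773, "03/12/2016", "14.35.00", "RI", "RE"),
    (773, "03/12/2016", "14.37.00", "AP", "PL"),
    (773, "03/12/2016", "14.45.00", "RI", "RL"),
    (773, "03/14/2016", "12.22.00", "TD", ""),
    (819, "03/04/2016", "17.50.00", "TA", ""),
    (819, "03/06/2016", "13.50.00", "AP", "PU"),
    (819, "03/06/2016", "17.27.55", "RI", "RE"),
    (819, "03/07/2016", "00.37.00", "RI", "PR"),
    (819, "03/11/2016", "01.47.00", "TD", ""),
]


def to_ts(evt_dt, evt_tm):
    """Combine the date and time columns into one timestamp."""
    return datetime.strptime(f"{evt_dt} {evt_tm}", "%m/%d/%Y %H.%M.%S")


def first_in_frame(frame, evt_cd, evst_cd=None):
    """Mimic MIN(CASE WHEN ... THEN ts END): earliest match, or None."""
    hits = [ts for ts, cd, st in frame
            if cd == evt_cd and (evst_cd is None or st == evst_cd)]
    return min(hits, default=None)


def trips(rows, following=5):
    # PARTITION BY CAR_NUMB
    by_car = {}
    for car, dt, tm, cd, st in rows:
        by_car.setdefault(car, []).append((to_ts(dt, tm), cd, st))

    result = []
    for car, events in sorted(by_car.items()):
        events.sort(key=lambda e: e[0])      # ORDER BY EVT_DT, EVT_TM
        for i, (ts, cd, _) in enumerate(events):
            if cd != "TA":                   # QUALIFY EVT_CD = 'TA'
                continue
            # ROWS BETWEEN CURRENT ROW AND 5 FOLLOWING
            frame = events[i:i + following + 1]
            result.append({
                "CAR_NUMB": car,
                "TA": ts,
                "AP": first_in_frame(frame, "AP"),
                "RIRE": first_in_frame(frame, "RI", "RE"),
                "RIRL": first_in_frame(frame, "RI", "RL"),
                "TD": first_in_frame(frame, "TD"),
            })
    return result


for trip in trips(ROWS):
    print(trip)
```

With the sample data each car has only one trip, so the frame never reaches past the car's own `TD` row; car 819 has no `RI`/`RL` row, so its `RIRL` value stays empty.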

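If there are more than 5 rows between TA and TD, you must adjust the ROWS clause accordingly. If the number of rows varies a lot and some EVT_CD values are missing, this approach may fail and report data from the next trip. This can be fixed by adding another step: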

SELECT 
   CAR_NUMB
  ,TA
  ,CASE WHEN   AP < TD THEN AP END AS AP
  ,CASE WHEN RIRE < TD THEN RIRE END AS RIRE
  ,CASE WHEN RIRL < TD THEN RIRL END AS RIRL
  ,TD
FROM
 (
   -- the previous query goes here
 ) AS dt
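
To see why the extra comparison against TD helps, here is a small Python sketch. It is illustrative only and uses invented rows for one hypothetical car whose first trip has no `AP` and no `RI`/`RL` event; the function names are made up as well. The 5-row frame that starts at the first `TA` reaches into the second trip and picks up that trip's values, and the `value < TD` check turns them back into nulls.

```python
from datetime import datetime

# Invented events for one hypothetical car, already sorted by time:
# (timestamp, EVT_CD, EVST_CD). Trip 1 has no AP and no RI/RL row.
EVENTS = [
    (datetime(2016, 3, 1, 8, 0), "TA", ""),
    (datetime(2016, 3, 2, 9, 0), "RI", "RE"),
    (datetime(2016, 3, 3, 10, 0), "TD", ""),
    (datetime(2016, 3, 5, 8, 0), "TA", ""),
    (datetime(2016, 3, 6, 9, 0), "AP", "PU"),
    (datetime(2016, 3, 6, 11, 0), "RI", "RL"),
    (datetime(2016, 3, 7, 10, 0), "TD", ""),
]


def first_match(frame, evt_cd, evst_cd=None):
    """Earliest timestamp in the frame with matching codes, or None."""
    hits = [ts for ts, cd, st in frame
            if cd == evt_cd and (evst_cd is None or st == evst_cd)]
    return min(hits, default=None)


def windowed(events, following=5):
    """First step: one row per TA, looking at the next `following` rows."""
    out = []
    for i, (ts, cd, _) in enumerate(events):
        if cd == "TA":
            frame = events[i:i + following + 1]
            out.append({
                "TA": ts,
                "AP": first_match(frame, "AP"),
                "RIRE": first_match(frame, "RI", "RE"),
                "RIRL": first_match(frame, "RI", "RL"),
                "TD": first_match(frame, "TD"),
            })
    return out


def before(value, td):
    """CASE WHEN value < TD THEN value END, treating None like SQL NULL."""
    if value is None or td is None:
        return None
    return value if value < td else None


def guarded(row):
    """Second step: drop values that are not earlier than this trip's TD."""
    td = row["TD"]
    return {**row,
            "AP": before(row["AP"], td),
            "RIRE": before(row["RIRE"], td),
            "RIRL": before(row["RIRL"], td)}


raw = windowed(EVENTS)
fixed = [guarded(r) for r in raw]
print("without guard:", raw[0]["AP"], raw[0]["RIRL"])    # values leaked from trip 2
print("with guard:   ", fixed[0]["AP"], fixed[0]["RIRL"])
```

Note that the guard relies on TD being present for the trip: when TD is missing, the comparison cannot tell which values belong to which trip.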

If TD might be missing, you can take a different approach: find the preceding TA timestamp for each row and group by it:

SELECT 
   CAR_NUMB
  ,TA
  ,MIN(CASE WHEN EVT_CD = 'AP'
            THEN CAST(EVT_DT AS TIMESTAMP(0)) + (EVT_TM - TIME '00:00:00' HOUR TO SECOND)
       END) AS AP
  ,MIN(CASE WHEN EVT_CD = 'RI' AND EVST_CD = 'RE'
            THEN CAST(EVT_DT AS TIMESTAMP(0)) + (EVT_TM - TIME '00:00:00' HOUR TO SECOND)
       END) AS RIRE
  ,MIN(CASE WHEN EVT_CD = 'RI' AND EVST_CD = 'RL'
            THEN CAST(EVT_DT AS TIMESTAMP(0)) + (EVT_TM - TIME '00:00:00' HOUR TO SECOND)
       END) AS RIRL
  ,MIN(CASE WHEN EVT_CD = 'TD'
            THEN CAST(EVT_DT AS TIMESTAMP(0)) + (EVT_TM - TIME '00:00:00' HOUR TO SECOND)
       END) AS TD
FROM
 (
   SELECT tab.*
     ,MAX(CASE WHEN EVT_CD = 'TA'
               THEN CAST(EVT_DT AS TIMESTAMP(0)) + (EVT_TM - TIME '00:00:00' HOUR TO SECOND)
          END) 
      OVER (PARTITION BY CAR_NUMB
            ORDER BY EVT_DT, EVT_TM
            ROWS UNBOUNDED PRECEDING) AS TA
   FROM tab
   -- maybe QUALIFY TA IS NOT NULL?
 ) AS dt
GROUP BY
   CAR_NUMB
  ,TA
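
The same idea can be sketched in Python, again only as an illustration with invented rows for one hypothetical car (number 900) and made-up function names. The most recent `TA` timestamp is carried forward over each car's rows, which is what the running `MAX ... ROWS UNBOUNDED PRECEDING` does when rows are in time order, and the rows are then grouped by it. The first trip here has no `TD` row, and its columns still do not pick up values from the second trip.

```python
from datetime import datetime

# Invented events: (CAR_NUMB, timestamp, EVT_CD, EVST_CD). Trip 1 lacks a TD.
EVENTS = [
    (900, datetime(2016, 3, 1, 8, 0), "TA", ""),
    (900, datetime(2016, 3, 2, 9, 0), "RI", "RE"),
    (900, datetime(2016, 3, 5, 8, 0), "TA", ""),
    (900, datetime(2016, 3, 6, 9, 0), "AP", "PU"),
    (900, datetime(2016, 3, 6, 11, 0), "RI", "RL"),
    (900, datetime(2016, 3, 7, 10, 0), "TD", ""),
]


def group_by_last_ta(events):
    """Tag every row with the latest TA seen so far for its car, then group."""
    groups = {}    # (CAR_NUMB, TA) -> list of (timestamp, EVT_CD, EVST_CD)
    last_ta = {}   # CAR_NUMB -> most recent TA timestamp
    # PARTITION BY CAR_NUMB ORDER BY timestamp
    for car, ts, cd, st in sorted(events, key=lambda e: (e[0], e[1])):
        if cd == "TA":
            last_ta[car] = ts
        ta = last_ta.get(car)
        if ta is None:
            # Row before the car's first TA; skipped here, which corresponds
            # to the optional QUALIFY TA IS NOT NULL in the query.
            continue
        groups.setdefault((car, ta), []).append((ts, cd, st))
    return groups


def earliest(rows, evt_cd, evst_cd=None):
    """Mimic MIN(CASE WHEN ... THEN ts END) within one group."""
    hits = [ts for ts, cd, st in rows
            if cd == evt_cd and (evst_cd is None or st == evst_cd)]
    return min(hits, default=None)


def summarize(events):
    # GROUP BY CAR_NUMB, TA
    return [
        {
            "CAR_NUMB": car,
            "TA": ta,
            "AP": earliest(rows, "AP"),
            "RIRE": earliest(rows, "RI", "RE"),
            "RIRL": earliest(rows, "RI", "RL"),
            "TD": earliest(rows, "TD"),
        }
        for (car, ta), rows in group_by_last_ta(events).items()
    ]


for trip in summarize(EVENTS):
    print(trip)
```

Because every row is assigned to exactly one trip before aggregating, no frame size has to be chosen and a missing event simply yields an empty column for that trip.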