Combining rows by start date and end date

Time: 2016-03-15 09:54:12

Tags: sql amazon-redshift

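In Redshift, I want a SQL script that merges consecutive monthly records into a single record whenever the gap between one record's end date and the next record's start date is 32 days or less (<= 32). The output start date should be the minimum start date of the consecutive months, and the output end date should be the maximum end date of the consecutive months.

The input data below is the table's data, and the expected output is listed as well. The input data is listed ORDER BY ID, STARTDT, ENDDT in ASC order.

For example, consider ID 100 in the table below. The gap between the end of the first record and the start of the next one is <= 32, but the gap between the second record's end date and the third record's start date is more than 32 days, so the first two records are merged into one record, i.e. (ID), MIN(STARTDT), MAX(ENDDT), which corresponds to the first record in the expected output. Similarly, records 3 and 4 in the input data are within 32 days of each other, so those two records are merged into a single record, which corresponds to the second record in the expected output.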

Input data:

ID STARTDT ENDDT
100 2000-01-01 2000-01-31
100 2000-02-01 2000-02-29
100 2000-05-01 2000-05-31
100 2000-06-01 2000-06-30
100 2000-09-01 2000-09-30
100 2000-10-01 2000-10-31
101 2012-06-01 2012-06-30
101 2012-07-01 2012-07-31
102 2000-01-01 2000-01-31
103 2013-03-01 2013-03-31
103 2013-05-01 2013-05-31

Expected output:

ID MIN_STARTDT MAX_END_DT
100 2000-01-01 2000-02-29
100 2000-05-01 2000-06-30
100 2000-09-01 2000-10-31
101 2012-06-01 2012-07-31
102 2000-01-01 2000-01-31
103 2013-03-01 2013-03-31
103 2013-05-01 2013-05-31
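
As a quick check of the rule against the ID 100 rows, the gaps between consecutive records can be computed outside the database. This is only a minimal Python sketch, not part of the requested SQL; the names `records`, `MAX_GAP_DAYS` and `gaps_in_days` are made up for illustration.

```python
from datetime import date

# Records for ID 100, copied from the input data: (STARTDT, ENDDT).
records = [
    (date(2000, 1, 1), date(2000, 1, 31)),
    (date(2000, 2, 1), date(2000, 2, 29)),
    (date(2000, 5, 1), date(2000, 5, 31)),
    (date(2000, 6, 1), date(2000, 6, 30)),
    (date(2000, 9, 1), date(2000, 9, 30)),
    (date(2000, 10, 1), date(2000, 10, 31)),
]

MAX_GAP_DAYS = 32  # threshold stated in the question


def gaps_in_days(rows):
    """Days between each record's end date and the next record's start date."""
    return [(nxt[0] - cur[1]).days for cur, nxt in zip(rows, rows[1:])]


gaps = gaps_in_days(records)
merge_with_next = [gap <= MAX_GAP_DAYS for gap in gaps]

print(gaps)             # [1, 62, 1, 63, 1]
print(merge_with_next)  # [True, False, True, False, True]
```

The gaps of 62 and 63 days are the two places where ID 100 splits, which gives the three output rows listed for that ID.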

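2 answers:

Answer 0: (score: 0)

You can do this in steps:

  • Use a join to identify where two adjacent records should be combined.
  • Then do a cumulative sum to assign a grouping identifier to all such adjacent records.
  • Aggregate.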

It looks like:

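  select id, min(startdt) as min_startdt, max(enddt) as max_enddt
  from (select t.*,
               -- running total of "no adjacent predecessor" flags = group id
               sum(case when tprev.id is null then 1 else 0 end) over
                     (partition by t.id
                      order by t.startdt
                      rows between unbounded preceding and current row
                     ) as grp
        from t left join
             t tprev
             -- matches only a predecessor that ends exactly one day earlier;
             -- the 32-day rule in the question would need a wider condition here
             on t.id = tprev.id and
                t.startdt = tprev.enddt + interval '1 day'
       ) t
  group by id, grp;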
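
The three steps (flag, cumulative sum, aggregate) can also be sketched outside the database to see what each one contributes. The Python below is a rough illustration under stated assumptions, not a translation of the query: `merge_periods` and its `max_gap_days` parameter are hypothetical names, and a row starts a new group when its gap to the previous row of the same id exceeds the threshold.

```python
from datetime import date
from itertools import accumulate, groupby


def merge_periods(rows, max_gap_days=32):
    """Merge (id, startdt, enddt) rows whose gap to the previous row is small enough."""
    rows = sorted(rows)

    # Step 1: flag rows that start a new group
    # (first row of an id, or a gap larger than the threshold).
    flags = []
    prev = None
    for rid, start, end in rows:
        is_new = (
            prev is None
            or prev[0] != rid
            or (start - prev[2]).days > max_gap_days
        )
        flags.append(1 if is_new else 0)
        prev = (rid, start, end)

    # Step 2: a running sum of the flags gives every row a group identifier.
    groups = list(accumulate(flags))

    # Step 3: aggregate each group to (id, min start, max end).
    merged = []
    for _, pairs in groupby(zip(groups, rows), key=lambda pair: pair[0]):
        members = [row for _, row in pairs]
        merged.append((
            members[0][0],
            min(r[1] for r in members),
            max(r[2] for r in members),
        ))
    return merged


rows = [
    (100, date(2000, 1, 1), date(2000, 1, 31)),
    (100, date(2000, 2, 1), date(2000, 2, 29)),
    (100, date(2000, 5, 1), date(2000, 5, 31)),
    (100, date(2000, 6, 1), date(2000, 6, 30)),
    (100, date(2000, 9, 1), date(2000, 9, 30)),
    (100, date(2000, 10, 1), date(2000, 10, 31)),
    (101, date(2012, 6, 1), date(2012, 6, 30)),
    (101, date(2012, 7, 1), date(2012, 7, 31)),
    (102, date(2000, 1, 1), date(2000, 1, 31)),
    (103, date(2013, 3, 1), date(2013, 3, 31)),
    (103, date(2013, 5, 1), date(2013, 5, 31)),
]

for row in merge_periods(rows):
    print(row)
```

With the default 32-day threshold this produces six rows, because the two ID 103 records are 31 days apart and merge. Calling `merge_periods(rows, max_gap_days=1)` keeps them separate and yields the seven rows of the expected output.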

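Answer 1: (score: 0)

This question is very similar to another one, and my answer is similar too: Fetch rows based on condition

The gist of the idea is to use window functions to identify the transitions between periods (a period being a run of events less than 33 days apart), then filter out the rows that fall inside a period, and then use a window function again.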

Full solution:

SELECT
  id,
  startdt AS period_start,
  period_end
FROM (
  SELECT
    id,
    startdt,
    enddt,
    lead(enddt, 1)
    OVER (PARTITION BY id
      ORDER BY enddt) AS period_end,
    period_boundary
  FROM (
         SELECT
           id,
           startdt,
           enddt,
           CASE WHEN period_switch = 0 AND reverse_period_switch = 1
             THEN 'start'
           ELSE 'end' END AS period_boundary
         FROM (
                SELECT
                  id,
                  startdt,
                  enddt,
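                  -- 1 when the next record (by enddt) starts more than 32 days after this one ends
                  CASE WHEN datediff(days, enddt, lead(startdt, 1)
                  OVER (PARTITION BY id
                    ORDER BY enddt ASC)) > 32
                    THEN 1
                  ELSE 0 END AS period_switch,
                  -- 1 when this record starts more than 32 days after the previous record ends
                  CASE WHEN datediff(days, lead(enddt, 1)
                  OVER (PARTITION BY id
                    ORDER BY enddt DESC), startdt) > 32
                    THEN 1
                  ELSE 0 END AS reverse_period_switch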
                FROM date_test
              )
           AS sessioned
         WHERE period_switch != 0 OR reverse_period_switch != 0
         UNION
         SELECT -- adding start rows without transition
           id,
           startdt,
           enddt,
           'start'
         FROM (
                SELECT
                  id,
                  startdt,
                  enddt,
                  row_number()
                  OVER (PARTITION BY id
                    ORDER BY enddt ASC) AS row_num
                FROM date_test
              ) AS with_row_number
         WHERE row_num = 1
         UNION
         SELECT -- adding end rows without transition
           id,
           startdt,
           enddt,
           'end'
         FROM (
                SELECT
                  id,
                  startdt,
                  enddt,
                  row_number()
                  OVER (PARTITION BY id
                    ORDER BY enddt desc) AS row_num
                FROM date_test
              ) AS with_row_number
         WHERE row_num = 1
       ) AS with_boundary -- data set containing start/end boundaries
) AS with_end -- data set where end date is propagated into the start row of the period
WHERE period_boundary = 'start'
ORDER BY id, startdt ASC;
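
The boundary idea behind the query can be sketched in a few lines of Python. This is a simplified illustration, not a line-by-line port of the SQL: `periods_by_boundaries` is a hypothetical helper that marks each row that opens a period and each row that closes one, then pairs the two lists in order.

```python
from datetime import date


def periods_by_boundaries(rows, max_gap_days=32):
    """Pair period-start rows with period-end rows for (id, startdt, enddt) input."""
    rows = sorted(rows)
    starts, ends = [], []
    for i, (rid, start, end) in enumerate(rows):
        # Neighbouring rows count only when they belong to the same id.
        prev = rows[i - 1] if i > 0 and rows[i - 1][0] == rid else None
        nxt = rows[i + 1] if i + 1 < len(rows) and rows[i + 1][0] == rid else None

        # A row opens a period if nothing precedes it or the gap before it is too large.
        if prev is None or (start - prev[2]).days > max_gap_days:
            starts.append((rid, start))
        # A row closes a period if nothing follows it or the gap after it is too large.
        if nxt is None or (nxt[1] - end).days > max_gap_days:
            ends.append((rid, end))

    # Every period has exactly one start and one end, in the same order.
    return [(rid, s, e) for (rid, s), (_, e) in zip(starts, ends)]


sample = [
    (100, date(2000, 1, 1), date(2000, 1, 31)),
    (100, date(2000, 2, 1), date(2000, 2, 29)),
    (100, date(2000, 5, 1), date(2000, 5, 31)),
    (100, date(2000, 6, 1), date(2000, 6, 30)),
    (103, date(2013, 3, 1), date(2013, 3, 31)),
    (103, date(2013, 5, 1), date(2013, 5, 31)),
]

for period in periods_by_boundaries(sample):
    print(period)
```

Rows in the middle of a period are neither a start nor an end, which mirrors the filtering step of the query. A row that is both (a single-record period) lands in both lists in this sketch.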

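Note that your expected output contains the row 103 2013-05-01 2013-05-31, but its start date is only 31 days after the previous row's end date, so according to your requirement this row should instead be merged with the previous row for ID 103.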

So the output I get looks like this:

 id    start       end
100  2000-01-01  2000-02-29
100  2000-05-01  2000-06-30
100  2000-09-01  2000-10-31
101  2012-06-01  2012-07-31
102  2000-01-01  2000-01-31
103  2013-03-01  2013-05-31
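
The 31-day figure for ID 103 is easy to confirm with plain date arithmetic. A small Python check, with variable names chosen only for illustration:

```python
from datetime import date

prev_end = date(2013, 3, 31)   # ENDDT of the first ID 103 record
next_start = date(2013, 5, 1)  # STARTDT of the second ID 103 record

gap_days = (next_start - prev_end).days
print(gap_days)        # 31
print(gap_days <= 32)  # True, so the two records fall under the merge rule
```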