Redshift SQL: Get the difference in hours from start and end dates

Time: 2016-07-29 07:59:37

Tags: sql amazon-redshift

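My table has start_date and end_date columns, and I need to find the difference between them in hours. The problem is that the two timestamps do not fall on the same day.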

user  start_date       end_date        difference
Alex  7/25/2016 16:00  7/26/2016 0:30  8.5
Alex  7/24/2016 16:00  7/25/2016 0:30  8.5
Alex  7/21/2016 16:00  7/22/2016 0:30  8.5
Alex  7/20/2016 16:00  7/21/2016 0:30  8.5
Alex  7/19/2016 16:00  7/20/2016 0:30  8.5
Alex  7/18/2016 16:00  7/19/2016 0:30  8.5
Alex  7/17/2016 16:00  7/18/2016 0:30  8.5
Alex  7/14/2016 16:00  7/15/2016 0:30  8.5
Alex  7/13/2016 16:00  7/14/2016 0:30  8.5
Alex  7/12/2016 16:00  7/13/2016 0:30  8.5
Alex  7/11/2016 16:00  7/12/2016 0:30  8.5
Alex  7/10/2016 16:00  7/11/2016 0:30  8.5

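There are usually 5 working days, and if I group the rows by start_date I get the answer. But I need a new date column, and I need the output to look like the table below. Note that 7/15/2016 and 7/22/2016 do not appear as a start_date in the table above. I need that extra 0.5 hours, and the date of the 6th day, to be included in my derived table.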

User  Date       difference
Alex  7/25/2016  8.5
Alex  7/24/2016  8.5
Alex  7/22/2016  0.5
Alex  7/21/2016  8.0
Alex  7/20/2016  8.5
Alex  7/19/2016  8.5
Alex  7/18/2016  8.5
Alex  7/17/2016  8.5
Alex  7/15/2016  0.5
Alex  7/14/2016  8.0
Alex  7/13/2016  8.5
Alex  7/12/2016  8.5
Alex  7/11/2016  8.5
Alex  7/10/2016  8.5

I calculate the difference as

round(cast(datediff(seconds, start_date, end_date) as decimal)/3600,2)
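To check the arithmetic outside the database, here is a minimal Python sketch of the same calculation: whole seconds between the two timestamps, divided by 3600, rounded to two decimal places. The function name hours_between is hypothetical, and the half-up rounding mode is an assumption about how the SQL round() behaves on decimals.

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

def hours_between(start: datetime, end: datetime) -> Decimal:
    """Whole seconds between two timestamps, expressed in hours with 2 decimals."""
    seconds = int((end - start).total_seconds())
    # Assumption: round half up, mirroring round(..., 2) on a decimal value.
    return (Decimal(seconds) / Decimal(3600)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )

start = datetime(2016, 7, 25, 16, 0)
end = datetime(2016, 7, 26, 0, 30)
print(hours_between(start, end))  # 8.50
```

Because the subtraction works on full timestamps, a shift that crosses midnight needs no special handling at this stage.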

2 answers:

Answer 0 (score: 1)

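Whenever the logic gets complex, I recommend using union queries and splitting the logic into separate select queries (or even tables). Then you can work it out in two steps. The main distinction seems to be whether the 0.5 hours between 00:00:00 and 00:30:00 should count toward the previous workday or stand on their own. The latter seems to depend on whether end_date is itself a workday. I see three cases:

  • The next day is a workday:
    1. Report all hours on start_date.
  • The next day is not a workday:
    1. On start_date, report the hours from start_date to midnight.
    2. On end_date, report the hours from midnight to end_date.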
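The cases above can be sketched in Python for a single shift. This is only an illustration of the rule, not part of the SQL solution; the function name split_shift and the boolean parameter next_day_is_workday are hypothetical.

```python
from datetime import datetime, time, timedelta

def split_shift(start, end, next_day_is_workday):
    """Return (date, hours) rows for one shift that may cross midnight."""
    def hours(a, b):
        return round((b - a).total_seconds() / 3600, 2)

    if next_day_is_workday:
        # Report all hours on the start date.
        return [(start.date(), hours(start, end))]

    # Otherwise split the shift at the two midnight boundaries.
    midnight_after_start = datetime.combine(start.date() + timedelta(days=1), time.min)
    midnight_of_end = datetime.combine(end.date(), time.min)
    return [
        (start.date(), hours(start, midnight_after_start)),
        (end.date(), hours(midnight_of_end, end)),
    ]

shift = (datetime(2016, 7, 21, 16, 0), datetime(2016, 7, 22, 0, 30))
for day, h in split_shift(*shift, next_day_is_workday=False):
    print(day, h)
```

For the 7/21/2016 shift this yields 8.0 hours on 7/21 and 0.5 hours on 7/22, which matches the rows the question asks for.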

I used the following sample data based on your description:

create temporary table _test (user_name varchar(20), start_date timestamp, end_date timestamp);
insert into _test values ('Alex', '7/25/2016 16:00', '7/26/2016 0:30'), ('Alex', '7/24/2016 16:00', '7/25/2016 0:30'), ('Alex', '7/21/2016 16:00', '7/22/2016 0:30'), ('Alex', '7/20/2016 16:00', '7/21/2016 0:30'), ('Alex', '7/19/2016 16:00', '7/20/2016 0:30'), ('Alex', '7/18/2016 16:00', '7/19/2016 0:30'), ('Alex', '7/17/2016 16:00', '7/18/2016 0:30'), ('Alex', '7/14/2016 16:00', '7/15/2016 0:30'), ('Alex', '7/13/2016 16:00', '7/14/2016 0:30'), ('Alex', '7/12/2016 16:00', '7/13/2016 0:30'), ('Alex', '7/11/2016 16:00', '7/12/2016 0:30'), ('Alex', '7/10/2016 16:00', '7/11/2016 0:30');

We need to know whether the next day is a workday, so I suggest using the lead() window function (see the documentation), which gives you the start_date from the next row.
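As a rough illustration of what lead() contributes here, the following Python sketch pairs each shift with the start of the next one and counts the days between the current end date and that next start date. It assumes a single user (so no partitioning is needed); the function name days_until_next_workday is borrowed from the column alias used in the query below and is otherwise hypothetical.

```python
from datetime import datetime

shifts = [
    (datetime(2016, 7, 20, 16, 0), datetime(2016, 7, 21, 0, 30)),
    (datetime(2016, 7, 21, 16, 0), datetime(2016, 7, 22, 0, 30)),
    (datetime(2016, 7, 24, 16, 0), datetime(2016, 7, 25, 0, 30)),
]

def days_until_next_workday(rows):
    """For each (start, end) row, days from its end date to the next row's start date."""
    rows = sorted(rows)  # order by start timestamp, like the window's ORDER BY
    next_starts = [start for start, _ in rows[1:]] + [None]  # lead(): None on the last row
    return [
        None if nxt is None else (nxt.date() - end.date()).days
        for (_, end), nxt in zip(rows, next_starts)
    ]

print(days_until_next_workday(shifts))  # [0, 2, None]
```

A gap of 0 means the shift ends on a day that has its own shift, a positive gap means it ends on a non-working day, and the last row has no successor at all.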

create temporary table _differences as (
    select
        user_name
      , start_date::date as start_date
      , end_date::date as end_date
       /** 
        * Calculate difference in hours between start_date and end_date: */
      , round(cast(datediff(seconds, start_date, end_date) as decimal)/3600,2) as hours_start_to_end
       /** 
        * Calculate difference in hours between start_date and midnight: */
      , round(cast(datediff(seconds, start_date, dateadd(day, 1, start_date::date)) as decimal)/3600,2) as hours_start_to_midnight
       /**
        * Calculate difference between midnight on end_date and end_date: */
      , round(cast(datediff(seconds, end_date::date, end_date) as decimal)/3600,2) as hours_midnight_to_end
       /**
        * Calculate number of days from end_date until next start_date: */
      , datediff(day, end_date::date, lead(start_date::date) over(partition by user_name order by start_date::date)) as days_until_next_workday
    from
        _test
);

Then the following query:

    select
        user_name          as user_name
      , start_date         as ref_date
      , hours_start_to_end as difference
    from
        _differences
    where
        days_until_next_workday = 0 -- report all work hours on start_date
union
    select
        user_name               as user_name
      , start_date              as ref_date
      , hours_start_to_midnight as difference
    from
        _differences
    where
        days_until_next_workday > 0 -- report partial work hours on start_date
union
    select
        user_name             as user_name
      , end_date              as ref_date
      , hours_midnight_to_end as difference
    from
        _differences
    where
        days_until_next_workday > 0 -- report partial work hours on end_date
order by
    user_name
  , ref_date desc
;

produces the following result:

 user_name |  ref_date  | difference
-----------+------------+------------
 Alex      | 2016-07-24 |       8.50
 Alex      | 2016-07-22 |       0.50
 Alex      | 2016-07-21 |       8.00
 Alex      | 2016-07-20 |       8.50
 Alex      | 2016-07-19 |       8.50
 Alex      | 2016-07-18 |       8.50
 Alex      | 2016-07-17 |       8.50
 Alex      | 2016-07-15 |       0.50
 Alex      | 2016-07-14 |       8.00
 Alex      | 2016-07-13 |       8.50
 Alex      | 2016-07-12 |       8.50
 Alex      | 2016-07-11 |       8.50
 Alex      | 2016-07-10 |       8.50
(13 rows)

You can see that 7/25/2016 is missing, because there is no start_date on or after 7/26/2016, so you will need to figure out how to account for that special case.
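One possible way to handle that special case, sketched in Python rather than SQL: when a shift has no following row, fall back to a default gap. The default of 0 (treat the last shift as if a workday follows) is purely an assumption chosen because it reproduces the 7/25/2016 row in the question's expected output; the names report and gap_when_no_next_row are hypothetical. Unlike the query, this sketch treats every non-zero gap as "not a workday".

```python
from datetime import datetime, time

def hours(a, b):
    return round((b - a).total_seconds() / 3600, 2)

def report(shifts, gap_when_no_next_row=0):
    """shifts: (start, end) pairs for one user. Returns (date, hours) rows, newest first."""
    rows = sorted(shifts)
    out = []
    for i, (start, end) in enumerate(rows):
        if i + 1 < len(rows):
            gap = (rows[i + 1][0].date() - end.date()).days
        else:
            gap = gap_when_no_next_row  # assumption: no next row means "workday follows"
        if gap == 0:
            out.append((start.date(), hours(start, end)))
        else:
            midnight = datetime.combine(end.date(), time.min)
            out.append((start.date(), hours(start, midnight)))
            out.append((end.date(), hours(midnight, end)))
    return sorted(out, reverse=True)

shifts = [
    (datetime(2016, 7, 21, 16, 0), datetime(2016, 7, 22, 0, 30)),
    (datetime(2016, 7, 24, 16, 0), datetime(2016, 7, 25, 0, 30)),
    (datetime(2016, 7, 25, 16, 0), datetime(2016, 7, 26, 0, 30)),
]
for day, h in report(shifts):
    print(day, h)
```

With these three shifts the sketch reports 8.5 hours on 7/25, 8.5 on 7/24, 0.5 on 7/22 and 8.0 on 7/21. Whether the fallback should instead split the last shift at midnight depends on business rules the question does not state.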

Answer 1 (score: 1)

Here is how I did the calculation, and it works perfectly:

select user_name, trunc(start_time) as date1,
       SUM(case when id = 1 then round(cast(datediff(seconds, start_time, st_t1) as decimal)/3600,2) end) as SCHEDULE
from
(
    -- Hours on the start day: cap the shift at 23:59:59 when it crosses midnight.
    select user_name, id, start_time,
           case when trunc(start_time) <> trunc(end_time) then cast(to_char(start_time,'yyyy-mm-dd 23:59:59') as timestamp) else cast(to_char(end_time,'yyyy-mm-dd hh24:mi:ss') as timestamp) end as st_t1
    from   table1 a
    where  id = 1
) s
group by user_name, trunc(start_time)

union

select user_name, trunc(end_time) as date1,
       SUM(case when id = 1 then round(cast(datediff(seconds, st_t2, end_time) as decimal)/3600,2) end) as SCHEDULE
from
(
    -- Hours on the end day: start counting at 00:00:00 when the shift crosses midnight.
    select user_name, id, end_time,
           case when trunc(start_time) <> trunc(end_time) then cast(to_char(end_time,'yyyy-mm-dd 00:00:00') as timestamp) else cast(to_char(end_time,'yyyy-mm-dd hh24:mi:ss') as timestamp) end as st_t2
    from   table1 a
    where  id = 1
) e
group by user_name, trunc(end_time)
;
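One detail of the query above is that the start-day portion is capped at 23:59:59 rather than at midnight, so it counts one second less than the true duration. The following Python sketch, using a 16:00 start taken from the sample data, suggests that rounding to two decimals hides the difference for shifts like these; it is a check of the arithmetic only, not a statement about every possible input.

```python
from datetime import datetime, time, timedelta

start = datetime(2016, 7, 21, 16, 0)
capped = start.replace(hour=23, minute=59, second=59)  # the 23:59:59 cap
midnight = datetime.combine(start.date() + timedelta(days=1), time.min)

secs_capped = int((capped - start).total_seconds())
secs_midnight = int((midnight - start).total_seconds())

print(secs_capped, round(secs_capped / 3600, 2))      # 28799 8.0
print(secs_midnight, round(secs_midnight / 3600, 2))  # 28800 8.0
```

If several capped segments are summed before rounding, or if more decimal places are kept, the missing seconds could become visible, so capping at midnight of the following day may be the safer choice.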