Select start/end dates for each group

Time: 2019-07-08 10:04:05

Tags: sql oracle analytics

I have an employee job assignment table in the following format:

emp_id, dept_id, assignment,  start_dt,    end_dt
1,      10,      project 1,   2001-01-01,  2001-12-31
1,      10,      project 2,   2002-01-01,  2002-12-31
1,      20,      project 3,   2003-01-01,  2003-12-31
1,      20,      project 4,   2004-01-01,  2004-12-31
1,      10,      project 5,   2005-01-01,  2005-12-31

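From this table, I need to summarize each employee's department history, i.e. how long the employee worked in a given department before moving to another one.

The expected output looks like this: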

emp_id, dept_id,  start_dt,    end_dt
1,      10,       2001-01-01,  2002-12-31
1,      20,       2003-01-01,  2004-12-31
1,      10,       2005-01-01,  2005-12-31

I tried to solve this with Oracle analytic functions, but could not get the desired output:

    select distinct emp_id, dept_id, start_dt, end_dt 
    from ( 
       select emp_id, dept_id, 
              min(start_date) 
                 over (partition by emp_id, dept_id order by emp_id, dept_id 
                 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as start_dt,
              max(end_date)   
                 over (partition by emp_id, dept_id order by emp_id, dept_id 
                 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as end_dt
       from employee_job_assignment
    )
    where emp_id = 1;

The query above produces the following output:

emp_id, dept_id,  start_dt,    end_dt
1,      10,       2001-01-01,  2005-12-31
1,      20,       2003-01-01,  2004-12-31

3 answers:

Answer 0 (score: 1):

You can try the following:


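select emp_id, dept_id, min(start_dt) as start_dt, max(end_dt) as end_dt
from
(
-- grp stays constant while consecutive rows (by start_dt) share a department
select t.*,
row_number() over(partition by emp_id order by start_dt)
 - row_number() over(partition by emp_id, dept_id order by start_dt) as grp
from employee_job_assignment t
) A
-- dept_id is part of the key because two departments can share the same grp value
group by grp, dept_id, emp_id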
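To see why the difference of the two row numbers works as a group key, the intermediate values can be printed for the sample rows from the question. This is only a sketch: it assumes Python's standard `sqlite3` module (with an SQLite build recent enough to support window functions) as a stand-in for Oracle, and it numbers rows per employee.

```python
import sqlite3

# Sample rows from the question; SQLite is only a stand-in for Oracle here.
rows = [
    (1, 10, "project 1", "2001-01-01", "2001-12-31"),
    (1, 10, "project 2", "2002-01-01", "2002-12-31"),
    (1, 20, "project 3", "2003-01-01", "2003-12-31"),
    (1, 20, "project 4", "2004-01-01", "2004-12-31"),
    (1, 10, "project 5", "2005-01-01", "2005-12-31"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employee_job_assignment "
    "(emp_id int, dept_id int, assignment text, start_dt text, end_dt text)"
)
conn.executemany(
    "insert into employee_job_assignment values (?, ?, ?, ?, ?)", rows
)

# rn_all numbers every row of the employee; rn_dept restarts per department.
steps = conn.execute("""
    select dept_id,
           row_number() over (partition by emp_id order by start_dt) as rn_all,
           row_number() over (partition by emp_id, dept_id order by start_dt) as rn_dept
    from employee_job_assignment
    order by start_dt
""").fetchall()

for dept_id, rn_all, rn_dept in steps:
    # The last value printed is the grp key used by the query above.
    print(dept_id, rn_all, rn_dept, rn_all - rn_dept)
```

For this data the differences come out as 0, 0, 2, 2, 2. The last dept 10 row gets the same difference as the dept 20 rows, which is why dept_id has to be part of the GROUP BY.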

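Answer 1 (score: 1):

The key to this solution is to separate the rows into groups according to your logic. You can do that with the LAG() function. For example: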

select
  emp_id,
  max(dept_id) as dept_id,
  min(start_dt) as start_dt,
  max(end_dt) as end_dt
from (
  select
    x.*,
    -- running count of department changes = group number
    sum(inc) over(partition by emp_id order by start_dt) as grp
  from (
    select
      a.*,
      -- inc = 1 when the department differs from the previous row's
      case when lag(dept_id) over(partition by emp_id order by start_dt)
                <> dept_id then 1 else 0 end as inc
    from employee_job_assignment a
  ) x
) y
group by emp_id, grp
order by emp_id, grp
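The two inner steps (flag a department change with LAG(), then take a running sum of the flags) can be inspected on the question's sample rows. This is a sketch under the assumption that Python's standard `sqlite3` module, with an SQLite build that supports window functions, behaves like Oracle for these functions; the assignment column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employee_job_assignment "
    "(emp_id int, dept_id int, start_dt text, end_dt text)"
)
# Sample rows from the question, without the assignment column.
conn.executemany(
    "insert into employee_job_assignment values (?, ?, ?, ?)",
    [
        (1, 10, "2001-01-01", "2001-12-31"),
        (1, 10, "2002-01-01", "2002-12-31"),
        (1, 20, "2003-01-01", "2003-12-31"),
        (1, 20, "2004-01-01", "2004-12-31"),
        (1, 10, "2005-01-01", "2005-12-31"),
    ],
)

# inc marks rows whose department differs from the previous row;
# grp is the running total of those marks per employee.
flags = conn.execute("""
    select dept_id, inc,
           sum(inc) over (partition by emp_id order by start_dt) as grp
    from (
        select a.*,
               case when lag(dept_id) over (partition by emp_id order by start_dt)
                         <> dept_id then 1 else 0 end as inc
        from employee_job_assignment a
    ) x
    order by start_dt
""").fetchall()

for dept_id, inc, grp in flags:
    print(dept_id, inc, grp)
```

The first row has no predecessor, so LAG() returns NULL, the comparison is not true, and inc falls through to 0. The grp values for the five rows are 0, 0, 1, 1, 2, one value per stay in a department.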

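Answer 2 (score: 1):

This is a gaps-and-islands problem, but with a twist. In this case, you may also need to account for gaps within the same department. For instance: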

emp_id, dept_id, assignment,  start_dt,    end_dt
1,      10,      project 1,   2001-01-01,  2001-12-31
1,      10,      project 2,   2003-01-01,  2003-12-31

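This should return two rows rather than one.

To handle this, determine where each island starts by comparing the previous end date with the current start date. That defines the beginning of each group. The rest is just aggregation: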

select emp_id, dept_id, min(start_dt) as start_dt, max(end_dt) as end_dt
from (select eja.*,
             -- running sum (note the order by) so each island gets its own number
             sum(case when prev_end_dt = start_dt - 1
                      then 0 else 1
                 end) over (partition by emp_id, dept_id order by start_dt) as grp
      from (select eja.*,
                   lag(end_dt) over (partition by emp_id, dept_id order by start_dt) as prev_end_dt
            from employee_job_assignment eja
           ) eja
     ) eja
group by emp_id, dept_id, grp;
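The gap-aware approach can be tried on both the two-row gap example and the question's original rows. This is a sketch only: it assumes Python's standard `sqlite3` module (with window-function support) in place of Oracle, stores dates as ISO text, and uses SQLite's `date(start_dt, '-1 day')` where Oracle would use `start_dt - 1`. The helper `summarize` and the aliases `island_start` and `island_end` are hypothetical names introduced for the illustration.

```python
import sqlite3

QUERY = """
    select emp_id, dept_id,
           min(start_dt) as island_start,
           max(end_dt)   as island_end
    from (select b.*,
                 sum(case when prev_end_dt = date(start_dt, '-1 day')
                          then 0 else 1 end)
                     over (partition by emp_id, dept_id order by start_dt) as grp
          from (select a.*,
                       lag(end_dt) over (partition by emp_id, dept_id
                                         order by start_dt) as prev_end_dt
                from employee_job_assignment a) b) c
    group by emp_id, dept_id, grp
    order by emp_id, island_start
"""

def summarize(rows):
    """Load rows into an in-memory table and return one row per island."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table employee_job_assignment "
        "(emp_id int, dept_id int, start_dt text, end_dt text)"
    )
    conn.executemany(
        "insert into employee_job_assignment values (?, ?, ?, ?)", rows
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

# Same department, but with a one-year gap between the assignments.
gap_case = [
    (1, 10, "2001-01-01", "2001-12-31"),
    (1, 10, "2003-01-01", "2003-12-31"),
]
# The rows from the question, without the assignment column.
question_case = [
    (1, 10, "2001-01-01", "2001-12-31"),
    (1, 10, "2002-01-01", "2002-12-31"),
    (1, 20, "2003-01-01", "2003-12-31"),
    (1, 20, "2004-01-01", "2004-12-31"),
    (1, 10, "2005-01-01", "2005-12-31"),
]

for row in summarize(gap_case):
    print(row)
for row in summarize(question_case):
    print(row)
```

With these inputs the gap example yields two rows, and the question's data yields the three rows listed in the expected output.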