Self-join a table against the previous month's data to add missing records

Date: 2019-09-25 15:47:56

Tags: sql hive impala

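I am trying to join an Impala table with its own data from the previous month in order to find records that are missing in the current month. The source table holds Employee records. If an employee is absent in the current month but was present in the previous month, that employee needs to be flagged as "Terminated".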

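I tried a left outer join on a date condition and the employee name, but it does not return the missing records. The join conditions were:

current-month employee = previous-month employee

current reporting month = previous reporting month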

Input Data:

+---------+---------+-----------+----------------+
|employee | branch  | hire_date | reporting_month|
+---------+---------+-----------+----------------+
| James   | EE      | 20170101  |   20190131     |
+---------+---------+-----------+----------------+
| Judy    | GIP     | 20181014  |   20190131     |
+---------+---------+-----------+----------------+
| James   | EE      | 20170101  |   20190228     |
+---------+---------+-----------+----------------+
| Judy    | GIP     | 20181014  |   20190228     |
+---------+---------+-----------+----------------+
| James   | EE      | 20170101  |   20190331     |
+---------+---------+-----------+----------------+
| Judy    | GIP     | 20181014  |   20190331     |
+---------+---------+-----------+----------------+
| James   | EE      | 20170101  |   20190430     |
+---------+---------+-----------+----------------+
| Max     | EEI     | 20170201  |   20190430     |
+---------+---------+-----------+----------------+

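Suppose the current reporting month is 20190430 and employee Judy is not present in it. A record then needs to be added for Judy with Term_flag set to "Terminated".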

Expected Output:

+---------+---------+-----------+----------------+-----------+
|employee | branch  | hire_date | reporting_month| Term_flag |
+---------+---------+-----------+----------------+-----------+
| James   | EE      | 20170101  |   20190131     | NULL      |
+---------+---------+-----------+----------------+-----------+
| Judy    | GIP     | 20181014  |   20190131     | NULL      |
+---------+---------+-----------+----------------+-----------+
| James   | EE      | 20170101  |   20190228     | NULL      |
+---------+---------+-----------+----------------+-----------+
| Judy    | GIP     | 20181014  |   20190228     | NULL      |
+---------+---------+-----------+----------------+-----------+
| James   | EE      | 20170101  |   20190331     | NULL      |
+---------+---------+-----------+----------------+-----------+
| Judy    | GIP     | 20181014  |   20190331     | NULL      |
+---------+---------+-----------+----------------+-----------+
| James   | EE      | 20170101  |   20190430     | NULL      |
+---------+---------+-----------+----------------+-----------+
| Judy    | GIP     | 20181014  |   20190430     |Terminated | 
+---------+---------+-----------+----------------+-----------+
| Max     | EEI     | 20170201  |   20190430     | NULL      |
+---------+---------+-----------+----------------+-----------+

1 Answer:

Answer 0 (score: 0)

I'm not sure where the magic date 20190430 comes from. The basic idea is union all, like this:

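-- Assumes reporting_month is stored as an integer in yyyyMMdd form, as shown in the question.
select employee, branch, hire_date, reporting_month, cast(null as string) as term_flag
from input
union all
-- One extra row for each employee whose latest record is older than the current month.
select employee, branch, hire_date, 20190430 as reporting_month, 'Terminated' as term_flag
from (select i.*,
             -- number each employee's rows separately, newest first
             row_number() over (partition by employee order by reporting_month desc) as seqnum
      from input i
     ) i
where seqnum = 1 and
      reporting_month < 20190430;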

The month arithmetic can be a bit tricky because your dates are the last day of the month rather than the first.
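
One way to handle the month-end arithmetic is to step back to the first day of the current month and subtract one day, which lands on the previous month-end. The sketch below does that and then applies it in the self-join form from the question. It assumes that the table is named input (as in the query above), that reporting_month is an INT in yyyyMMdd form, and that 20190430 is supplied by hand as the current month. The CTE names params and months are made up for illustration.

```sql
-- Sketch only, under the assumptions stated above.
with params as (
  select 20190430 as cur_month
),
months as (
  select cur_month,
         -- first day of the current month minus one day = previous month-end
         cast(from_timestamp(
                days_sub(trunc(to_timestamp(cast(cur_month as string), 'yyyyMMdd'), 'MM'), 1),
                'yyyyMMdd') as int) as prev_month
  from params
)
select i.employee, i.branch, i.hire_date, i.reporting_month,
       cast(null as string) as term_flag
from input i
union all
-- Anchor the join on the previous month: a missing employee has no
-- current-month row, so the current month has to be the optional side.
select p.employee, p.branch, p.hire_date, m.cur_month, 'Terminated'
from input p
cross join months m
left join input c
  on c.employee = p.employee
 and c.reporting_month = m.cur_month
where p.reporting_month = m.prev_month
  and c.employee is null;
```

With the sample data from the question, only Judy has a 20190331 row without a matching 20190430 row, so this should add a single Terminated row for her. Unlike the union all query above, it flags only employees who were present in the immediately preceding month.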