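I am trying to join an Impala table with its own previous-month data to find records that are missing in the current month. The source table holds Employee records. If an employee is not present in the current month but was present in the previous month, that employee needs to be flagged as "Terminated".
I tried a left outer join on a date condition and the employee name, but it does not return the missing records.
Current-month employee equals previous-month employee
Current reporting month equals previous reporting month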
Input Data:
+---------+---------+-----------+----------------+
|employee | branch | hire_date | reporting_month|
+---------+---------+-----------+----------------+
| James | EE | 20170101 | 20190131 |
+---------+---------+-----------+----------------+
| Judy | GIP | 20181014 | 20190131 |
+---------+---------+-----------+----------------+
| James | EE | 20170101 | 20190228 |
+---------+---------+-----------+----------------+
| Judy | GIP | 20181014 | 20190228 |
+---------+---------+-----------+----------------+
| James | EE | 20170101 | 20190331 |
+---------+---------+-----------+----------------+
| Judy | GIP | 20181014 | 20190331 |
+---------+---------+-----------+----------------+
| James | EE | 20170101 | 20190430 |
+---------+---------+-----------+----------------+
| Max | EEI | 20170201 | 20190430 |
+---------+---------+-----------+----------------+
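Suppose the current reporting month is 20190430 and employee Judy is not present. A record then needs to be added for Judy with Term_flag set to "Terminated".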
Expected Output:
+---------+---------+-----------+----------------+-----------+
|employee | branch | hire_date | reporting_month| Term_flag |
+---------+---------+-----------+----------------+-----------+
| James | EE | 20170101 | 20190131 | NULL |
+---------+---------+-----------+----------------+-----------+
| Judy | GIP | 20181014 | 20190131 | NULL |
+---------+---------+-----------+----------------+-----------+
| James | EE | 20170101 | 20190228 | NULL |
+---------+---------+-----------+----------------+-----------+
| Judy | GIP | 20181014 | 20190228 | NULL |
+---------+---------+-----------+----------------+-----------+
| James | EE | 20170101 | 20190331 | NULL |
+---------+---------+-----------+----------------+-----------+
| Judy | GIP | 20181014 | 20190331 | NULL |
+---------+---------+-----------+----------------+-----------+
| James | EE | 20170101 | 20190430 | NULL |
+---------+---------+-----------+----------------+-----------+
| Judy | GIP | 20181014 | 20190430 |Terminated |
+---------+---------+-----------+----------------+-----------+
| Max | EEI | 20170201 | 20190430 | NULL |
+---------+---------+-----------+----------------+-----------+
Answer 0 (score: 0)
I'm not sure where the magic date 20190430 comes from. The basic idea is union all, as follows:
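-- Existing rows pass through unchanged with a NULL flag.
select employee, branch, hire_date, reporting_month, null as term_flag
from input
union all
-- Add a 'Terminated' row for each employee whose latest month is the one just before April 2019.
-- Assumes reporting_month is an integer in yyyymmdd form, so it is converted before the date math.
select employee, branch, hire_date, 20190430 as reporting_month, 'Terminated'
from (select i.*,
             row_number() over (partition by employee order by reporting_month desc) as seqnum
      from input i
     ) i
where seqnum = 1 and
      months_add(trunc(to_timestamp(cast(reporting_month as string), 'yyyyMMdd'), 'MON'), 1) = '2019-04-01';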
The month arithmetic can be a bit tricky because your dates are the last day of the month rather than the first.
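One way to sidestep the month-end problem is to go back to the first day of the current month and subtract one day, which always lands on the previous month-end. The sketch below combines that with the left outer join the question attempted. It is only an illustration under assumptions: the table is called input as above, reporting_month is an integer in yyyymmdd form, and the current month is hard-coded as 20190430. The key difference from a plain self-join is that the previous-month rows drive the query and the current-month filter sits in the ON clause, so unmatched employees survive as NULLs.

```sql
-- Sketch only: table name and column types are assumptions, see above.
select p.employee, p.branch, p.hire_date,
       20190430 as reporting_month,
       'Terminated' as term_flag
from input p
left join input c
  on c.employee = p.employee
 and c.reporting_month = 20190430        -- current reporting month
where c.employee is null                  -- no matching row in the current month
  and p.reporting_month = cast(from_timestamp(
        -- first day of the current month minus one day = previous month-end
        days_sub(trunc(to_timestamp('20190430', 'yyyyMMdd'), 'MM'), 1),
        'yyyyMMdd') as int);
```

On the sample data this should return only the Judy row. Combining it with the original rows through union all would then give the expected output.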