I have two relations, with the following schemas and data:
latest_extract
ticket_num,employee_id,assigned_to,team,assigned_date
1234567,1122525,michael,printer,2019-01-03
1234569,1122536,julie,printer,2019-01-03
1234571,1122538,priscila,printer,2019-01-03
1234572,1122539,susan,scanner,2019-01-03
1234573,1122540,walter,network,2019-01-03
previous_extract
ticket_num,employee_id,assigned_to,team,assigned_date
1234567,1122525,michael,printer,2019-01-02
1234568,1122525,michale,printer,2019-01-02
1234569,1122536,julie,printer,2019-01-02
1234570,1122537,john,scanner,2019-01-02
1234574,1122541,hudson,windows,2019-01-02
join_latest_previous = JOIN previous_extract BY (ticket_num,employee_id) FULL OUTER, latest_extract BY (ticket_num,employee_id);
latest_extract::ticket_num,latest_extract::employee_id,latest_extract::assigned_to,latest_extract::team,latest_extract::assigned_date,
previous_extract::ticket_num,previous_extract::employee_id,previous_extract::assigned_to,previous_extract::team,previous_extract::assigned_date;
1234567,1122525,michael,printer,2019-01-03,1234567,1122525,michael,printer,2019-01-02
,,,,,1234568,1122525,michale,printer,2019-01-02
1234569,1122536,julie,printer,2019-01-03,1234569,1122536,julie,printer,2019-01-02
1234571,1122538,priscila,printer,2019-01-03,,,,,
1234572,1122539,susan,scanner,2019-01-03,,,,,
,,,,,1234570,1122537,john,scanner,2019-01-02
,,,,,1234573,1122540,walter,network,2019-01-03
1234574,1122541,hudson,windows,2019-01-02,,,,,
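I need to do the following:
If there is only one employee in the team and the employee's record is absent from the previous extract but present in the latest one, flag it as 1;
else if there is only one employee in the team and the employee's record is absent from the latest extract but present in the previous one, flag it as 2;
else if there is more than one employee in the team and the employee's record is absent from the previous extract but present in the latest one, flag it as 3;
else if there is more than one employee in the team and the employee's record is absent from the latest extract but present in the previous one, flag it as 4;
otherwise the flag should be 5.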
diff_latest_previous = FOREACH join_latest_previous GENERATE
((((previous_extract::ticket_num IS NULL) AND (latest_extract::ticket_num IS NOT NULL))OR (previous_extract::ticket_num !=latest_extract::ticket_num))?1:
(((previous_extract::ticket_num IS NOT NULL) AND (latest_extract::ticket_num IS NULL))OR (previous_extract::ticket_num !=latest_extract::ticket_num))?2:
3) AS flag,
latest_extract::ticket_num AS l_ticket_num,
latest_extract::employee_id AS l_employee_id,
latest_extract::assigned_to AS l_assigned_to,
latest_extract::team AS l_team,
latest_extract::assigned_date AS l_assigned_date,
previous_extract::ticket_num AS p_ticket_num,
previous_extract::employee_id AS p_employee_id,
previous_extract::assigned_to AS p_assigned_to,
previous_extract::team AS p_team,
previous_extract::assigned_date AS p_assigned_date;
flag,l_ticket_num,l_employee_id,l_assigned_to,l_team,l_assigned_date,p_ticket_num,p_employee_id,p_assigned_to,p_team,p_assigned_date
5,1234567,1122525,michael,printer,2019-01-03,1234567,1122525,michael,printer,2019-01-02
1,,,,,,1234568,1122525,michale,printer,2019-01-02
5,1234569,1122536,julie,printer,2019-01-03,1234569,1122536,julie,printer,2019-01-02
2,1234571,1122538,priscila,printer,2019-01-03,,,,,
2,1234572,1122539,susan,scanner,2019-01-03,,,,,
1,,,,,,1234570,1122537,john,scanner,2019-01-02
1,,,,,,1234573,1122540,walter,network,2019-01-03
2,1234574,1122541,hudson,windows,2019-01-02,,,,,
Here I am not able to get the values 3 and 4.
Please help me find a solution.
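One possible direction, offered only as a sketch: the five rules depend on how many employees a team has, and the join output alone does not carry that number, so an expression that only compares ticket_num values has nothing to separate 1 from 3 or 2 from 4. The script below computes a per-team head count and joins it back before assigning the flag. Several things in it are assumptions rather than facts from the post: the file paths, the comma delimiter, and the column types are hypothetical; "employees in the team" is taken to mean distinct employee_id values per team across both extracts combined; and the aliases latest_emps, previous_emps, all_emps, distinct_emps, by_team, team_sizes, renamed and with_size are made up for illustration.

```pig
-- Sketch only: paths, delimiter and column types are assumptions.
latest_extract = LOAD 'latest_extract.csv' USING PigStorage(',')
    AS (ticket_num:long, employee_id:long, assigned_to:chararray,
        team:chararray, assigned_date:chararray);
previous_extract = LOAD 'previous_extract.csv' USING PigStorage(',')
    AS (ticket_num:long, employee_id:long, assigned_to:chararray,
        team:chararray, assigned_date:chararray);

-- Assumption: team size = distinct employee_ids per team across both extracts.
latest_emps   = FOREACH latest_extract   GENERATE team, employee_id;
previous_emps = FOREACH previous_extract GENERATE team, employee_id;
all_emps      = UNION latest_emps, previous_emps;
distinct_emps = DISTINCT all_emps;
by_team       = GROUP distinct_emps BY team;
team_sizes    = FOREACH by_team GENERATE group AS team,
                    COUNT(distinct_emps) AS team_size;

-- Same full outer join as in the question.
join_latest_previous = JOIN previous_extract BY (ticket_num, employee_id) FULL OUTER,
                            latest_extract   BY (ticket_num, employee_id);

-- Flatten the names and take the team from whichever side is present.
renamed = FOREACH join_latest_previous GENERATE
    (latest_extract::team IS NOT NULL ? latest_extract::team
                                      : previous_extract::team) AS team,
    latest_extract::ticket_num      AS l_ticket_num,
    latest_extract::employee_id     AS l_employee_id,
    latest_extract::assigned_to     AS l_assigned_to,
    latest_extract::team            AS l_team,
    latest_extract::assigned_date   AS l_assigned_date,
    previous_extract::ticket_num    AS p_ticket_num,
    previous_extract::employee_id   AS p_employee_id,
    previous_extract::assigned_to   AS p_assigned_to,
    previous_extract::team          AS p_team,
    previous_extract::assigned_date AS p_assigned_date;

-- Attach the head count to every joined row.
with_size = JOIN renamed BY team, team_sizes BY team;

-- Nested bincond: each inner expression is parenthesised on purpose.
diff_latest_previous = FOREACH with_size GENERATE
    (((team_size == 1L) AND (p_ticket_num IS NULL) AND (l_ticket_num IS NOT NULL)) ? 1 :
    (((team_size == 1L) AND (l_ticket_num IS NULL) AND (p_ticket_num IS NOT NULL)) ? 2 :
    (((team_size > 1L)  AND (p_ticket_num IS NULL) AND (l_ticket_num IS NOT NULL)) ? 3 :
    (((team_size > 1L)  AND (l_ticket_num IS NULL) AND (p_ticket_num IS NOT NULL)) ? 4 :
    5)))) AS flag,
    l_ticket_num, l_employee_id, l_assigned_to, l_team, l_assigned_date,
    p_ticket_num, p_employee_id, p_assigned_to, p_team, p_assigned_date;
```

Under the stated team-size assumption, printer and scanner each count more than one employee in the sample data, so their one-sided rows would fall under flags 3 and 4, while network and windows have a single employee each and would fall under 1 and 2. If team size should instead be counted within a single extract, only the team_sizes part would need to change.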