Unable to find records with an outer join

Time: 2019-03-17 15:23:45

Tags: apache-pig

I have two relations with the following schema and data:

latest_extract

ticket_num,employee_id,assigned_to,team,assigned_date
1234567,1122525,michael,printer,2019-01-03
1234569,1122536,julie,printer,2019-01-03
1234571,1122538,priscila,printer,2019-01-03
1234572,1122539,susan,scanner,2019-01-03
1234573,1122540,walter,network,2019-01-03

previous_extract

ticket_num,employee_id,assigned_to,team,assigned_date
1234567,1122525,michael,printer,2019-01-02
1234568,1122525,michale,printer,2019-01-02
1234569,1122536,julie,printer,2019-01-02
1234570,1122537,john,scanner,2019-01-02
1234574.1122541,hudson,windows,2019-01-02

join_latest_previous = JOIN previous_extract BY (ticket_num,employee_id) FULL OUTER, latest_extract BY (ticket_num,employee_id);

latest_extract::ticket_num,latest_extract::employee_id,latest_extract::assigned_to,latest_extract::team,latest_extract::assigned_date,
previous_extract::ticket_num,previous_extract::employee_id,previous_extract::assigned_to,previous_extract::team,previous_extract::assigned_date;
1234567,1122525,michael,printer,2019-01-03,1234567,1122525,michael,printer,2019-01-02
,,,,,1234568,1122525,michale,printer,2019-01-02
1234569,1122536,julie,printer,2019-01-03,1234569,1122536,julie,printer,2019-01-02
1234571,1122538,priscila,printer,2019-01-03,,,,,
1234572,1122539,susan,scanner,2019-01-03,,,,,
,,,,,1234570,1122537,john,scanner,2019-01-02
,,,,,1234573,1122540,walter,network,2019-01-03
1234574.1122541,hudson,windows,2019-01-02,,,,,,
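
For reference, here is a minimal plain-Python model of what a full outer join on (ticket_num, employee_id) does. It is not Pig code. The function name full_outer_join and the tuple layout are illustrative assumptions, and the sketch assumes the key is unique within each extract.

```python
from typing import Dict, List, Optional, Tuple

# Field order assumed: ticket_num, employee_id, assigned_to, team, assigned_date
Row = Tuple[str, str, str, str, str]
Key = Tuple[str, str]


def full_outer_join(
    previous: List[Row], latest: List[Row]
) -> List[Tuple[Optional[Row], Optional[Row]]]:
    """Pair rows on (ticket_num, employee_id); a missing side becomes None."""
    prev_by_key: Dict[Key, Row] = {(r[0], r[1]): r for r in previous}
    latest_by_key: Dict[Key, Row] = {(r[0], r[1]): r for r in latest}
    # Every key seen on either side produces exactly one output pair.
    keys = sorted(set(prev_by_key) | set(latest_by_key))
    return [(prev_by_key.get(k), latest_by_key.get(k)) for k in keys]


previous = [
    ("1234567", "1122525", "michael", "printer", "2019-01-02"),
    ("1234568", "1122525", "michale", "printer", "2019-01-02"),
]
latest = [
    ("1234567", "1122525", "michael", "printer", "2019-01-03"),
    ("1234571", "1122538", "priscila", "printer", "2019-01-03"),
]
pairs = full_outer_join(previous, latest)
```

A key present on both sides yields a pair with both rows filled in. A key present on only one side yields a pair whose other element is None, which corresponds to the all-null columns in the Pig output.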

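I need to do the following: if the team has only one employee and the employee's record does not exist in the previous extract but does exist in the latest, flag it as 1,
else if the team has only one employee and the employee's record does not exist in the latest extract but does exist in the previous, flag it as 2,
else if the team has multiple employees and the employee's record does not exist in the previous extract but does exist in the latest, flag it as 3,
else if the team has multiple employees and the employee's record does not exist in the latest extract but does exist in the previous, flag it as 4,
otherwise it should be 5.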
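
The five rules above can be sketched in plain Python to make the intended logic explicit. This is a model of the rule only, not Pig code. The helper names team_sizes and flag are hypothetical, and the sketch assumes that "team size" means the number of distinct employee_id values per team across both extracts combined, which the question does not specify.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Field order assumed: ticket_num, employee_id, assigned_to, team, assigned_date
Row = Tuple[str, str, str, str, str]


def team_sizes(previous: List[Row], latest: List[Row]) -> Dict[str, int]:
    """Count distinct employee_ids per team over both extracts (an assumption)."""
    members: Dict[str, Set[str]] = defaultdict(set)
    for row in list(previous) + list(latest):
        members[row[3]].add(row[1])
    return {team: len(ids) for team, ids in members.items()}


def flag(prev_row: Optional[Row], latest_row: Optional[Row],
         sizes: Dict[str, int]) -> int:
    """Apply the five-way rule to one joined pair (None marks a missing side)."""
    if prev_row is None and latest_row is None:
        raise ValueError("a joined pair must have at least one side")
    if prev_row is not None and latest_row is not None:
        return 5  # present in both extracts
    row = latest_row if latest_row is not None else prev_row
    single = sizes[row[3]] == 1
    if latest_row is not None:  # only in the latest extract
        return 1 if single else 3
    return 2 if single else 4  # only in the previous extract


latest = [
    ("1234567", "1122525", "michael", "printer", "2019-01-03"),
    ("1234571", "1122538", "priscila", "printer", "2019-01-03"),
    ("1234573", "1122540", "walter", "network", "2019-01-03"),
]
previous = [
    ("1234567", "1122525", "michael", "printer", "2019-01-02"),
    ("1234568", "1122525", "michale", "printer", "2019-01-02"),
]
sizes = team_sizes(previous, latest)
print(flag(None, latest[2], sizes))         # walter, single-member team, latest only: 1
print(flag(None, latest[1], sizes))         # priscila, multi-member team, latest only: 3
print(flag(previous[1], None, sizes))       # ticket 1234568, previous only: 4
print(flag(previous[0], latest[0], sizes))  # ticket 1234567, in both: 5
```

The point of the sketch is that the flag depends on two inputs: which side of the join is null, and a per-team count that has to be computed separately and brought alongside each joined row.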

diff_latest_previous = FOREACH join_latest_previous GENERATE
((((previous_extract::ticket_num IS NULL) AND (latest_extract::ticket_num IS NOT NULL)) OR (previous_extract::ticket_num != latest_extract::ticket_num)) ? 1 :
(((previous_extract::ticket_num IS NOT NULL) AND (latest_extract::ticket_num IS NULL)) OR (previous_extract::ticket_num != latest_extract::ticket_num)) ? 2 :
3) AS flag,
latest_extract::ticket_num AS l_ticket_num,
latest_extract::employee_id AS l_employee_id,
latest_extract::assigned_to AS l_assigned_to,
latest_extract::team AS l_team,
latest_extract::assigned_date AS l_assigned_date,
previous_extract::ticket_num AS p_ticket_num,
previous_extract::employee_id AS p_employee_id,
previous_extract::assigned_to AS p_assigned_to,
previous_extract::team AS p_team,
previous_extract::assigned_date AS p_assigned_date;

flag,ticket_num,employee_id,assigned_to,team,assigned_date
5,1234567,1122525,michael,printer,2019-01-03,1234567,1122525,michael,printer,2019-01-02
1,,,,,1234568,1122525,michale,printer,2019-01-02
5,1234569,1122536,julie,printer,2019-01-03,1234569,1122536,julie,printer,2019-01-02
2,1234571,1122538,priscila,printer,2019-01-03,,,,,
2,1234572,1122539,susan,scanner,2019-01-03,,,,,
1,,,,1234570,1122537,john,scanner,2019-01-02
1,,,,,,1234573,1122540,walter,network,2019-01-03
2,1234574.1122541,hudson,windows,2019-01-02,,,,,,

Here, I am unable to get the values 3 and 4.

Please help me find a solution.

0 answers