I have 2 relations:
Relation A:
101,Ankit-Reddy,08022017
102,Siddarth-Battacharya,08022017
103,Rajesh-Khanna,08022017
and relation B:
102,Ronit-Roy,09022017
103,Ranveer-Singh,09022017
107,sadiya-some,09022017
108,Raj-sharma,09022017
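So IDs 102 and 103 in B are existing records that carry a different date, while 107 and 108 are new records and should be kept as they are. For the existing IDs, the name should come from B but A's date should be kept. How can I apply this update to A?
My final table should look like this: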
101,Ankit-Reddy,08022017
102,Ronit-Roy,08022017
103,Ranveer-Singh,08022017
107,sadiya-some,09022017
108,Raj-sharma,09022017
Is there a Pig script for this?
Answer:
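Split the records into three parts and UNION them: A1 holds the IDs found only in A, B1 the IDs found only in B, and C1 the IDs found in both, taking the name from B and the date from A.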
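-- Load both relations (comma-separated: id,name,date)
A = LOAD 'test1.txt' USING PigStorage(',') AS (a1:int,a2:chararray,a3:chararray);
B = LOAD 'test2.txt' USING PigStorage(',') AS (b1:int,b2:chararray,b3:chararray);
-- A1: IDs only in A; project back to A's three fields so the UNION lines up
A_JOIN = JOIN A BY a1 LEFT OUTER, B BY b1;
A1_ROWS = FILTER A_JOIN BY b1 is null;
A1 = FOREACH A1_ROWS GENERATE a1, a2, a3;
-- B1: new IDs only in B; keep B's three fields
B_JOIN = JOIN A BY a1 RIGHT OUTER, B BY b1;
B1_ROWS = FILTER B_JOIN BY a1 is null;
B1 = FOREACH B1_ROWS GENERATE b1, b2, b3;
-- C1: IDs in both; name from B, date from A
C_JOIN = JOIN A BY a1, B BY b1;
C1 = FOREACH C_JOIN GENERATE a1, b2, a3;
-- All three parts now have the same 3 fields
D = UNION A1, B1, C1;
DUMP D;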
A1
101,Ankit-Reddy,08022017
B1
107,sadiya-some,09022017
108,Raj-sharma,09022017
C1
102,Ronit-Roy,08022017
103,Ranveer-Singh,08022017
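As an alternative sketch (not part of the original answer), a single FULL OUTER join can perform the same update in one pass instead of three separate joins. It assumes the same input files, test1.txt and test2.txt, and writes to 'merged_out', which is a hypothetical output directory name. Pig's bincond operator (cond ? x : y) picks the non-null side for each field.

```pig
-- Same inputs as the answer above (comma-separated: id,name,date)
A = LOAD 'test1.txt' USING PigStorage(',') AS (a1:int, a2:chararray, a3:chararray);
B = LOAD 'test2.txt' USING PigStorage(',') AS (b1:int, b2:chararray, b3:chararray);

-- FULL OUTER keeps IDs that appear in only one relation as well as matches
J = JOIN A BY a1 FULL OUTER, B BY b1;

-- id: whichever side exists; name: prefer B's; date: prefer A's
MERGED = FOREACH J GENERATE
    (a1 is null ? b1 : a1) AS id,
    (b2 is null ? a2 : b2) AS name,
    (a3 is null ? b3 : a3) AS dt;

RESULT = ORDER MERGED BY id;
-- 'merged_out' is a hypothetical output path
STORE RESULT INTO 'merged_out' USING PigStorage(',');
```

With the sample data this yields the five rows of the desired final table, sorted by id. Outer joins in Pig need schemas on both sides, which the AS clauses provide.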