Copying data from one relation to another in Pig

Time: 2017-02-09 12:45:42

Tags: apache-pig

I have 2 relations:

Relation A:

101,Ankit-Reddy,08022017
102,Siddarth-Battacharya,08022017
103,Rajesh-Khanna,08022017

And relation B:

102,Ronit-Roy,09022017
103,Ranveer-Singh,09022017
107,sadiya-some,09022017
108,Raj-sharma,09022017

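IDs 102 and 103 in B are existing records that carry a different date, while 107 and 108 are new records and should stay as they are. How can I update A with this data?

My final table should look like this: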

101,Ankit-Reddy,08022017
102,Ronit-Roy,08022017
103,Ranveer-Singh,08022017
107,sadiya-some,09022017
108,Raj-sharma,09022017

Any Pig script would help.

1 answer:

Answer 0: (score: 0)

  • Get the records that exist only in A, say A1
  • Get the records that exist only in B, say B1
  • Join A and B and build records from a1, b2, a3, say C1
  • Union A1, B1, C1

Pig script:

A = LOAD 'test1.txt' USING PigStorage(',') AS (a1:int,a2:chararray,a3:chararray);
B = LOAD 'test2.txt' USING PigStorage(',') AS (b1:int,b2:chararray,b3:chararray);

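-- Records only in A: left outer join, keep rows with no match in B.
A_JOIN = JOIN A BY a1 LEFT OUTER, B BY b1;
A_ONLY = FILTER A_JOIN BY b1 IS NULL;
-- Project A's three columns so A1 has the same shape as C1.
A1 = FOREACH A_ONLY GENERATE a1, a2, a3;

-- Records only in B: right outer join, keep rows with no match in A.
B_JOIN = JOIN A BY a1 RIGHT OUTER, B BY b1;
B_ONLY = FILTER B_JOIN BY a1 IS NULL;
B1 = FOREACH B_ONLY GENERATE b1, b2, b3;

-- Records in both: keep the id and date from A, take the name from B.
C_JOIN = JOIN A BY a1, B BY b1;
C1 = FOREACH C_JOIN GENERATE a1, b2, a3;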

D = UNION A1,B1,C1;

A1 (records only in A):

101,Ankit-Reddy,08022017

B1 (records only in B):

107,sadiya-some,09022017
108,Raj-sharma,09022017
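
As a possible alternative to the three-way union in the script above, a single FULL OUTER join with conditional expressions may produce the same final table in one pass. This is only a sketch and uses a different technique from the steps listed in this answer. It assumes A and B are loaded exactly as in the script above; the aliases J and D_ALT and the output field names id, name, and dt are hypothetical names chosen for illustration.

```pig
-- Assumes A (a1,a2,a3) and B (b1,b2,b3) are loaded as in the script above.
-- A full outer join keeps unmatched rows from both sides, with nulls
-- filling the columns of the side that had no match.
J = JOIN A BY a1 FULL OUTER, B BY b1;

-- id:   from A when the row exists in A, otherwise from B
-- name: from B when the row exists in B, otherwise from A
-- dt:   from A when the row exists in A, otherwise from B
D_ALT = FOREACH J GENERATE
    (A::a1 IS NOT NULL ? A::a1 : B::b1) AS id,
    (B::b2 IS NOT NULL ? B::b2 : A::a2) AS name,
    (A::a3 IS NOT NULL ? A::a3 : B::b3) AS dt;

DUMP D_ALT;
```

This version treats a null name in B or a null date in A as missing and falls back to the other relation's value, which may or may not be what you want if your data has empty fields.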