How to filter/remove records in Pig if the second field has different values

Date: 2016-09-02 13:24:28

Tags: apache-pig

How can I filter out/remove records when the second field contains different values for the same ID? For example:

ID,NAME

100 , ABC
100 , DEF
100 , XYZ
102 , ABC
102 , ABC
103 , ABC

Output:

102 , ABC
103 , ABC

Note: 100 should be removed because it is associated with several different names, while 102 should appear only once in the output.

1 answer:

Answer 0: (score: 0)

Simple steps:

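-- Load the comma-separated input
A = load 'file' using PigStorage(',') as (ID:int, NAME:chararray);
-- Drop exact duplicate rows (e.g. the second 102,ABC)
B = DISTINCT A;
-- Split the rows on the literal 'ABC' (specific to the sample data)
C = filter B by NAME == 'ABC';
D = filter B by NAME != 'ABC';
-- IDs that have no non-ABC row get nulls on the D side of the join
E = join C by ID left outer, D by ID;
F = filter E by (D::NAME is null);
G = foreach F generate C::ID as ID, C::NAME as NAME;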

Hope this helps.
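
The steps above compare NAME against the literal 'ABC', so they only fit data where that is the one name to keep. As a possible generalization (a sketch, not the answerer's method), one could group the deduplicated rows by ID and keep only the IDs that are left with exactly one row. It assumes the same 'file' path and schema as above, and that the fields carry no stray spaces around the comma:

```pig
-- Assumption: 'file' holds comma-separated ID,NAME rows without extra spaces
A = LOAD 'file' USING PigStorage(',') AS (ID:int, NAME:chararray);
-- One row per unique (ID, NAME) pair
B = DISTINCT A;
-- Collect the distinct names of each ID into a bag
C = GROUP B BY ID;
-- Keep only the IDs whose bag holds exactly one (ID, NAME) row
D = FILTER C BY COUNT(B) == 1L;
-- Unpack the single remaining row of each group
E = FOREACH D GENERATE FLATTEN(B);
```

On the sample data this should leave the rows for 102 and 103 and drop every row for 100, without naming any particular value. Row order in the result is not guaranteed.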