How can I filter out records whose ID appears with more than one distinct value in the second field? For example:
ID,NAME
100 , ABC
100 , DEF
100 , XYZ
102 , ABC
102 , ABC
103 , ABC
Output:
102 , ABC
103 , ABC
Note: 100 should be removed because it appears with several different names, while 102 should appear in the output only once.
Answer 0 (score: 0)
Simple steps:
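-- Load the comma-separated input as (ID, NAME) pairs.
A = load 'file' using PigStorage(',') as (ID:int, NAME:chararray);
-- Remove exact duplicate rows, e.g. the repeated (102, ABC).
B = DISTINCT A;
-- Split the rows by name; note that 'ABC' is hard-coded for this sample.
C = filter B by NAME == 'ABC';
D = filter B by NAME != 'ABC';
-- Keep only the 'ABC' rows whose ID has no row with a different name.
E = join C by ID left outer, D by ID;
F = filter E by (D::NAME is null);
G = foreach F generate C::ID as ID, C::NAME as NAME;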
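The steps above depend on the literal name 'ABC', so they only fit this sample. A more general variant, offered as a sketch rather than a tested solution, is to remove duplicates, group by ID, and keep the groups that contain exactly one row. It assumes the data file is named 'file' as above, that the ID,NAME header line is not part of the data, and that the spaces around the commas should be trimmed; the relation names are only illustrative.

```pig
-- Load both fields as text so the padding around the comma can be trimmed.
A = LOAD 'file' USING PigStorage(',') AS (ID:chararray, NAME:chararray);
-- Strip surrounding whitespace so ' ABC' and 'ABC' compare as equal.
B = FOREACH A GENERATE (int)TRIM(ID) AS ID, TRIM(NAME) AS NAME;
-- Collapse exact duplicates such as the two (102, ABC) rows.
C = DISTINCT B;
-- After DISTINCT, an ID with more than one row has more than one name.
D = GROUP C BY ID;
E = FILTER D BY COUNT(C) == 1;
-- Unnest the single remaining (ID, NAME) tuple of each surviving group.
F = FOREACH E GENERATE FLATTEN(C) AS (ID:int, NAME:chararray);
DUMP F;
```

For the sample data this should leave only the rows for 102 and 103, in no guaranteed order, because ID 100 still has three rows after DISTINCT.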
Hope this helps.