NOT IN, MATCHES in Pig

Date: 2017-02-09 08:18:56

Tags: hadoop apache-pig

I have two relations in Pig, A and B:

DUMP A;
  

Sandeep Rohan Mohan

DUMP B;
  

MOHAN

I need the output A - B; relation C should give me

Sandeep, Rohan

because they do not appear in B.

2 answers:

Answer 0: (score: 0)

Try this:

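-- The quoted LOAD arguments are input paths; presumably they stand for the
-- files behind A and B, with one name per line.
A1 = LOAD 'Sandeep Rohan Mohan' USING PigStorage() AS (line:chararray);
B1 = LOAD 'MOHAN' USING PigStorage() AS (line:chararray);

-- Upper-case both sides so that 'Mohan' in A matches 'MOHAN' in B.
A = FOREACH A1 GENERATE UPPER(line) AS line;
B = FOREACH B1 GENERATE UPPER(line) AS line;

-- One tuple per distinct name, with a bag of matches from A and one from B.
C = COGROUP A BY line, B BY line;

-- Keep only the names that have no match in B.
D = FILTER C BY IsEmpty(B);

E = FOREACH D GENERATE group AS name;

DUMP E;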
  

(ROHAN)
(SANDEEP)

See also: sets operations in apache pig

Answer 1: (score: 0)

Implement it with a left outer join, and keep only those tuples that have a null value in $1.
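
The left outer join approach could look roughly like the sketch below. It assumes each relation has a single chararray field, one name per row; the file names 'a.txt' and 'b.txt' and the aliases A1, B1, J and K are hypothetical. The UPPER step is borrowed from the other answer so that Mohan and MOHAN compare equal.

```pig
-- Hypothetical input files: one name per line.
A1 = LOAD 'a.txt' USING PigStorage() AS (line:chararray);
B1 = LOAD 'b.txt' USING PigStorage() AS (line:chararray);

-- Normalize case so the join key matches regardless of capitalization.
A = FOREACH A1 GENERATE UPPER(line) AS line;
B = FOREACH B1 GENERATE UPPER(line) AS line;

-- Every row of A is kept; $0 is A's name and $1 is B's name,
-- which is null when B has no matching row.
J = JOIN A BY line LEFT OUTER, B BY line;

-- Rows with a null in $1 are the names missing from B.
K = FILTER J BY $1 IS NULL;

C = FOREACH K GENERATE $0 AS name;

DUMP C;
```

Unlike the COGROUP version, which yields one tuple per distinct name, the join keeps one row per row of A, so a name repeated in A appears repeatedly in C unless a DISTINCT is added.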