猪拉丁需要的数据不是普通的

时间:2017-12-26 20:49:31

标签: apache apache-pig

数据1

1,a
2,b
3,c
4,d
5,e

数据2

1,a
2,g
3,j
4,b
5,c
6,d
7,e

脚本

a =  load '/tmp/data/data1' using PigStorage(',') as (timestamp:chararray,constant:chararray);
b =  load '/tmp/data/data2' using PigStorage(',') as (timestamp:chararray,constant:chararray);

我只需要输出不常见且存在于data2中的常量,如下所示

2,g
3,j

感谢您的帮助。

1 个答案:

答案 0 :(得分:0)

RIGHT OUTER JOINFILTER其中a.timestamp为null。这将为您提供b中不在a中的所有记录。

c = JOIN a BY (timestamp) RIGHT OUTER,b BY (timestamp);
d = FILTER c BY (a::timestamp is null);
DUMP d;