Filter inside a FOREACH block based on a condition value computed in a different block of a Pig script

Time: 2018-08-08 12:37:49

Tags: apache-pig

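I have two datasets, and for each record in dataset2 I need to find the matching records in dataset1 (one matching key1, one matching key2). For example: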

dataset1 = [sourceID, details, key]
1, details1, 1111
2, details2, 1112
3, details3, 1113
4, details4, 1114
...

dataset2 = [key1, key2, number]
1111,1112,3
1111,1114,1
1112,1113,11
...

Expected output:
1, details1, 1111, 2, details2, 1112, 3
1, details1, 1111, 4, details4, 1114, 1
2, details2, 1112, 3, details3, 1113, 11
....

I tried the following:

a = foreach dataset1 {
    b = filter dataset2 by dataset2.key1 matches dataset1.key;
    c = filter dataset2 by dataset2.key2 matches dataset1.key;
    generate b, c;
};

Please help.

Thanks a lot.

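1 answer:

Answer 0 (score: 0)

How about running two joins? First join dataset1 to dataset2 on key1, then join dataset1 to that result on key2: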

B = join dataset1 by key, dataset2 by key1;
C = join dataset1 by key, B by key2;
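A fuller sketch of the same two-join idea, assuming both inputs are plain comma-separated files with no spaces after the commas. The paths dataset1.csv and dataset2.csv, the intermediate alias B1, and the renamed fields k1_sourceID, k1_details and k1_key are hypothetical. The intermediate FOREACH renames the fields from the first join so that the second join, which brings in dataset1 a second time, does not end up with two fields both named like dataset1::sourceID. The final FOREACH puts the columns in the order of the expected output.

```pig
-- Hypothetical input paths; both files are assumed to be comma-separated
-- with no spaces after the commas.
dataset1 = LOAD 'dataset1.csv' USING PigStorage(',')
           AS (sourceID:int, details:chararray, key:int);
dataset2 = LOAD 'dataset2.csv' USING PigStorage(',')
           AS (key1:int, key2:int, number:int);

-- First join: attach the dataset1 record whose key equals key1.
B = JOIN dataset1 BY key, dataset2 BY key1;

-- Rename the key1-side fields (hypothetical names) so the second join
-- produces no ambiguous field names.
B1 = FOREACH B GENERATE dataset1::sourceID AS k1_sourceID,
                        dataset1::details  AS k1_details,
                        dataset1::key      AS k1_key,
                        dataset2::key2     AS key2,
                        dataset2::number   AS number;

-- Second join: attach the dataset1 record whose key equals key2.
C = JOIN B1 BY key2, dataset1 BY key;

-- key1 record, then key2 record, then number, as in the expected output.
result = FOREACH C GENERATE B1::k1_sourceID, B1::k1_details, B1::k1_key,
                            dataset1::sourceID, dataset1::details, dataset1::key,
                            B1::number;

DUMP result;
```

With the sample rows, the dataset2 row 1111,1112,3 should come out as (1,details1,1111,2,details2,1112,3). Both joins are inner joins, so a dataset2 row whose key1 or key2 has no match in dataset1 is dropped, and DUMP does not guarantee any particular row order.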