Joining on two slightly different keys in Pig

Time: 2013-04-24 09:33:30

Tags: string join merge mapreduce apache-pig

MapReduce task:

The Key1 values in file_one are a1, a2, a3, a10, a11, a12; the Key2 values in file_two are persona1, persona1, persona2, persona3, persona12, persona12, persona3, persona11, persona10.

Merge_file = JOIN file_one BY Key1, file_two BY Key2 ?? (how do I write this?)

The second key contains duplicates. Does that matter?

Thanks

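1 answer:

Answer 0: (score: 0)

My suggestion is to create a new column in each dataset that holds only the part the two keys share, and then join on that column. For example: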

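-- Strip the differing prefixes so both keys reduce to the shared numeric suffix.
A = FOREACH file_one GENERATE *, SUBSTRING(Key1, 1, 100) AS join_key1; -- 'a12' -> '12'
B = FOREACH file_two GENERATE *, SUBSTRING(Key2, 7, 100) AS join_key2; -- 'persona12' -> '12'
C = JOIN A BY join_key1, B BY join_key2;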
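
As for the duplicates: they do not break the join. An inner JOIN in Pig emits one output row for every matching pair, so a key that appears twice in file_two simply yields two joined rows. Below is a fuller sketch of the same idea. It is not tested against your data, and it makes some assumptions: the file names file_one.txt and file_two.txt are hypothetical, each file is assumed to hold one key per line, and REGEX_EXTRACT is used in place of SUBSTRING so that the prefix length does not have to be hard-coded.

```pig
-- Assumed inputs: hypothetical files with one key per line.
file_one = LOAD 'file_one.txt' AS (Key1:chararray);
file_two = LOAD 'file_two.txt' AS (Key2:chararray);

-- Keep only the trailing digits of each key: 'a12' -> '12', 'persona12' -> '12'.
A = FOREACH file_one GENERATE *, REGEX_EXTRACT(Key1, '^[^0-9]*([0-9]+)$', 1) AS join_key1;
B = FOREACH file_two GENERATE *, REGEX_EXTRACT(Key2, '^[^0-9]*([0-9]+)$', 1) AS join_key2;

-- Inner join: every matching (A row, B row) pair produces one output row,
-- so repeated Key2 values produce repeated joined rows.
C = JOIN A BY join_key1, B BY join_key2;

-- Optional: remove repeated rows from file_two first if only one row per key is wanted.
B_unique = DISTINCT B;
D = JOIN A BY join_key1, B_unique BY join_key2;

DUMP C;
```

With the sample keys from the question, every row of file_two matches exactly one row of file_one, so C should contain nine rows and D six. If the duplicate rows carry other columns that differ, DISTINCT will not collapse them, and you would need a GROUP on join_key2 instead.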