MapReduce task:
Key1 in file_one has the values a1, a2, a3, a10, a11, a12; key2 in file_two has the values persona1, persona1, persona2, persona3, persona12, persona12, persona3, persona11, persona10.
Merge_file = JOIN file_one BY Key1, file_two BY Key2 ?? (how should this be written?)
The second key contains duplicates. Does that matter?
Thanks.
Answer 0 (score: 0)
My suggestion is to create a new column in each dataset and join on that column, for example:
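-- Derive the numeric suffix as the join key on each side.
-- SUBSTRING takes 0-based start and stop indexes; 100 is just a generous upper bound on key length.
A = FOREACH file_one GENERATE *, SUBSTRING(key1, 1, 100) AS join_key1;  -- 'a10' -> '10'
B = FOREACH file_two GENERATE *, SUBSTRING(key2, 7, 100) AS join_key2;  -- 'persona10' -> '10'
C = JOIN A BY join_key1, B BY join_key2;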
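
On the duplicates: an inner join pairs every matching row on one side with every matching row on the other, so a key that appears twice in file_two simply produces two output rows. The sketch below is plain Python, not Pig, and only mimics those semantics on the sample keys from the question. The helper names strip_prefix and inner_join are hypothetical, and it assumes the prefixes are always exactly "a" and "persona".

```python
# Illustration only: mimics inner-join semantics in plain Python (this is not Pig code).
# Sample keys are copied from the question.
file_one = ["a1", "a2", "a3", "a10", "a11", "a12"]
file_two = ["persona1", "persona1", "persona2", "persona3", "persona12",
            "persona12", "persona3", "persona11", "persona10"]


def strip_prefix(key, prefix):
    """Hypothetical helper: drop a fixed-length prefix, keeping the suffix."""
    return key[len(prefix):]


def inner_join(left, right):
    """Pair each left key with every right key whose derived suffix matches."""
    rows = []
    for l in left:
        for r in right:
            if strip_prefix(l, "a") == strip_prefix(r, "persona"):
                rows.append((l, r))
    return rows


joined = inner_join(file_one, file_two)
for row in joined:
    print(row)
```

With this data, a1, a3 and a12 each appear twice in the result because their persona counterparts are duplicated. If only one row per key is wanted, removing the duplicates from file_two before joining (for example with DISTINCT) is one option.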