How to do a selective join in Pig

Time: 2013-10-31 16:42:15

Tags: apache-pig

I have two datasets. main_data.txt

{"id":"foo", "some_field":12354, "score":0}
{"id":"foobar", "some_field":12354, "score":0}

score_data.txt

{"id":"foo", "score":1}
{"id":"foobar","score":20}

...

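So the score in main_data is initialized to 0. In addition, main_data and score_data share some IDs.

For the shared IDs, I want to replace the score in main_data with the score from score_data.

If an ID has no entry in score_data, I want its score to stay 0.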

1 answer:

Answer 0: (score: 1)

Why initialize the score to 0 at all? You could simply leave it out and do a LEFT OUTER join of main_data with score_data. This should work whether or not you leave it out:

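main_data = LOAD USING SOME STORAGE; -- placeholder loader; assume we have id as a column
score_data = LOAD USING SOME STORAGE; -- placeholder loader; assume we have id, score as columns
-- LEFT OUTER keeps every main_data row; score_data fields are null where no id matches
joined_data = JOIN main_data BY id LEFT OUTER, score_data BY id;
-- replace a null score with 0, otherwise take the score from score_data
results = FOREACH joined_data GENERATE main_data::id AS id, (score_data::score IS NULL ? 0 : score_data::score) AS score;
STORE results USING SOMETHING SOMEWHERE;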
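
To make the expected result concrete, here is a small Python model of what the left outer join plus the null check should produce on the sample records. It is only an illustration of the semantics, not Pig code. The "baz" record is a hypothetical addition to show the no-match case, and the sketch assumes each id appears at most once in score_data (a real join would emit one row per match).

```python
import json

# Sample records from the question; "baz" is a hypothetical extra row
# with no counterpart in score_data.
main_lines = [
    '{"id":"foo", "some_field":12354, "score":0}',
    '{"id":"foobar", "some_field":12354, "score":0}',
    '{"id":"baz", "some_field":12354, "score":0}',
]
score_lines = [
    '{"id":"foo", "score":1}',
    '{"id":"foobar","score":20}',
]


def left_join_scores(main_lines, score_lines):
    """Model a left outer join on id, defaulting a missing score to 0."""
    # Build an id -> score lookup (assumes ids are unique in score_data).
    scores = {}
    for line in score_lines:
        rec = json.loads(line)
        scores[rec["id"]] = rec["score"]

    results = []
    for line in main_lines:
        rec = json.loads(line)
        matched = scores.get(rec["id"])  # None plays the role of Pig's null
        results.append((rec["id"], 0 if matched is None else matched))
    return results


results = left_join_scores(main_lines, score_lines)
for row in results:
    print(row)
```

Every row of main_data survives, matched ids take the score from score_data, and unmatched ids fall back to 0, which is the same outcome the bincond expression in the Pig script is meant to give.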