I have data like this:
1, 0, 0
0, 1, 0
0, 0, 1
The required output is:
1, 1, 1
How can I do this in Pig?
Answer 0 (score: 0)
Input:
1, 0, 0
0, 1, 0
0, 0, 1
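Add a new field holding the same constant value to every row, group by that field, and take the MAX of each column. Because every row shares the key, all rows land in a single group, so MAX returns the column-wise maximum.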
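-- Load the comma-separated rows as three int columns.
records = LOAD '/user/cloudera/records.txt' USING PigStorage(',') AS (c1:int,c2:int,c3:int);
-- Tag every row with the same constant key so that all rows fall into one group.
records_each = FOREACH records GENERATE 'KEY' as grouping_key, c1, c2, c3;
records_grp = GROUP records_each BY grouping_key;
-- Take the maximum of each column over that single group.
records_grp_each = FOREACH records_grp GENERATE MAX(records_each.c1) as c1, MAX(records_each.c2) as c2, MAX(records_each.c3) as c3;
DUMP records_grp_each;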
Output:
(1,1,1)
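As an alternative sketch (not part of the answer above), Pig's GROUP ... ALL places every row in one group without needing a constant key field. The file path is the one used in the answer; the alias names records_all and col_max are made up for illustration.

```pig
-- Load the comma-separated rows as three int columns (same path as in the answer).
records = LOAD '/user/cloudera/records.txt' USING PigStorage(',') AS (c1:int, c2:int, c3:int);
-- GROUP ... ALL collects every row into a single group.
records_all = GROUP records ALL;
-- Column-wise maximum over that single group.
col_max = FOREACH records_all GENERATE MAX(records.c1) AS c1, MAX(records.c2) AS c2, MAX(records.c3) AS c3;
DUMP col_max;
```

For the sample input this should give the same single tuple as the answer's script, while skipping the extra FOREACH that adds the grouping key.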