My data looks like this:
(201601030637,2,64.001213)
(201601030756,3,63.5869656667)
(201601040220,2,62.758471)
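The first column is the year (2016), month (01), day (03), hour (06), and minute (37) concatenated together.
I want to sum the values in the third column by week. How can I group the rows so that there are 52 different groups for the whole year? Can anyone help? Thanks!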
Answer 0: (score: 0)
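Use GetWeek to derive a new week column from the first column, then group by that column and apply SUM. GetWeek expects a datetime, so convert the string with ToDate first. This assumes you have already loaded the data into relation A.
-- Keep the original columns and append the week number parsed from the first column.
B = FOREACH A GENERATE $0, $1, $2, GetWeek(ToDate((chararray)$0, 'yyyyMMddHHmm')) AS week_of_year;
C = GROUP B BY week_of_year;
-- Sum the third column within each week.
D = FOREACH C GENERATE group AS week_of_year, SUM(B.$2);
DUMP D;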
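As a quick sanity check outside Pig, the parse-then-take-the-week step can be sketched in Python. This is only an illustration: it assumes GetWeek follows ISO-8601 week numbering (which Python's `isocalendar()` uses), and the helper name `week_of_year` is made up for this example.

```python
from datetime import datetime

def week_of_year(stamp: str) -> int:
    """Parse a yyyyMMddHHmm string and return its ISO-8601 week number."""
    return datetime.strptime(stamp, "%Y%m%d%H%M").isocalendar()[1]

# Timestamps taken from the sample rows in the question.
for stamp in ("201601030637", "201601030756", "201601040220"):
    print(stamp, week_of_year(stamp))
```

Under ISO numbering, the first two sample rows (Sunday, January 3, 2016) belong to week 53 of the previous week-year, and the third row (Monday, January 4) starts week 1. If Pig numbers weeks the same way, data covering all of 2016 would therefore produce 53 distinct week values rather than 52.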
Answer 1: (score: 0)
Use ToDate to convert the date string to a datetime type, then use GetWeek to get the week number. Finally, use GROUP to group by week and SUM to total the values.
A = LOAD '/path_to_data/data' USING PigStorage(',') as (c1: chararray, c2: int, c3: float);
B = FOREACH A GENERATE GetWeek(ToDate(c1,'yyyyMMddHHmm')) as weeknum, c1, c2, c3;
C = FOREACH (GROUP B BY weeknum) GENERATE group as weeknum, SUM(B.c3) as c3_sum;
DUMP C;
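The grouping logic can be cross-checked on the three sample rows with a small Python sketch. It is not part of the Pig solution, and the function name `weekly_sums` is hypothetical. It buckets by ISO week-year and week together, on the assumption that GetWeek returns ISO-8601 week numbers; in that case early-January rows share a week number with the previous year's last week, and adding GetWeekYear to the Pig group key would be the analogous safeguard for data that spans several years.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (timestamp, second column, third column).
rows = [
    ("201601030637", 2, 64.001213),
    ("201601030756", 3, 63.5869656667),
    ("201601040220", 2, 62.758471),
]

def weekly_sums(records):
    """Sum the third column per (ISO week-year, ISO week) bucket."""
    totals = defaultdict(float)
    for stamp, _c2, c3 in records:
        iso = datetime.strptime(stamp, "%Y%m%d%H%M").isocalendar()
        totals[(iso[0], iso[1])] += c3
    return dict(totals)

for key, total in sorted(weekly_sums(rows).items()):
    print(key, total)
```

The two rows from January 3 end up in one bucket and the January 4 row in another, which is the shape the Pig output should have for this sample if the week numbering assumption holds.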