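I'd like to group by hour here. I know the file will contain multiple entries for the same hour; for example, hour 11 below appears several times. How can I do this?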
hour,windSpeed
11, 3.6
2 , 6.8
11, 2.5
13, 5.0
14, 8.9
11, 3.2
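So I have this, and I just want to group by hour.
For example,
we want {11: 3.6, 2.5, 3.2}
and the remaining hours, which have only one value each, would each form their own group: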
{14: 8.9}
{2: 6.8}
answer = FOREACH weather_data GENERATE $0 AS hour, $1 as speed
Answer 0 (score: 1)
Group by hour:
A = FOREACH weather_data GENERATE $0 AS hour, $1 as speed;
B = GROUP A by hour;
DUMP B;
If you want to aggregate, use SUM (Pig's built-in function names are case-sensitive):
C = FOREACH B generate group as hour,SUM(A.speed) as Total;
DUMP C;
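The same GROUP-then-FOREACH pattern should work with Pig's other built-in aggregates. Below is a minimal sketch, assuming `weather_data` is the relation from the question with the hour in `$0` and the wind speed in `$1`. The `(double)` cast and the alias names `D`, `readings`, `avg_speed`, and `max_speed` are assumptions for illustration, not part of the original answer.

```pig
-- Assumes weather_data is already loaded, with hour in $0 and wind speed in $1.
-- The (double) cast is an assumption about the field type.
A = FOREACH weather_data GENERATE $0 AS hour, (double)$1 AS speed;
B = GROUP A BY hour;
-- One output row per hour: number of readings, mean speed, and maximum speed.
D = FOREACH B GENERATE group AS hour,
                       COUNT(A)     AS readings,
                       AVG(A.speed) AS avg_speed,
                       MAX(A.speed) AS max_speed;
DUMP D;
```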
Answer 1 (score: 1)
Try this.
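-- 'data' is a placeholder path; PigStorage(',') assumes the comma-separated layout shown in the question
A = LOAD 'data' USING PigStorage(',') AS (Hour:chararray, windSpeed:chararray);
B = GROUP A BY Hour;
-- A.windSpeed is the bag of all wind speeds recorded for that hour
C = FOREACH B GENERATE group AS Hour, A.windSpeed;
DUMP C;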
Note: this code is untested.
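If the goal is one line per hour with its values joined together, closer to the `{11: 3.6, 2.5, 3.2}` shape in the question, the built-in `BagToString` may help. This is an untested sketch: the path `'data'`, the comma delimiter, and the aliases `E` and `speeds` are assumptions, and Pig does not guarantee the order of values inside a bag.

```pig
-- 'data' is a placeholder path; the comma delimiter is assumed from the sample in the question.
A = LOAD 'data' USING PigStorage(',') AS (hour:chararray, windSpeed:chararray);
B = GROUP A BY hour;
-- BagToString joins the values in each hour's bag into a single chararray.
E = FOREACH B GENERATE group AS hour, BagToString(A.windSpeed, ',') AS speeds;
DUMP E;
```

Hours that occur only once simply produce a string containing a single value.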