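PIG - How to group by a field that has multiple entries

Time: 2016-04-27 15:58:55

Tags: sql hadoop apache-pig

I want to group by hour here, and I know the hour field will contain multiple entries for the same hour. For example, hour 11 appears several times in the data below. How do I do this?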

hour,windSpeed
11, 3.6
2 , 6.8
11, 2.5
13, 5.0
14, 8.9
11, 3.2

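So this is what I have, and I just want to group by hour.

For example,

we want {11: 3.6, 2.5, 3.2 }

and each remaining hour, which has only one value, would form a group of its own: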

{14: 8.9}

{2: 6.8}

answer = FOREACH weather_data GENERATE $0 AS hour, $1 AS speed;

2 Answers:

Answer 0: (score: 1)

Group by hour:

A = FOREACH weather_data GENERATE $0 AS hour, $1 AS speed;
B = GROUP A BY hour;
DUMP B;
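
To make the shape of the grouped relation concrete, here is a minimal end-to-end sketch. It assumes the sample rows live in a comma-separated file; the file name `weather.csv` and the aliases `raw`, `readings` and `by_hour` are hypothetical and not part of the original question. After `GROUP`, each tuple has two fields: `group` (the hour) and a bag named after the grouped relation that holds every full row for that hour.

```pig
-- Assumption: 'weather.csv' is a hypothetical comma-separated file with the sample rows.
raw = LOAD 'weather.csv' USING PigStorage(',') AS (hour:chararray, windSpeed:chararray);

-- TRIM strips stray spaces (as in "2 , 6.8") before the values are cast to numbers.
readings = FOREACH raw GENERATE (int)TRIM(hour) AS hour, (double)TRIM(windSpeed) AS speed;

-- A header line such as "hour,windSpeed" should fail the int cast and become null; drop it.
readings = FILTER readings BY hour IS NOT NULL;

-- One tuple per distinct hour: (group, {bag of rows for that hour}).
by_hour = GROUP readings BY hour;

DESCRIBE by_hour;  -- prints the schema: the group key plus a bag named 'readings'
DUMP by_hour;
```

Hours that occur only once, such as 14 or 2, still get a tuple of their own; their bag simply contains a single row.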

If you want to aggregate, use SUM:

C = FOREACH B GENERATE group AS hour, SUM(A.speed) AS Total;
DUMP C;
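
SUM is only one of Pig's built-in aggregates; the same pattern works with COUNT, AVG, MIN and MAX. The sketch below is self-contained and assumes a hypothetical comma-separated input file `weather.csv`; the aliases `raw`, `readings`, `by_hour` and `stats` are illustrative names, not taken from the answer.

```pig
-- Assumption: 'weather.csv' is a hypothetical comma-separated file with the sample rows.
raw = LOAD 'weather.csv' USING PigStorage(',') AS (hour:chararray, windSpeed:chararray);

-- Trim stray spaces and cast so the aggregates operate on numeric values.
readings = FOREACH raw GENERATE (int)TRIM(hour) AS hour, (double)TRIM(windSpeed) AS speed;
readings = FILTER readings BY hour IS NOT NULL;  -- drops a header line, if one is present

by_hour = GROUP readings BY hour;

-- Several aggregates computed over the same bag in a single pass.
stats = FOREACH by_hour GENERATE
    group               AS hour,
    COUNT(readings)     AS num_readings,
    AVG(readings.speed) AS avg_speed,
    MIN(readings.speed) AS min_speed,
    MAX(readings.speed) AS max_speed;

DUMP stats;
```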

Answer 1: (score: 1)

Try this.

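-- Assumes 'data' is comma-separated like the sample, so the delimiter is set explicitly.
A = LOAD 'data' USING PigStorage(',') AS (Hour:chararray, windSpeed:chararray);
B = GROUP A BY Hour;
-- One row per hour: the group key plus a bag of that hour's wind speeds.
C = FOREACH B GENERATE group AS Hour, A.windSpeed;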

Note: this code is untested.
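
If the goal is output close to the `{11: 3.6, 2.5, 3.2}` form from the question, one option is to join each hour's bag into a single delimited string. The following is a sketch only, assuming your Pig version ships the built-in `BagToString` function and that the input is a hypothetical comma-separated file `weather.csv`; the aliases are illustrative. Note that Pig does not guarantee the order of values inside a bag.

```pig
-- Assumption: 'weather.csv' is a hypothetical comma-separated file with the sample rows.
raw = LOAD 'weather.csv' USING PigStorage(',') AS (hour:chararray, windSpeed:chararray);

-- Trim stray spaces so that "2 " and "2" fall into the same group.
readings = FOREACH raw GENERATE TRIM(hour) AS hour, TRIM(windSpeed) AS windSpeed;

-- Assumption: the file may start with a header line "hour,windSpeed"; skip it.
readings = FILTER readings BY hour != 'hour';

by_hour = GROUP readings BY hour;

-- BagToString concatenates the values in the bag, separated by the given delimiter.
per_hour = FOREACH by_hour GENERATE
    group AS hour,
    BagToString(readings.windSpeed, ', ') AS speeds;

DUMP per_hour;
```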