Pig - Get Max Count

Time: 2016-04-20 19:13:42

Tags: hadoop apache-pig

Sample data

DATE      WindDirection

1/1/2000  SW
1/2/2000  SW
1/3/2000  SW
1/4/2000  NW
1/5/2000  NW

Problem

Each date is unique, but the wind directions are not, so we are trying to get the COUNT of the most common wind direction.

My query is:

weather_data = FOREACH Weather GENERATE $16 AS Date, $9 AS w_direction;
e = FOREACH weather_data 
            {
                unique_winds = DISTINCT weather_data.w_direction;
                GENERATE unique_winds, COUNT(unique_winds);
            }
dump e;

The logic is to find the DISTINCT wind directions (there are 7), then group by wind direction and apply the count.

Right now I want to get the total number of winds, or the number per direction.

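1 Answer:

Answer 0: (score: 2)

You will need to GROUP BY wind direction and get the count for each group. Then order the counts in descending order and take the top row.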

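-- Keep only the wind direction column (field $9 of Weather).
wd = FOREACH Weather GENERATE $9 AS w_direction;
-- Group the rows by direction, then count the rows in each group.
gwd = GROUP wd BY w_direction;
cwd = FOREACH gwd GENERATE group AS direction, COUNT(wd.$0) AS cnt;
-- Sort by count, highest first, and keep only the top row.
owd = ORDER cwd BY cnt DESC;
mwd = LIMIT owd 1;
DUMP mwd;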
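
Note that LIMIT 1 returns a single row even when two directions share the highest count. If ties matter, one possible variation is to compute the maximum count first and then keep every direction that matches it. The sketch below is only an illustration under stated assumptions: the file name weather_sample.tsv, the tab delimiter, and the aliases obs_date, by_dir, counts, all_counts, max_count and top_dirs are hypothetical and do not come from the original post.

```pig
-- Assumption: a hypothetical tab-separated file that holds only the two
-- sample columns (date and wind direction), with no header row.
weather = LOAD 'weather_sample.tsv' USING PigStorage('\t')
          AS (obs_date:chararray, w_direction:chararray);

-- Count the rows for each wind direction.
by_dir = GROUP weather BY w_direction;
counts = FOREACH by_dir GENERATE group AS w_direction, COUNT(weather) AS cnt;

-- Collapse all counts into one group and take the largest one.
all_counts = GROUP counts ALL;
max_count  = FOREACH all_counts GENERATE MAX(counts.cnt) AS max_cnt;

-- max_count has exactly one row, so it can be used as a scalar here.
-- Every direction whose count equals the maximum is kept, ties included.
top_dirs = FILTER counts BY cnt == max_count.max_cnt;
DUMP top_dirs;
```

On the five sample rows above (three SW, two NW), this should leave a single row for SW with a count of 3. The trade-off is an extra GROUP ALL pass over the counts in exchange for not dropping tied directions.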