我写了以下猪脚本。我怎样才能把它作为嵌套的呢?
input= LOAD '/path/to/input/data' USING PigStorage('\t') AS (id:chararray,category:chararray);
grp= GROUP input BY category;
grp_count= FOREACH grp generate group, COUNT(input);
grp_ordered= order grp_count by $1 DESC;
top_grp= LIMIT grp_ordered 5;
答案 0 :(得分:1)
非常简单 - 看看grp_count关系:
input= LOAD '/path/to/input/data' USING PigStorage('\t') AS (id:chararray,category:chararray);
grp_count= FOREACH (GROUP input BY category)
generate flatten(group) as category
,COUNT(input) as cnt;
grp_ordered= order grp_count by $1 DESC;
top_grp= LIMIT grp_ordered 5;
答案 1 :(得分:0)
如果我理解你的问题,下面是代码。
data = LOAD 'data' USING PigStorage() AS (id,category);
grp = GROUP data BY category;
grp_count = FOREACH grp {
ord = order data by $1 DESC ;
top_grp = LIMIT ord 5;
GENERATE flatten(group),COUNT(top_grp.$1) ; };
dump grp_count;