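I am trying to run the following program in Apache Pig on unstructured data:
i) I have a dataset containing street name, city and state.
ii) I group it by state.
iii) I take COUNT(*) per state, so my output looks like statename,count, where count is the number of times that state appears in the dataset.
Program: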
realestate = LOAD 'DATA' USING PigStorage(',') AS (street:chararray, city:chararray, state:chararray);
A = GROUP realestate BY state;
B = FOREACH A GENERATE group, COUNT(realestate);
The output looks like:
CA,14
Washington,20
Now I need max(count), so my output should be "Washington,20".
How can I do this? Please help me solve the problem.
Answer 0: (score: 1)
Apply ORDER and LIMIT on the generated result:
realestate = LOAD 'DATA' USING PigStorage(',') AS (street:chararray, city:chararray, state:chararray);
A = GROUP realestate BY state;
B = FOREACH A GENERATE group, COUNT(realestate) AS c;
-- Arrange the tuples by count in descending order
D = ORDER B BY c DESC;
-- Limit the ordered result to one row to get the max value
E = LIMIT D 1;
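
Note that ORDER followed by LIMIT 1 returns exactly one row, so if two states share the highest count, only one of them is kept. If ties matter, one possible alternative is to compute the maximum with MAX over a GROUP ALL and then filter on it. The following is only a sketch under a few assumptions: the path 'DATA' is a placeholder for the real input location, the aliases C, M and T and the field names state, c and max_c are made up for illustration, and the scalar projection M.max_c needs a Pig version that supports scalars (0.8 or later).

```pig
-- 'DATA' is a placeholder path; replace it with the real input location.
realestate = LOAD 'DATA' USING PigStorage(',')
             AS (street:chararray, city:chararray, state:chararray);

-- One row per state with its number of records.
A = GROUP realestate BY state;
B = FOREACH A GENERATE group AS state, COUNT(realestate) AS c;

-- Collapse all per-state counts into a single group and take the maximum.
C = GROUP B ALL;
M = FOREACH C GENERATE MAX(B.c) AS max_c;

-- M has exactly one row, so M.max_c can be used as a scalar.
-- Every state whose count equals the maximum is kept, including ties.
T = FILTER B BY c == M.max_c;
DUMP T;
```

With the sample counts from the question (CA,14 and Washington,20), the filter would keep only the Washington row, the same as the ORDER and LIMIT version. The two approaches differ only when several states tie for the highest count.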