MAX(count) function in Apache Pig Latin

Time: 2017-03-01 01:12:48

Tags: apache hadoop apache-pig hadoop-streaming hadoop-partitioning

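I am trying to run the following program in Apache Pig on unstructured data:

i) I have a dataset containing street name, city, and state.

ii) I group it by state.

iii) I take COUNT(*) per state, so my output looks like statename,count, i.e. the number of times each state appears in the dataset.

Program: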

-- 'DATA' stands in for the input path
realestate = LOAD 'DATA' USING PigStorage(',') AS (street:chararray, city:chararray, state:chararray);

A = GROUP realestate BY state;
B = FOREACH A GENERATE group, COUNT(realestate);

The output looks like:

CA,14
Washington,20

Now I need max(count), so my output should be "Washington,20".

How can I do this? Please help me solve the problem.

1 Answer:

Answer 0: (score: 1)

Apply ORDER and LIMIT to the generated result:

-- 'DATA' stands in for the input path
realestate = LOAD 'DATA' USING PigStorage(',') AS (street:chararray, city:chararray, state:chararray);
A = GROUP realestate BY state;
B = FOREACH A GENERATE group, COUNT(realestate) AS c;

-- Sort the tuples by count in descending order
D = ORDER B BY c DESC;

-- Keep only the first tuple of the sorted result, which has the max count
E = LIMIT D 1;
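
To sanity-check the logic outside Pig, the same group, count, order, and limit steps can be traced in plain Python. This is only an illustrative sketch, not Pig code: the function name max_state_count and the placeholder street and city values are hypothetical, and only the state counts (CA: 14, Washington: 20) are taken from the question.

```python
from collections import Counter

# Hypothetical rows shaped like (street, city, state). The street and city
# values are placeholders; only the per-state counts come from the question.
rows = [("street", "city", "CA")] * 14 + [("street", "city", "Washington")] * 20


def max_state_count(records):
    """Return the (state, count) pair with the highest count."""
    # GROUP BY state + COUNT: tally how many records each state has.
    counts = Counter(state for _street, _city, state in records)
    # ORDER BY c DESC: sort the (state, count) pairs by count, largest first.
    ordered = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    # LIMIT 1: keep only the first pair of the sorted result.
    return ordered[0]


print(max_state_count(rows))
```

As with ORDER followed by LIMIT 1 in Pig, this returns a single pair even if several states share the highest count.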