How to count duplicate values in Pig

Time: 2016-02-23 01:12:09

Tags: count apache-pig

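I am trying to count the members that are duplicated, i.e. whose count is > 1, in a file containing a list of ids. I ran the script below but got a single value, which I think is just the number of rows in the memberid column: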

ids = load 'ids';
ids = filter ids by id;
group = group ids ALL;
count = foreach group generate count (ids);
dump count;

1 answer:

Answer 0 (score: 0)

I assume the file is tab-delimited.

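-- Load the tab-delimited file with an explicit schema
A = LOAD '/test.txt' USING PigStorage('\t') AS (id:int,create_dt:chararray);
-- Keep rows with id > 1 whose create_dt is exactly 30 days before the current time
B = FILTER A BY (id > 1 and DaysBetween(CurrentTime(),ToDate(create_dt)) == 30);
-- Group the remaining rows by id and count the rows in each group
C = GROUP B BY id;
D = FOREACH C GENERATE group as id,COUNT(B) as totalcount;
DUMP D;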
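
For the question as asked (how many ids occur more than once), a rough sketch might look like the following. It assumes that 'ids' is a single-column file; the field name id, its chararray type, and all relation names are made up for illustration. Note that GROUP ... ALL always collapses the input into a single group, which is why the script in the question returns exactly one value.

```pig
-- Assumption: 'ids' holds one id per line; field name and type are hypothetical
ids      = LOAD 'ids' AS (id:chararray);
-- One group per distinct id
by_id    = GROUP ids BY id;
-- Number of rows for each id
counts   = FOREACH by_id GENERATE group AS id, COUNT(ids) AS n;
-- Keep only ids that appear more than once
dups     = FILTER counts BY n > 1;
-- Collapse to a single row holding the number of duplicated ids
all_dups = GROUP dups ALL;
total    = FOREACH all_dups GENERATE COUNT(dups) AS duplicated_ids;
DUMP total;
```

DUMP dups; would instead list each duplicated id with its count. If no id is duplicated, dups is empty and total produces no rows rather than a 0.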