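I'm trying to count the member ids that are duplicated, i.e. that occur more than once (count > 1), in a file containing a list of ids. I ran the script below but got a single value of 1, which I think is just a count of the rows in the memberid column: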
ids = load 'ids';
ids = filter ids by id;
group = group ids ALL;
count = foreach group generate count (ids);
dump count;
Answer 0 (score: 0)
I'm assuming the file is tab-delimited.
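-- Load the tab-delimited file: an integer id and a creation date stored as a string.
A = LOAD '/test.txt' USING PigStorage('\t') AS (id:int, create_dt:chararray);
-- Keep rows with id > 1 whose create_dt is exactly 30 days before the current time.
B = FILTER A BY (id > 1 AND DaysBetween(CurrentTime(), ToDate(create_dt)) == 30);
-- Group the remaining rows by id and count the rows in each group.
C = GROUP B BY id;
D = FOREACH C GENERATE group AS id, COUNT(B) AS totalcount;
DUMP D;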
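If the goal is the one stated in the question, i.e. finding ids that occur more than once, the filter probably belongs on the per-id count rather than on the id value. Here is a minimal sketch of that idea, assuming the file `ids` holds a single id per line; the schema and the alias names (`by_id`, `id_count`, `dups`, `all_dups`, `total`) are my own illustrative choices, not taken from the question:

```pig
-- Assumption: 'ids' contains one id per line; the field name and type are illustrative.
ids      = LOAD 'ids' AS (id:chararray);
-- One group per distinct id.
by_id    = GROUP ids BY id;
-- Number of rows seen for each id.
id_count = FOREACH by_id GENERATE group AS id, COUNT(ids) AS n;
-- Keep only the ids that appear more than once.
dups     = FILTER id_count BY n > 1;
-- Collapse to a single group to count how many distinct ids are duplicated.
all_dups = GROUP dups ALL;
total    = FOREACH all_dups GENERATE COUNT(dups) AS duplicated_ids;
DUMP total;
```

Dumping `dups` instead of `total` would list each duplicated id together with how many times it occurs. Note that the aliases avoid names such as `group` and `count`, which Pig treats as a keyword and a built-in function name.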