How to count unique users in Pig

Time: 2013-02-06 11:32:32

Tags: hadoop apache-pig

The code below doesn't return what I'm trying to compute, which is the number of unique users. Any ideas?

data = LOAD 'input_initial' AS (user_id,item_id,rating,timestamp);
data = FOREACH data GENERATE user_id,item_id;
STORE data INTO 'input_final';
data_users = FOREACH data GENERATE user_id;
group_users = GROUP data_users BY user_id;
count_users = FOREACH group_users GENERATE COUNT(data_users);
STORE count_users INTO 'count_users';

1 answer:

Answer 0 (score: 3)

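GROUP data_users BY user_id produces one group per user, so COUNT(data_users) returns the number of records for each user rather than the number of users. You need to add a final GROUP that operates on ALL instead of a single field, and then count the groups: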

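-- one group per distinct user_id
group_users = GROUP data_users BY user_id;
-- collect all of those groups into a single bag
grp_all = GROUP group_users ALL;
-- the number of groups is the number of unique users
count_users = FOREACH grp_all GENERATE COUNT(group_users);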
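An equivalent way to get the same number, sketched here as an alternative rather than the answerer's method, is to deduplicate the user IDs with DISTINCT and then count them under GROUP ... ALL. It assumes the same input path 'input_initial', schema, and output path 'count_users' as the question; the aliases users and unique_users are illustrative names.

```pig
-- Load the ratings; with no USING clause Pig defaults to PigStorage (tab-delimited)
data = LOAD 'input_initial' AS (user_id, item_id, rating, timestamp);
-- Keep only the user ID column
users = FOREACH data GENERATE user_id;
-- Drop duplicate user IDs so each user appears once
unique_users = DISTINCT users;
-- Put every remaining tuple into a single group
grp_all = GROUP unique_users ALL;
-- The size of that one bag is the number of unique users
count_users = FOREACH grp_all GENERATE COUNT(unique_users);
STORE count_users INTO 'count_users';
```

In both versions, COUNT skips tuples whose first field is null, so records with a null user_id are not counted; COUNT_STAR includes them if that is what you want.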