I'm trying to filter users down to those who have at least two countries in their profile or who are from the USA. I tried this in Pig:
B = group A by userid;
C = foreach B {
count = $1.country;
count2 = distinct count;
GENERATE (((SIZE(count2) > 1 OR count2.$0 != 'USA') ? group : null)));
}
But it fails with this error:
incompatible types in NotEqual Operator left hand side:bag :tuple(country:chararray) right hand side:chararray
I have tried various other combinations, but with no luck.
Answer 0 (score: 2)
Try this:
C =
    foreach (group A by userid)
    generate
        group as userid,
        COUNT(A) AS count,
        FLATTEN(A.country) as country;
D = filter C by count > 1 OR country == 'USA';
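C is a relation with the schema {userid:chararray, count:long, country:chararray}, where count is the number of country rows associated with that userid. D then filters C according to your criteria.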
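As for the error in the question: `count2.$0` projects a column out of a bag, and the result of that projection is still a bag, which is why Pig refuses to compare it with the chararray 'USA'. If duplicate country rows per user are possible and each matching user should come out only once, something along these lines might work. It is only a sketch: it assumes A has the schema (userid:chararray, country:chararray) and that "from the USA" means the user has at least one row whose country is 'USA'. The aliases `countries`, `usa`, `n_countries` and `n_usa` are made up for illustration.

```pig
-- Assumption: A is (userid:chararray, country:chararray).
B = GROUP A BY userid;
C = FOREACH B {
    c = A.country;                         -- project the country column (still a bag)
    countries = DISTINCT c;                -- unique countries for this user
    usa = FILTER A BY country == 'USA';    -- this user's rows that are from the USA
    GENERATE group AS userid,
             COUNT(countries) AS n_countries,
             COUNT(usa) AS n_usa;
}
-- Keep users with two or more distinct countries, or at least one USA row.
D = FILTER C BY n_countries > 1 OR n_usa > 0;
E = FOREACH D GENERATE userid;
```

Because the comparison with 'USA' happens inside a FILTER on the rows of the bag, it is applied to a chararray field rather than to a bag, so the type mismatch does not come up.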