Pig Latin - distinct counts and string comparison inside a foreach

Time: 2012-11-20 17:55:57

Tags: apache-pig

I am trying to filter users down to those who have at least two countries in their profile or who are from the USA. I tried this in Pig:

    B = group A by userid;
    C = foreach B  {
                count = $1.country;
                count2 = distinct count;
                GENERATE (((SIZE(count2) > 1 OR count2.$0 != 'USA') ? group : null)));
        }

But it fails with this error:

incompatible types in NotEqual Operator left hand side:bag :tuple(country:chararray)  right hand side:chararray

I have tried various other combinations, but with no luck.

1 Answer:

Answer 0: (score: 2)

Try this:

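    -- Group by user, then emit one row per (userid, country) with the per-user row count.
    C =
        foreach (group A by userid)
        generate
            group as userid,
            COUNT(A) AS count,
            FLATTEN(A.country) as country;
    -- Keep users with more than one country row, or whose country is 'USA'.
    D = filter C by count > 1 OR country == 'USA';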

C is a relation with the schema {userid: chararray, count: long, country: chararray}, where count is the number of countries associated with the userid. D filters it according to your criteria.
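
As for the error in the question: after `distinct`, `count2` is still a bag, so `count2.$0` is a bag projection and cannot be compared to a chararray with `!=`. If you want to stay inside the nested foreach, count distinct countries, and get one row per user, something like the following may work. It is an untested sketch: the `LOAD` path, the two-column layout of A, and the aliases `countries`, `usa`, `num_countries` and `num_usa` are assumptions made for illustration, not taken from the question.

```pig
-- Assumed input: one (userid, country) pair per line; the path is hypothetical.
A = LOAD 'profiles.tsv' AS (userid:chararray, country:chararray);

B = GROUP A BY userid;

C = FOREACH B {
    c = A.country;
    countries = DISTINCT c;                -- unique countries for this user
    usa = FILTER A BY country == 'USA';    -- this user's USA rows, possibly an empty bag
    GENERATE group AS userid,
             COUNT(countries) AS num_countries,
             COUNT(usa) AS num_usa;
};

-- Keep users with at least two distinct countries, or with at least one USA row.
D = FILTER C BY num_countries > 1 OR num_usa > 0;
```

Here the string comparison happens per tuple inside the nested `FILTER`, so no bag is ever compared to a chararray. Because of the `DISTINCT`, repeated rows for the same country do not inflate the count, unlike `COUNT(A)`, which counts every row in the group.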