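I need to count the number of users in each group, where each user's group is assigned based on how many payments they made (for example, 2 or fewer payments, 3 to 5, or more than 5). This is my current code. Is there any way to make it more elegant? Can the logic be done in a single set of statements? Thanks.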
customer_group = group payments_feed by customerID;
customer_payment_count = foreach customer_group generate group as customerID, COUNT(payments_feed) as payment_amount;
tier1 = filter customer_payment_count by payment_amount <= 2;
tier2 = filter customer_payment_count by payment_amount >= 3 AND payment_amount <= 5;
tier3 = filter customer_payment_count by payment_amount > 5;
tier1_group = group tier1 by all;
tier1_count = foreach tier1_group generate COUNT_STAR(tier1);
tier2_group = group tier2 by all;
tier2_count = foreach tier2_group generate COUNT_STAR(tier2);
tier3_group = group tier3 by all;
tier3_count = foreach tier3_group generate COUNT_STAR(tier3);
result = UNION tier1_count, tier2_count, tier3_count;
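Some fake data. The schema is customer ID (unique per customer) and payment (the value is always 1, since each row represents one payment made by the customer):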
1 1
2 1
1 1
3 1
4 1
1 1
2 1
1 1
1 1
5 1
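In this case, customer 1 made 5 payments and should be in tier 2, while all other customers made no more than 2 payments, so they all belong to tier 1.
So the expected output is: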
4 1 0
Thanks in advance,
Lin
Answer 0 (score: 1)
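No. GROUP is sufficient to compute the counts; the additional GROUP statements are there because you need to count the number of tuples in a particular bag. However, instead of using three FILTER statements, you can use a single SPLIT. The code is as follows: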
customer_group = group payments_feed by customerID;
customer_payment_count = foreach customer_group generate group as customerID, COUNT(payments_feed) as payment_amount;
split customer_payment_count into tier1 if (payment_amount <= 2), tier2 if (payment_amount >= 3 AND payment_amount <= 5), tier3 if (payment_amount > 5);
tier1_group = group tier1 by all;
tier1_count = foreach tier1_group generate COUNT_STAR(tier1);
tier2_group = group tier2 by all;
tier2_count = foreach tier2_group generate COUNT_STAR(tier2);
tier3_group = group tier3 by all;
tier3_count = foreach tier3_group generate COUNT_STAR(tier3);
result = UNION tier1_count, tier2_count, tier3_count;
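If the goal is a single row such as 4 1 0, one possible alternative is to turn each customer into three 0/1 flags and sum them after one GROUP ALL. This is only a sketch, not the original poster's or answerer's method. It assumes payments_feed is already loaded with a customerID field, and the names payment_count, tier_flags, in_tier1, in_tier2 and in_tier3 are made up for illustration. As far as I know, GROUP ALL over an empty relation produces no row, so the UNION approach would simply omit an empty tier rather than report 0 for it; summing flags avoids that.

```pig
-- Assumption: payments_feed has been loaded elsewhere and has a customerID field.
customer_group = GROUP payments_feed BY customerID;

-- One row per customer with the number of payments they made.
customer_payment_count = FOREACH customer_group GENERATE
    group AS customerID,
    COUNT(payments_feed) AS payment_count;

-- Turn each customer into three 0/1 flags, one per tier (hypothetical names).
tier_flags = FOREACH customer_payment_count GENERATE
    (payment_count <= 2 ? 1 : 0) AS in_tier1,
    ((payment_count >= 3 AND payment_count <= 5) ? 1 : 0) AS in_tier2,
    (payment_count > 5 ? 1 : 0) AS in_tier3;

-- A single GROUP ALL, then sum each flag column to get the per-tier counts.
all_flags = GROUP tier_flags ALL;
result = FOREACH all_flags GENERATE
    SUM(tier_flags.in_tier1) AS tier1_count,
    SUM(tier_flags.in_tier2) AS tier2_count,
    SUM(tier_flags.in_tier3) AS tier3_count;
```

With the sample data from the question (customer 1 with 5 payments, customers 2 to 5 with at most 2 each), this should give one row with the three tier counts side by side, including a 0 for the empty third tier.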