I have a query in a Pig script:
RESULT_SOMETYPE = FOREACH SOMETYPE_DATA_GROUPED GENERATE flatten(group) , SUM(SOMETYPEDATA.DURATION) as duration, COUNT(SOMETYPEDATA.DURATION) as cnt;
Here, I want to replace SUM(SOMETYPEDATA.DURATION) with a bucket number, for example:
if (0 < sum <= 1000) then put 1
if (1000 < sum <= 2000) then put 2
if (2000 < sum <= 3000) then put 3
How can I achieve this in Pig?
Any suggestions are appreciated.
Answer 0 (score: 0)
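SPLIT will do this, but not inside a FOREACH. Pig also has a ternary-style operator, but it cannot assign its result to a variable the way an if/then statement would. Here is how to get close to what you asked for with SPLIT: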
A = LOAD '/home/vignesh/a.dat' USING PigStorage(',') AS (a:int, b:int, c:int);
-- Contiguous ranges, so boundary values such as 1000 and 2000 are not silently dropped
SPLIT A INTO B IF (a > 0 AND a <= 1000), C IF (a > 1000 AND a <= 2000), D IF (a > 2000 AND a <= 3000);
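Applied to the question's own data, SPLIT only separates the rows; each branch still needs a FOREACH that emits its constant bucket number, and a UNION to recombine them. A minimal sketch, assuming a comma-separated input of (grp_name, DURATION) records; the path 'durations.csv' and the aliases bucket_1, labelled_1, etc. are hypothetical:

```pig
-- Hypothetical input: one (grp_name, DURATION) record per line
SOMETYPE_DATA = LOAD 'durations.csv' USING PigStorage(',') AS (grp_name:chararray, DURATION:long);
SOMETYPE_DATA_GROUPED = GROUP SOMETYPE_DATA BY grp_name;
RESULT_SOMETYPE = FOREACH SOMETYPE_DATA_GROUPED GENERATE group AS grp_name,
    SUM(SOMETYPE_DATA.DURATION) AS duration_sum, COUNT(SOMETYPE_DATA.DURATION) AS cnt;

-- Route each group into one bucket; rows matching no condition are dropped
SPLIT RESULT_SOMETYPE INTO
    bucket_1 IF (duration_sum > 0 AND duration_sum <= 1000),
    bucket_2 IF (duration_sum > 1000 AND duration_sum <= 2000),
    bucket_3 IF (duration_sum > 2000 AND duration_sum <= 3000);

-- Replace the sum with the bucket's constant number
labelled_1 = FOREACH bucket_1 GENERATE grp_name, 1 AS duration, cnt;
labelled_2 = FOREACH bucket_2 GENERATE grp_name, 2 AS duration, cnt;
labelled_3 = FOREACH bucket_3 GENERATE grp_name, 3 AS duration, cnt;

-- All three branches share one schema, so they can be recombined
result_required = UNION labelled_1, labelled_2, labelled_3;
```

This costs extra operators compared with a single FOREACH, which is why the conditional-expression approach below is usually simpler.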
Answer 1 (score: 0)
We can use the bincond operator (?:) or a CASE expression (available from Pig 0.12 onwards) to achieve this.
RESULT_SOMETYPE = FOREACH SOMETYPE_DATA_GROUPED GENERATE flatten(group) AS grp_name, SUM(SOMETYPEDATA.DURATION) AS duration_sum, COUNT(SOMETYPEDATA.DURATION) AS cnt;
-- Nested bincond (?:) emulates if / else if / else; 9999 marks sums outside (0, 3000]
result_required = FOREACH RESULT_SOMETYPE GENERATE grp_name,
    (duration_sum > 0 AND duration_sum <= 1000 ? 1 :
        (duration_sum > 1000 AND duration_sum <= 2000 ? 2 :
            (duration_sum > 2000 AND duration_sum <= 3000 ? 3 : 9999)
        )
    ) AS duration, cnt;
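The answer mentions CASE but only shows the bincond form. A minimal sketch of the same bucketing with a searched CASE expression, assuming Pig 0.12 or later; the input path 'durations.csv' and its (grp_name, DURATION) schema are hypothetical stand-ins for the question's data:

```pig
-- Hypothetical input: one (grp_name, DURATION) record per line
SOMETYPE_DATA = LOAD 'durations.csv' USING PigStorage(',') AS (grp_name:chararray, DURATION:long);
SOMETYPE_DATA_GROUPED = GROUP SOMETYPE_DATA BY grp_name;
RESULT_SOMETYPE = FOREACH SOMETYPE_DATA_GROUPED GENERATE group AS grp_name,
    SUM(SOMETYPE_DATA.DURATION) AS duration_sum, COUNT(SOMETYPE_DATA.DURATION) AS cnt;

-- Searched CASE (Pig 0.12+): the first matching WHEN wins, ELSE is the fallback
result_required = FOREACH RESULT_SOMETYPE GENERATE grp_name,
    (
        CASE
            WHEN duration_sum > 0    AND duration_sum <= 1000 THEN 1
            WHEN duration_sum > 1000 AND duration_sum <= 2000 THEN 2
            WHEN duration_sum > 2000 AND duration_sum <= 3000 THEN 3
            ELSE 9999
        END
    ) AS duration,
    cnt;
```

It produces the same output as the nested bincond but reads as a flat list of ranges, which is easier to extend with more buckets.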