Pig question: how to add an if condition inside FOREACH

Time: 2015-10-16 05:25:39

Tags: hadoop apache-pig

I have a question about writing a Pig script.

 RESULT_SOMETYPE = FOREACH SOMETYPE_DATA_GROUPED  GENERATE flatten(group) , SUM(SOMETYPEDATA.DURATION) as duration, COUNT(SOMETYPEDATA.DURATION) as cnt;

Here I want to replace SUM(SOMETYPEDATA.DURATION) with a number that depends on the range the sum falls in, for example:

if (0 < Sum < 1000) then put 1
if (1001 < Sum < 2000) then put 2
if (2001 < Sum < 3000) then put 3

How can I achieve this in Pig?

Please advise.

2 answers:

Answer 0: (score: 0)

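SPLIT will do this, but not inside a FOREACH loop. Pig also has a ternary-style operator, but it is of no use for storing the result in a variable. Here is how you can get close to what you are asking for with SPLIT.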

A = LOAD '/home/vignesh/a.dat' using PigStorage(',') as (a:int,b:int,c:int);
SPLIT A INTO B IF (a > 0 AND a < 1000),  C IF (a > 1001 AND a<2000), D IF (a > 2001 AND a < 3000);
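
The SPLIT above only separates the rows; it does not attach the 1/2/3 label the question asks for. A minimal sketch of one way to add it, assuming the same input file and schema as above: tag each split relation with a constant and union them back together. The relation names B1, C1, D1 and LABELLED and the field name bucket are hypothetical, and the bounds are made inclusive at the upper end here (an assumption about the intent) so that values such as 1000 or 1001 are not dropped.

```pig
-- Same input as in the answer above.
A = LOAD '/home/vignesh/a.dat' USING PigStorage(',') AS (a:int, b:int, c:int);

-- Ranges are an assumption: upper bounds inclusive so no value between 1 and 3000 is lost.
SPLIT A INTO B IF (a > 0 AND a <= 1000),
             C IF (a > 1000 AND a <= 2000),
             D IF (a > 2000 AND a <= 3000);

-- Attach a constant label to each partition (bucket is a hypothetical field name).
B1 = FOREACH B GENERATE a, b, c, 1 AS bucket;
C1 = FOREACH C GENERATE a, b, c, 2 AS bucket;
D1 = FOREACH D GENERATE a, b, c, 3 AS bucket;

-- Recombine the labelled partitions into one relation.
LABELLED = UNION B1, C1, D1;
DUMP LABELLED;
```

Rows whose value of a falls outside all three ranges match no SPLIT condition and therefore do not appear in LABELLED.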

Answer 1: (score: 0)

We can use the bincond operator (?:) or a CASE expression (available from Pig version 0.12 onwards) to achieve this.

RESULT_SOMETYPE = FOREACH SOMETYPE_DATA_GROUPED  GENERATE flatten(group) AS grp_name , SUM(SOMETYPEDATA.DURATION) as duration_sum, COUNT(SOMETYPEDATA.DURATION) as cnt;

result_required = FOREACH RESULT_SOMETYPE GENERATE grp_name, 
                        (duration_sum > 0 AND duration_sum < 1000 ? 1 : 
                                        (duration_sum > 1001 AND duration_sum < 2000 ? 2 : 
                                                (duration_sum > 2001 AND duration_sum < 3000 ? 3 : 9999)     
                                        )
                         ) AS duration, cnt;
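
The CASE form mentioned above might look like the following sketch, which assumes Pig 0.12 or later. The input path, the two-column layout and the single-field grouping key are hypothetical, added only so the script is self-contained; the bucket bounds are also an assumption, made inclusive at the upper end so that sums such as 1000 or 1001 still land in a bucket instead of falling through to 9999.

```pig
-- Hypothetical input: comma-separated (grp_name, DURATION) records; the path is an assumption.
SOMETYPEDATA = LOAD 'input/sometype.csv' USING PigStorage(',')
               AS (grp_name:chararray, DURATION:long);

SOMETYPE_DATA_GROUPED = GROUP SOMETYPEDATA BY grp_name;

-- The grouping key is a single field here, so it is used directly rather than flattened.
RESULT_SOMETYPE = FOREACH SOMETYPE_DATA_GROUPED GENERATE
    group AS grp_name,
    SUM(SOMETYPEDATA.DURATION) AS duration_sum,
    COUNT(SOMETYPEDATA.DURATION) AS cnt;

-- Map the sum to a bucket number; 9999 marks sums outside every range, as in the bincond version.
result_required = FOREACH RESULT_SOMETYPE GENERATE
    grp_name,
    (CASE
        WHEN duration_sum > 0    AND duration_sum <= 1000 THEN 1
        WHEN duration_sum > 1000 AND duration_sum <= 2000 THEN 2
        WHEN duration_sum > 2000 AND duration_sum <= 3000 THEN 3
        ELSE 9999
     END) AS duration,
    cnt;

DUMP result_required;
```

Compared with nested bincond expressions, CASE keeps each range on its own line, which tends to be easier to extend when more buckets are added.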

Reference: http://pig.apache.org/docs/r0.12.0/basic.html#arithmetic