Accessing array elements in Pig

Date: 2012-11-06 15:09:55

Tags: apache-pig

I have data in my table with the columns id, val1, val2.

For example:

1,0.2,0.1
1,0.1,0.7
1,0.2,0.3
2,0.7,0.9
2,0.2,0.3
2,0.4,0.5

First, I want to sort the rows for each id by val1 in descending order, like this:

1,0.2,0.1
1,0.2,0.3
1,0.1,0.7
2,0.7,0.9
2,0.4,0.5
2,0.2,0.3

Then, for each id, I want to select the (id, val2) pair of the second element. For example:

  1,0.3
  2,0.5

How should I approach this?

Thanks

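2 answers:

Answer 0: (score: 5)

Pig is a scripting language rather than a relational language like SQL, and it is well suited to working with groups through operators nested inside a FOREACH. Here is a solution: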

A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float);
B = GROUP A BY id; -- isolate all rows for the same id
C = FOREACH B { -- here comes the scripting bit
    elems = ORDER A BY v1 DESC; -- sort rows belonging to the id
    two = LIMIT elems 2; -- select top 2
    two_invers = ORDER two BY v1 ASC; -- sort in opposite order to bubble second value to the top
    second = LIMIT two_invers 1;
    GENERATE FLATTEN(group) as id, FLATTEN(second.v2);
};
DUMP C;

Note that in your example, id 1 has two rows with v1 == 0.2 but different v2 values, so the second value for id 1 could be either 0.1 or 0.3.
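The tie on v1 for id 1 can be made deterministic by adding a secondary sort key. The following is a minimal sketch, not part of the original answer. It assumes ties on v1 should be broken by ascending v2; the question does not specify a rule, so that choice is an assumption. The relation names sorted, top2 and flipped are illustrative.

```pig
A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float);
B = GROUP A BY id;
C = FOREACH B {
    -- assumption: break ties on v1 by ascending v2
    sorted = ORDER A BY v1 DESC, v2 ASC;
    top2 = LIMIT sorted 2;                     -- first two rows of the group
    -- reverse both sort keys so the second row moves to the top
    flipped = ORDER top2 BY v1 ASC, v2 DESC;
    second = LIMIT flipped 1;
    GENERATE group AS id, FLATTEN(second.v2) AS v2;
};
DUMP C;
```

With the sample data, this should pick v2 = 0.3 for id 1 and v2 = 0.5 for id 2, which matches the output asked for in the question. A group containing only one row would return that single row.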

Answer 1: (score: 1)

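-- v1 and v2 are declared as float because the sample values are decimals
A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float);
-- global sort: by id ascending, then by v1 descending within each id
B = ORDER A BY id ASC, v1 DESC;
-- keeps id and v2 for every row; it does not pick only the second row per id
C = FOREACH B GENERATE id, v2;
DUMP C;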