My table has data with the columns id, val1, val2.
For example:
1,0.2,0.1
1,0.1,0.7
1,0.2,0.3
2,0.7,0.9
2,0.2,0.3
2,0.4,0.5
First, I want to sort the rows of each id by val1 in descending order, like this:
1,0.2,0.1
1,0.2,0.3
1,0.1,0.7
2,0.7,0.9
2,0.4,0.5
2,0.2,0.3
Then, for each id, I want to select the (id, val2) pair of the second element, for example:
1,0.3
2,0.5
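How can I do this?
Thanks
Answer 0 (score: 5)
Pig is a scripting language rather than a relational language like SQL, and it is well suited to working on groups with operators nested inside FOREACH. Here is a solution: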
A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float);
B = GROUP A BY id; -- isolate all rows for the same id
C = FOREACH B { -- here comes the scripting bit
    elems = ORDER A BY v1 DESC; -- sort rows belonging to the id
    two = LIMIT elems 2; -- select top 2
    two_invers = ORDER two BY v1 ASC; -- sort in opposite order to bubble second value to the top
    second = LIMIT two_invers 1;
    GENERATE FLATTEN(group) as id, FLATTEN(second.v2);
};
DUMP C;
Note that in your example, id 1 has two rows with v1 == 0.2 but different v2 values, so the second value for id 1 could be either 0.1 or 0.3.
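If the tie needs to resolve the same way on every run, one option is to add v2 as a secondary sort key. The following is a minimal sketch of that variation, not a tested script: it assumes ties on v1 should be broken by ascending v2 (an assumption, chosen to match the ordering shown in the question), and the secondary keys are the only change to the logic above. With the sample data it should pick 0.3 for id 1 and 0.5 for id 2.

```pig
A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float);
B = GROUP A BY id;
C = FOREACH B {
    -- assumption: break ties on v1 by ascending v2
    elems = ORDER A BY v1 DESC, v2 ASC;
    two = LIMIT elems 2;
    -- reverse both sort keys so the second row comes first
    two_invers = ORDER two BY v1 ASC, v2 DESC;
    second = LIMIT two_invers 1;
    GENERATE group AS id, FLATTEN(second.v2) AS v2;
};
DUMP C;
```

As in the script above, an id with only one row yields that single row, since there is no second element to pick.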
Answer 1 (score: 1)
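-- the sample values are decimals, so v1 and v2 are loaded as float rather than int
A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float);
B = ORDER A BY id ASC, v1 DESC; -- sort by id, then by v1 descending within each id
C = FOREACH B GENERATE id, v2; -- keeps every (id, v2) pair; does not pick only the second row per id
DUMP C;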