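I am trying to write a Java UDF that ranks the tuples in a bag. Each tuple has a value column, which is the ranking criterion, and a rank column that is initially set to 0. The tuples are sorted on the value column. All of the tuples are placed in a bag, and that bag is placed inside a new tuple, which is then passed to the UDF.
The UDF does modify the rank column, but once the method exits, the values are all 0 again. I am not sure how to make the values "stick".
Any help would be greatly appreciated.
Here is my Java class: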
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pig.FilterFunc;
import org.apache.pig.EvalFunc;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.DataBag;
import org.apache.pig.impl.logicalLayer.FrontendException;
import java.util.Iterator;
import org.apache.pig.PigWarning;
/**
*
* @author Winter
*/
public class Ranker extends EvalFunc<String> {
    @Override
    public String exec(Tuple tuple) throws IOException {
        if (tuple == null || tuple.size() == 0) {
            return null;
        }
        List<Object> list = tuple.getAll();
        DataBag db = (DataBag) list.get(0);
        Integer num = (Integer) list.get(1);
        Iterator<Tuple> itr = db.iterator();
        boolean containsNonNull = false;
        int i = 1;
        double previous = 0;
        while (itr.hasNext()) {
            Tuple t = itr.next();
            double d = (Double) t.get(num.intValue());
            int rankCol = t.size() - 1;
            Integer rankVal = (Integer) t.get(rankCol);
            if (i == 0) {
                System.out.println("i==0");
                previous = d;
                t.set(rankCol, i);
            } else {
                if (d == previous)
                    t.set(rankCol, i);
                else {
                    System.out.print("d!==previous|" + d + "|" + previous + "|" + rankVal);
                    t.set(rankCol, ++i);
                    rankVal = (Integer) t.get(rankCol);
                    System.out.println("|now rank val" + rankVal);
                    previous = d;
                }
            }
        }
        return "Y";
    }
}
Here is how I call everything in Pig:
REGISTER /myJar.jar;
A = LOAD '/Users/Winter/milk-tea-coffee.tsv' as (year:chararray, milk:double);
B = foreach A generate year, milk, 0 as rank;
C = order B by milk asc;
D = group C by rank order C by milk;
E = foreach D generate D.C.year,D.C.milk,D.C.rank, piglet3.evalFunctions.Ranker(D.C,1);
dump E;
Thanks to the print statements in the UDF, I can tell that it works inside the UDF:
d!==previous|21.2|0.0|0|now rank val2
d!==previous|21.6|21.2|0|now rank val3
d!==previous|21.9|21.6|0|now rank val4
d!==previous|22.0|21.9|0|now rank val5
d!==previous|22.5|22.0|0|now rank val6
d!==previous|22.9|22.5|0|now rank val7
d!==previous|23.0|22.9|0|now rank val8
d!==previous|23.4|23.0|0|now rank val9
d!==previous|23.8|23.4|0|now rank val10
d!==previous|23.9|23.8|0|now rank val11
But when I dump E, D, or C, the rank column contains only 0s.
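Answer 0 (score: 1)
The exec function must return the desired output from the UDF. You are currently modifying the tuple passed to exec and then returning the string "Y", so all Pig sees is that the output of the UDF is "Y". In this case, you should return the tuple instead of "Y".
I think the following code is close to what you intended, although it is not entirely clear to me what you are trying to do: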
import java.io.IOException;
import java.util.Iterator;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

/**
 *
 * @author Winter
 */
public class Ranker extends EvalFunc<Tuple> {

    @Override
    public Tuple exec(Tuple tuple) throws IOException {
        if (tuple == null || tuple.size() == 0) {
            return null;
        }
        // Field 0 is the bag of tuples to rank; field 1 is the index of the value column.
        DataBag db = (DataBag) tuple.get(0);
        Integer num = (Integer) tuple.get(1);
        Iterator<Tuple> itr = db.iterator();
        int i = 0;               // last rank handed out; the first tuple gets rank 1
        double previous = 0;
        boolean first = true;    // replaces the original i == 0 check, which was never true
        while (itr.hasNext()) {
            Tuple t = itr.next();
            double d = (Double) t.get(num.intValue());
            int rankCol = t.size() - 1;   // the rank column is assumed to be the last field
            if (first || d != previous) {
                // A new value starts the next rank; equal values share the current one.
                i++;
                previous = d;
                first = false;
            }
            t.set(rankCol, i);
        }
        // Return the modified tuple instead of "Y" so that Pig sees the updated ranks.
        return tuple;
    }
}
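If it helps, the ranking loop can be tried on its own, outside Pig, before wiring it into a UDF. The following is only a minimal sketch in plain Java: the class name DenseRankSketch and the method denseRank are hypothetical names made up for this illustration, not part of Pig or of the original code. It assumes the values are already sorted in ascending order and contain no nulls, and it returns the ranks as a new list rather than modifying its input, which mirrors the point that only the returned value is visible to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Pig API.
public class DenseRankSketch {

    /**
     * Returns one rank per value of an ascending-sorted list.
     * Equal neighbours share a rank, and each new value gets the next rank.
     * Assumes the list contains no null elements.
     */
    public static List<Integer> denseRank(List<Double> sortedValues) {
        List<Integer> ranks = new ArrayList<>(sortedValues.size());
        int rank = 0;
        Double previous = null; // null means "no value seen yet"
        for (Double value : sortedValues) {
            if (previous == null || !value.equals(previous)) {
                rank++;
                previous = value;
            }
            ranks.add(rank);
        }
        return ranks;
    }

    public static void main(String[] args) {
        List<Double> milk = Arrays.asList(21.2, 21.6, 21.6, 21.9);
        System.out.println(denseRank(milk)); // prints [1, 2, 2, 3]
    }
}
```

Using a null marker for "no value seen yet" avoids treating 0.0 as a special starting value, so a first value of 0.0 still receives rank 1.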