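I am new to Pig and Java, and I am trying to write a custom Load function that returns only specific rows from my data set. Unfortunately, I am getting an error that I cannot make sense of. Could someone help me understand it? If you need any further information from me, please let me know. Thank you very much.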
My version of Pig is 0.13.0
My Linux version is Ubuntu 14.04.1
My code:
package udf;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class extract extends LoadFunc {

    private RecordReader reader;
    private int count = 0;
    private TupleFactory mytuplefactory;

    public extract() {
        mytuplefactory = TupleFactory.getInstance();
    }

    @Override
    public InputFormat getInputFormat() throws IOException {
        return new TextInputFormat();
    }

    @Override
    public Tuple getNext() throws IOException {
        Tuple myTuple = null;
        Text myText = null;
        try {
            boolean notdone = reader.nextKeyValue();
            if (!notdone) {
                return null;
            }
            while (reader.nextKeyValue()) {
                Text t = (Text) reader.getCurrentValue();
                if (t.toString().equals("People aged 18-64"))
                    count = 6;
                if (count <= 6 && count > 0) {
                    myText = (Text) reader.getCurrentValue();
                    break;
                }
                count--;
            }
            if (myText != null) {
                myTuple = mytuplefactory.newTuple(myText);
                return myTuple;
            } else
                return null;
        } catch (Exception e) {
            throw new ExecException(e);
        }
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        FileInputFormat.setInputPaths(job, location);
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit PigSplit)
            throws IOException {
        this.reader = reader;
    }
}
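To make clear what I expect the loader to return, here is a minimal plain-Java sketch of the selection logic on its own, without any Pig or Hadoop classes. The names `SectionFilter` and `selectSection` are hypothetical and used only for illustration. It assumes I want the header line plus a fixed number of rows directly beneath it (six in my real data, two in the small demo). It matches with `startsWith` rather than `equals`, because the header line in my data carries more columns after the label.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig: isolates the row-selection logic.
public class SectionFilter {

    /**
     * Returns the first line that starts with headerPrefix, followed by up to
     * rowsAfterHeader lines that come directly after it.
     */
    static List<String> selectSection(List<String> lines, String headerPrefix, int rowsAfterHeader) {
        List<String> selected = new ArrayList<>();
        int remaining = -1; // -1 means the header has not been seen yet
        for (String line : lines) {
            if (remaining < 0) {
                // Still searching for the section header.
                if (line.startsWith(headerPrefix)) {
                    selected.add(line);
                    remaining = rowsAfterHeader;
                }
            } else if (remaining > 0) {
                // Inside the section: keep the row and count it down.
                selected.add(line);
                remaining--;
            } else {
                // All wanted rows collected; ignore the rest of the input.
                break;
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the sample data, for demonstration only.
        List<String> lines = Arrays.asList(
            "People aged 0-64 \"247,371\"",
            ".Uninsured 16.64",
            "People aged 18-64 \"174,712\"",
            ".Insured (any type) 82.54",
            ".Uninsured 17.46",
            "People aged 65 and older \"33,742\"");
        for (String line : selectSection(lines, "People aged 18-64", 2)) {
            System.out.println(line);
        }
    }
}
```

With the real data I would call it with `rowsAfterHeader` set to 6, so that the "People aged 18-64" line and the six rows under it come back.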
My sample test data:
All People "281,113" ***** "100,094" 946 "84,028" 769 "63,736" 721 "23,176" 379 "10,079" 247
.Insured (any type) 85.25 0.23 87.10 0.34 85.08 0.42 82.49 0.38 84.41 0.67 87.72 0.66
..Privately insured (alone or in combination) 73.46 0.29 79.43 0.41 76.51 0.46 69.31 0.46 57.98 0.81 50.45 1.21
..Medicare 12.98 0.06 3.48 0.11 8.34 0.22 19.05 0.31 37.32 0.72 51.50 1.04
..Medicaid 11.44 0.22 9.08 0.33 9.07 0.31 11.99 0.33 21.32 0.63 28.37 1.05
..Publically insured (no private) 11.79 0.20 7.67 0.30 8.57 0.28 13.17 0.30 26.43 0.76 37.27 1.21
.Uninsured 14.75 0.23 12.90 0.34 14.92 0.42 17.51 0.38 15.59 0.67 12.28 0.66
People aged 0-64 "247,371" ***** "96,563" 913 "77,129" 736 "52,013" 653 "15,666" 322 "6,000" 179
.Insured (any type) 83.36 0.26 86.68 0.35 83.81 0.45 78.73 0.45 77.41 0.92 79.97 1.03
..Privately insured (alone or in combination) 73.11 0.30 79.24 0.42 75.86 0.49 67.21 0.53 53.17 0.95 42.27 1.51
..Medicare 1.83 0.06 0.19 0.03 0.52 0.06 1.90 0.12 10.29 0.50 22.32 1.05
..Medicaid 11.69 0.25 9.24 0.34 9.47 0.34 13.10 0.42 24.25 0.80 34.81 1.48
..Publically insured (no private) 10.25 0.21 7.44 0.30 7.94 0.29 11.52 0.33 24.24 0.83 37.70 1.43
.Uninsured 16.64 0.26 13.32 0.35 16.19 0.45 21.27 0.45 22.59 0.92 20.03 1.03
People aged 18-64 "174,712" ***** "55,534" 607 "57,342" 553 "41,977" 517 "14,113" 311 "5,747" 168
.Insured (any type) 82.54 0.26 86.33 0.37 84.00 0.40 77.95 0.48 76.63 0.97 79.45 1.08
..Privately insured (alone or in combination) 75.65 0.30 83.85 0.37 80.26 0.41 70.40 0.52 54.03 0.99 41.89 1.52
..Medicare 2.59 0.08 0.32 0.05 0.70 0.08 2.35 0.15 11.42 0.56 23.30 1.06
..Medicaid 7.02 0.19 2.91 0.18 4.16 0.21 7.85 0.32 21.51 0.79 33.75 1.53
..Publically insured (no private) 6.89 0.17 2.48 0.15 3.74 0.18 7.55 0.28 22.59 0.82 37.56 1.44
.Uninsured 17.46 0.26 13.67 0.37 16.00 0.40 22.05 0.48 23.37 0.97 20.55 1.08
People aged 65 and older "33,742" ***** "3,531" 117 "6,899" 186 "11,723" 211 "7,510" 175 "4,079" 138
.Insured (any type) 99.08 0.13 98.51 0.43 99.34 0.20 99.14 0.22 99.01 0.22 99.12 0.28
..Privately insured (alone or in combination) 75.99 0.60 84.51 1.39 83.80 0.94 78.64 0.89 68.02 1.24 62.48 1.79
..Medicare 94.70 0.29 93.50 0.96 95.77 0.44 95.15 0.44 93.70 0.67 94.43 0.74
..Medicaid 9.55 0.38 4.71 0.70 4.57 0.52 7.05 0.57 15.21 0.84 18.90 1.18
..Publically insured (no private) 23.09 0.60 14.00 1.32 15.54 0.93 20.50 0.86 30.99 1.24 36.64 1.81
.Uninsured 0.92 0.13 1.49 0.43 0.66 0.20 0.86 0.22 0.99 0.22 0.88 0.28
Family income less than 200 percent of poverty threshold\2 "94,310" 907 "27,954" 600 "25,543" 498 "23,801" 432 "11,392" 283 "5,621" 197
.Insured (any type) 73.93 0.44 75.44 0.74 71.27 0.93 70.74 0.72 77.58 1.04 84.70 1.01
..Privately insured (alone or in combination) 47.15 0.52 54.03 0.98 49.48 0.91 44.50 0.76 36.84 1.05 34.52 1.36
..Medicare 15.41 0.24 3.51 0.24 8.72 0.44 18.96 0.56 36.14 1.09 47.98 1.34
..Medicaid 26.38 0.50 24.69 0.89 23.25 0.80 24.73 0.77 34.17 1.02 40.24 1.51
..Publically insured (no private) 26.78 0.48 21.41 0.86 21.79 0.80 26.23 0.64 40.74 1.17 50.19 1.52
.Uninsured 26.07 0.44 24.56 0.74 28.73 0.93 29.26 0.72 22.42 1.04 15.30 1.01
Family income greater than or equal to 200 percent of poverty threshold\2 "186,434" 909 "71,968" 749 "58,398" 686 "39,852" 557 "11,778" 265 "4,438" 150
.Insured (any type) 91.00 0.18 91.65 0.30 91.15 0.30 89.53 0.38 91.01 0.56 91.76 0.84
..Privately insured (alone or in combination) 86.85 0.23 89.38 0.33 88.42 0.33 84.24 0.45 78.46 0.82 70.86 1.47
..Medicare 11.77 0.13 3.47 0.14 8.19 0.23 19.15 0.39 38.47 0.92 56.21 1.57
..Medicaid 3.80 0.15 2.93 0.18 2.81 0.19 4.28 0.27 8.86 0.60 13.28 1.11
..Publically insured (no private) 4.15 0.13 2.27 0.14 2.73 0.15 5.29 0.30 12.55 0.74 20.90 1.40
.Uninsured 9.00 0.18 8.35 0.30 8.85 0.30 10.47 0.38 8.99 0.56 8.24 0.84
Error message:
2015-12-11 22:21:21,360 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2015-12-11 22:21:21,364 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2015-12-11 22:21:21,496 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2015-12-11 22:21:21,542 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-12-11 22:21:21,556 [main] ERROR org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad - Received error from loader function: org.apache.pig.backend.executionengine.ExecException: ERROR 0: java.lang.NullPointerException
2015-12-11 22:21:21,560 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2088: Fetch failed. Couldn't retrieve result
Details at logfile: /home/hadoop/ashwin/data/pig_1449899080960.log