Error when running a custom Pig load function

Time: 2015-12-12 06:44:29

Tags: java hadoop apache-pig

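I'm new to Pig and Java, and I'm trying to write a custom Load function that returns only specific rows from my data set. Unfortunately, I'm getting an error I can't understand. Can someone help me make sense of it? If you need any more information from me, please let me know. Thanks a lot.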

My version of Pig is 0.13.0
My Linux version is Ubuntu 14.04.1

My code:

package udf;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class extract extends LoadFunc{

    private RecordReader reader;
    private int count = 0;
    private TupleFactory mytuplefactory;

    public extract(){
        mytuplefactory = TupleFactory.getInstance();
    }

    @Override
    public InputFormat getInputFormat() throws IOException {

        return new TextInputFormat();
    }

    @Override
    public Tuple getNext() throws IOException {

        Tuple myTuple = null;
        Text myText = null;

        try{

            boolean notdone = reader.nextKeyValue();

            if(!notdone){
                return null;
            }

            while( reader.nextKeyValue()){
                Text t = (Text) reader.getCurrentValue();
                if(t.toString().equals("People aged 18-64"))
                    count = 6;
                if(count <= 6 && count > 0){
                    myText = (Text) reader.getCurrentValue();
                    break;
                }
                count--;
            }

            if(myText != null){

                myTuple = mytuplefactory.newTuple(myText);
                return myTuple;

            }else
                return null;

        }catch(Exception e){
            throw new ExecException(e);
        }
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {

        FileInputFormat.setInputPaths(job, location);

    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit PigSplit)
            throws IOException {

        this.reader = reader;
    }
}

My sample test data:

All People  "281,113"   *****   "100,094"   946 "84,028"    769 "63,736"    721 "23,176"    379 "10,079"    247 
  .Insured (any type)   85.25   0.23    87.10   0.34    85.08   0.42    82.49   0.38    84.41   0.67    87.72   0.66    
    ..Privately insured (alone or  in combination)  73.46   0.29    79.43   0.41    76.51   0.46    69.31   0.46    57.98   0.81    50.45   1.21    
    ..Medicare  12.98   0.06    3.48    0.11    8.34    0.22    19.05   0.31    37.32   0.72    51.50   1.04    
    ..Medicaid  11.44   0.22    9.08    0.33    9.07    0.31    11.99   0.33    21.32   0.63    28.37   1.05    
    ..Publically insured (no private)   11.79   0.20    7.67    0.30    8.57    0.28    13.17   0.30    26.43   0.76    37.27   1.21    
  .Uninsured    14.75   0.23    12.90   0.34    14.92   0.42    17.51   0.38    15.59   0.67    12.28   0.66    
People aged 0-64    "247,371"   *****   "96,563"    913 "77,129"    736 "52,013"    653 "15,666"    322 "6,000" 179 
  .Insured (any type)   83.36   0.26    86.68   0.35    83.81   0.45    78.73   0.45    77.41   0.92    79.97   1.03    
    ..Privately insured (alone or  in combination)  73.11   0.30    79.24   0.42    75.86   0.49    67.21   0.53    53.17   0.95    42.27   1.51    
    ..Medicare  1.83    0.06    0.19    0.03    0.52    0.06    1.90    0.12    10.29   0.50    22.32   1.05    
    ..Medicaid  11.69   0.25    9.24    0.34    9.47    0.34    13.10   0.42    24.25   0.80    34.81   1.48    
    ..Publically insured (no private)   10.25   0.21    7.44    0.30    7.94    0.29    11.52   0.33    24.24   0.83    37.70   1.43    
  .Uninsured    16.64   0.26    13.32   0.35    16.19   0.45    21.27   0.45    22.59   0.92    20.03   1.03    
People aged 18-64   "174,712"   *****   "55,534"    607 "57,342"    553 "41,977"    517 "14,113"    311 "5,747" 168 
  .Insured (any type)   82.54   0.26    86.33   0.37    84.00   0.40    77.95   0.48    76.63   0.97    79.45   1.08    
    ..Privately insured (alone or  in combination)  75.65   0.30    83.85   0.37    80.26   0.41    70.40   0.52    54.03   0.99    41.89   1.52    
    ..Medicare  2.59    0.08    0.32    0.05    0.70    0.08    2.35    0.15    11.42   0.56    23.30   1.06    
    ..Medicaid  7.02    0.19    2.91    0.18    4.16    0.21    7.85    0.32    21.51   0.79    33.75   1.53    
    ..Publically insured (no private)   6.89    0.17    2.48    0.15    3.74    0.18    7.55    0.28    22.59   0.82    37.56   1.44    
  .Uninsured    17.46   0.26    13.67   0.37    16.00   0.40    22.05   0.48    23.37   0.97    20.55   1.08    
People aged 65 and older    "33,742"    *****   "3,531" 117 "6,899" 186 "11,723"    211 "7,510" 175 "4,079" 138 
  .Insured (any type)   99.08   0.13    98.51   0.43    99.34   0.20    99.14   0.22    99.01   0.22    99.12   0.28    
    ..Privately insured (alone or  in combination)  75.99   0.60    84.51   1.39    83.80   0.94    78.64   0.89    68.02   1.24    62.48   1.79    
    ..Medicare  94.70   0.29    93.50   0.96    95.77   0.44    95.15   0.44    93.70   0.67    94.43   0.74    
    ..Medicaid  9.55    0.38    4.71    0.70    4.57    0.52    7.05    0.57    15.21   0.84    18.90   1.18    
    ..Publically insured (no private)   23.09   0.60    14.00   1.32    15.54   0.93    20.50   0.86    30.99   1.24    36.64   1.81    
  .Uninsured    0.92    0.13    1.49    0.43    0.66    0.20    0.86    0.22    0.99    0.22    0.88    0.28    
Family income less than 200 percent of poverty threshold\2  "94,310"    907 "27,954"    600 "25,543"    498 "23,801"    432 "11,392"    283 "5,621" 197 
  .Insured (any type)   73.93   0.44    75.44   0.74    71.27   0.93    70.74   0.72    77.58   1.04    84.70   1.01    
    ..Privately insured (alone or  in combination)  47.15   0.52    54.03   0.98    49.48   0.91    44.50   0.76    36.84   1.05    34.52   1.36    
    ..Medicare  15.41   0.24    3.51    0.24    8.72    0.44    18.96   0.56    36.14   1.09    47.98   1.34    
    ..Medicaid  26.38   0.50    24.69   0.89    23.25   0.80    24.73   0.77    34.17   1.02    40.24   1.51    
    ..Publically insured (no private)   26.78   0.48    21.41   0.86    21.79   0.80    26.23   0.64    40.74   1.17    50.19   1.52    
  .Uninsured    26.07   0.44    24.56   0.74    28.73   0.93    29.26   0.72    22.42   1.04    15.30   1.01    
Family income greater than or equal to 200 percent of poverty threshold\2   "186,434"   909 "71,968"    749 "58,398"    686 "39,852"    557 "11,778"    265 "4,438" 150 
  .Insured (any type)   91.00   0.18    91.65   0.30    91.15   0.30    89.53   0.38    91.01   0.56    91.76   0.84    
    ..Privately insured (alone or  in combination)  86.85   0.23    89.38   0.33    88.42   0.33    84.24   0.45    78.46   0.82    70.86   1.47    
    ..Medicare  11.77   0.13    3.47    0.14    8.19    0.23    19.15   0.39    38.47   0.92    56.21   1.57    
    ..Medicaid  3.80    0.15    2.93    0.18    2.81    0.19    4.28    0.27    8.86    0.60    13.28   1.11    
    ..Publically insured (no private)   4.15    0.13    2.27    0.14    2.73    0.15    5.29    0.30    12.55   0.74    20.90   1.40    
  .Uninsured    9.00    0.18    8.35    0.30    8.85    0.30    10.47   0.38    8.99    0.56    8.24    0.84    
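
A few things in the selection loop can be checked against this sample data, independently of the NullPointerException. getNext() calls reader.nextKeyValue() once before the loop and again in the loop condition, so the record read by the first call is never inspected. Every sample row carries more columns after its label, so equals("People aged 18-64") cannot match a whole line like the ones shown. And the break skips count--, so once count is set to 6 it never decreases. The sketch below isolates the intended "header row plus the rows that follow it" logic in plain Java so it can be tested without Pig or Hadoop. It assumes the goal is the "People aged 18-64" row followed by a fixed number of sub-rows; the class RowBlockFilter and its parameters are hypothetical names, not part of Pig. It is not offered as an explanation of the NullPointerException itself.

```java
import java.util.Iterator;

/** Hypothetical helper: selects a header row and a fixed number of rows after it. */
final class RowBlockFilter {
    private final String headerPrefix; // e.g. "People aged 18-64"
    private final int rowsAfterHeader; // assumption: how many sub-rows to keep
    private int remaining = 0;         // sub-rows still to emit in the current block

    RowBlockFilter(String headerPrefix, int rowsAfterHeader) {
        this.headerPrefix = headerPrefix;
        this.rowsAfterHeader = rowsAfterHeader;
    }

    /** Returns true if the line belongs to the wanted block. Call once per line, in order. */
    boolean accept(String line) {
        // startsWith, not equals: the label is only the first column of the row.
        if (line.startsWith(headerPrefix)) {
            remaining = rowsAfterHeader;
            return true;
        }
        if (remaining > 0) {
            remaining--; // count down on every emitted sub-row
            return true;
        }
        return false;
    }

    /** Returns the next wanted line, or null once the input is exhausted. */
    String next(Iterator<String> lines) {
        // Exactly one read per iteration, so no line is skipped unseen.
        while (lines.hasNext()) {
            String line = lines.next();
            if (accept(line)) {
                return line;
            }
        }
        return null;
    }
}
```

In getNext(), the same shape would mean a single reader.nextKeyValue() call per loop iteration, with accept() applied to reader.getCurrentValue().toString(). Passing that String rather than the Hadoop Text object to mytuplefactory.newTuple(...) is a further assumption, based on Pig representing chararray values as java.lang.String.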

Error message:

2015-12-11 22:21:21,360 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2015-12-11 22:21:21,364 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2015-12-11 22:21:21,496 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2015-12-11 22:21:21,542 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-12-11 22:21:21,556 [main] ERROR org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad - Received error from loader function: org.apache.pig.backend.executionengine.ExecException: ERROR 0: java.lang.NullPointerException
2015-12-11 22:21:21,560 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2088: Fetch failed. Couldn't retrieve result
Details at logfile: /home/hadoop/ashwin/data/pig_1449899080960.log
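
The console output above shows only the wrapped message: the loader's catch block rethrows everything as an ExecException, so the statement that actually threw the NullPointerException is not visible here. The log file named in the last line should contain the full stack trace. As a rough sketch, a small helper like the one below could also be called from the catch block to report the innermost exception and the frame where it was thrown before rethrowing. RootCause is a hypothetical class name, not part of Pig or Hadoop.

```java
/** Hypothetical helper for locating the origin of a wrapped exception. */
final class RootCause {
    private RootCause() {}

    /** Follows getCause() links to the innermost throwable. */
    static Throwable of(Throwable t) {
        Throwable cur = t;
        while (cur.getCause() != null) {
            cur = cur.getCause();
        }
        return cur;
    }

    /** Describes the root cause and the top stack frame, if one was recorded. */
    static String describe(Throwable t) {
        Throwable root = of(t);
        StackTraceElement[] frames = root.getStackTrace();
        String where = frames.length > 0 ? frames[0].toString() : "unknown location";
        return root.getClass().getName() + " at " + where;
    }
}
```

For example, printing RootCause.describe(e) to System.err inside catch(Exception e), just before throw new ExecException(e), would name the class, method, and line number of the failing statement, provided the classes were compiled with debug information.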

0 answers:

No answers yet