A MapReduce NullPointerException I don't understand

Date: 2014-03-03 00:17:16

Tags: java generics hadoop mapreduce cloudera

Here is the error I am receiving:

    14/02/28 02:52:43 INFO mapred.JobClient: Task Id : attempt_201402271927_0020_m_000001_2, Status : FAILED
    java.lang.NullPointerException
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:843)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:376)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:85)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:584)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

I have commented my code down to the point where it basically just takes the typical LongWritable and Text input and outputs a constant IntWritable 1 together with an empty Weather object (a custom class):

Here is my mapper class:

public class Map extends Mapper<LongWritable, Text, IntWritable, Weather> {

private IntWritable id = new IntWritable(1);
private Weather we = new Weather();

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    //String s;
    //String line = value.toString();

    //int start[] =   {0,18,31,42,53,64,74,84,88,103};
    //int end[] =     {6,22,33,44,55,66,76,86,93,108};

    //if(line.length() > 108) {
        // create the object to hold our data
        // getStuff()
        // parse the string

        // push the object onto our data structure
        context.write(id, we);
    //}
    }
}

Here is my reducer:

public class Reduce extends Reducer<IntWritable, Weather, IntWritable, Text> {
    private Text text = new Text("one");
    private IntWritable one = new IntWritable(1);
    public void reduce(IntWritable key, Iterable<Weather> weather, Context context)
        throws IOException, InterruptedException {
        //for(Weather w : weather) {
        //    text.set(w.toString());
        context.write(one, text);
    }
}

Here is my main:

public class Skyline {

    public static void main(String[] args) throws IOException{
        //String s = args[0].length() > 0 ? args[0] : "skyline.in";
        Path input, output;
        Configuration conf = new Configuration();

        conf.set("io.serializations", "org.apache.hadoop.io.serializer.JavaSerialization,"
                + "org.apache.hadoop.io.serializer.WritableSerialization");
        try {
            input = new Path(args[0]);
        } catch(ArrayIndexOutOfBoundsException e) {
            input = new Path("hdfs://localhost/user/cloudera/in/skyline.in");
        }
        try {
            output = new Path(args[1]);
            //FileSystem.getLocal(conf).delete(output, true);
        } catch(ArrayIndexOutOfBoundsException e) {
            output = new Path("hdfs://localhost/user/cloudera/out/");
            //FileSystem.getLocal(conf).delete(output, true);
        }

        Job job = new Job(conf, "skyline");

        job.setJarByClass(Skyline.class);

        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Weather.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);
        try {
            job.waitForCompletion(true);
        } catch(InterruptedException e) {
            System.out.println("Interrupted Exception");
        } catch(ClassNotFoundException e) {
            System.out.println("ClassNotFoundException");
        }
    }
}

Here is a sample of my Weather class:

public class Weather {

private int stationId;

public Weather(){}

public int getStation(){return this.stationId;}
public void setStation(int r){this.stationId = r;}
//...24 additional fields of ints, doubles and Strings
}

I am at my wits' end. At this point I have the shell of a program that does nothing and I still get the error. I have read up on Java generics to make sure I am using them correctly (I believe I am). I am very green to the MapReduce paradigm, but this program is just a shell, adapted from the MapReduce tutorial.

1 answer:

Answer 0 (score: 9)

The problem is that the class Weather, which you use as the map() output / reduce() input value, does not implement Writable. This prevents the default SerializationFactory from handling your values.

The underlying conceptual problem is that Hadoop does not know how to serialize your data type to disk and read it back. That step is mandatory, because the data must be persisted before it moves from the map task to the reducer (in general, the two run on different nodes).

So what you have to do is implement Writable and add the serialization routines to your custom data type.
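Hadoop's Writable interface comes down to two methods, write(DataOutput) and readFields(DataInput), which must serialize and deserialize every field in the same order. As a minimal sketch of that pattern, here is a cut-down, hypothetical Weather with only the stationId field, exercised through plain java.io streams so it runs without Hadoop on the classpath; the real class would additionally declare `implements org.apache.hadoop.io.Writable` and handle all of its fields:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch only: the real class would declare
// `implements org.apache.hadoop.io.Writable` and serialize all ~25 fields.
class Weather {
    private int stationId;

    public int getStation() { return stationId; }
    public void setStation(int r) { stationId = r; }

    // Same signature as Writable.write(DataOutput): write every field.
    public void write(DataOutput out) throws IOException {
        out.writeInt(stationId);
    }

    // Same signature as Writable.readFields(DataInput): read the fields
    // back in exactly the order they were written.
    public void readFields(DataInput in) throws IOException {
        stationId = in.readInt();
    }
}

public class WritableSketch {
    public static void main(String[] args) throws IOException {
        Weather w = new Weather();
        w.setStation(42);

        // Round-trip through a byte buffer, mimicking what Hadoop does
        // when shuffling map output to the reducer.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        w.write(new DataOutputStream(buf));

        Weather copy = new Weather();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.getStation()); // prints 42
    }
}
```

Once Weather implements Writable this way, the SerializationFactory's default WritableSerialization can handle it, and the io.serializations override in main() becomes unnecessary.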