Here is the error I am getting:
14/02/28 02:52:43 INFO mapred.JobClient: Task Id : attempt_201402271927_0020_m_000001_2, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:843)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:376)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:85)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:584)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
I have commented out my code so that it basically just takes the typical LongWritable and Text input, and then outputs a constant IntWritable 1 and an empty Weather object (a custom class):
Here is my mapper class:
public class Map extends Mapper<LongWritable, Text, IntWritable, Weather> {
    private IntWritable id = new IntWritable(1);
    private Weather we = new Weather();

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        //String s;
        //String line = value.toString();
        //int start[] = {0,18,31,42,53,64,74,84,88,103};
        //int end[] = {6,22,33,44,55,66,76,86,93,108};
        //if(line.length() > 108) {
            // create the object to hold our data
            // getStuff()
            // parse the string
            // push the object onto our data structure
            context.write(id, we);
        //}
    }
}
Here is my reducer:
public class Reduce extends Reducer<IntWritable, Weather, IntWritable, Text> {
    private Text text = new Text("one");
    private IntWritable one = new IntWritable(1);

    public void reduce(IntWritable key, Iterable<Weather> weather, Context context)
            throws IOException, InterruptedException {
        //for(Weather w : weather) {
        //    text.set(w.toString());
        context.write(one, text);
        //}
    }
}
Here is my main:
public class Skyline {
    public static void main(String[] args) throws IOException {
        //String s = args[0].length() > 0 ? args[0] : "skyline.in";
        Path input, output;
        Configuration conf = new Configuration();
        conf.set("io.serializations", "org.apache.hadoop.io.serializer.JavaSerialization,"
                + "org.apache.hadoop.io.serializer.WritableSerialization");
        try {
            input = new Path(args[0]);
        } catch (ArrayIndexOutOfBoundsException e) {
            input = new Path("hdfs://localhost/user/cloudera/in/skyline.in");
        }
        try {
            output = new Path(args[1]);
            //FileSystem.getLocal(conf).delete(output, true);
        } catch (ArrayIndexOutOfBoundsException e) {
            output = new Path("hdfs://localhost/user/cloudera/out/");
            //FileSystem.getLocal(conf).delete(output, true);
        }
        Job job = new Job(conf, "skyline");
        job.setJarByClass(Skyline.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Weather.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);
        try {
            job.waitForCompletion(true);
        } catch (InterruptedException e) {
            System.out.println("Interrupted Exception");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException");
        }
    }
}
Here is a sample of my Weather class:
public class Weather {
    private int stationID;

    public Weather() {}

    public int getStation() { return this.stationID; }
    public void setStation(int r) { this.stationID = r; }

    //...24 additional things of ints, doubles and strings
}
I am at my wit's end. At this point I have the shell of a program that does nothing, and I still get the error. I have read up on Java generics to make sure I am using them correctly (I think I am). I am very green with the MapReduce paradigm, but this program is just a shell, modified from a MapReduce tutorial ({{3}}).
Answer 0: (score: 9)
The problem is that the class Weather, which you use as the map() output / reduce() input value, does not implement Writable. This prevents the default SerializationFactory from handling your values.
The underlying conceptual problem is that Hadoop does not know how to serialize your data type to disk and read it back. That step is mandatory, because the data must be persisted before it moves from the map tasks to the reducers (in general, the two can run on different nodes).
So what you need to do is implement Writable and add the serialization routines to your custom data type.
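As a rough sketch of what that looks like: in the real job the class would declare `implements org.apache.hadoop.io.Writable`, and the two methods below match that interface's signatures exactly (they use the JDK's own java.io.DataInput/DataOutput types). The `temperature` field here is a hypothetical stand-in for the "24 additional things" in the original class; the main() round-trip is only a local demonstration that write() and readFields() mirror each other.

```java
import java.io.*;

// Sketch of Weather with Writable-style serialization.
// In the Hadoop job, declare: public class Weather implements org.apache.hadoop.io.Writable
public class Weather /* implements org.apache.hadoop.io.Writable */ {
    private int stationID;
    private double temperature; // hypothetical extra field for illustration

    public Weather() {} // Hadoop needs a no-arg constructor to instantiate values

    public int getStation() { return stationID; }
    public void setStation(int r) { stationID = r; }
    public double getTemperature() { return temperature; }
    public void setTemperature(double t) { temperature = t; }

    // Serialize every field, in a fixed order...
    public void write(DataOutput out) throws IOException {
        out.writeInt(stationID);
        out.writeDouble(temperature);
    }

    // ...and read them back in exactly the same order.
    public void readFields(DataInput in) throws IOException {
        stationID = in.readInt();
        temperature = in.readDouble();
    }

    // Local round-trip demo: serialize to a byte buffer, read into a fresh object.
    public static void main(String[] args) throws IOException {
        Weather w = new Weather();
        w.setStation(42);
        w.setTemperature(21.5);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        w.write(new DataOutputStream(bytes));

        Weather copy = new Weather();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(copy.getStation() + " " + copy.getTemperature());
    }
}
```

The key constraint is that readFields() must consume fields in the exact order write() emitted them; if the two methods drift out of sync, you get corrupted values rather than a clean error.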