Is it impossible to determine the output data types of Hadoop MapReduce at runtime?

Asked: 2015-05-07 11:18:58

Tags: hadoop serialization mapreduce type-erasure generics

The Hadoop framework needs to know the output data types of the Mapper and the Reducer so that it can create instances of those types at runtime, both to move the values between the Mapper and the Reducer and to serialize instances from the Reducer to the output files. Therefore, we have to tell the Hadoop framework about the output types on the Job object, like this:

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Text.class);
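
For context, a minimal driver sketch showing where these declarations fit into job setup (the class name SelectClauseJob, the job name, and the command-line path arguments are illustrative assumptions, not from the post; it wires in the SelectClauseMapper shown below as a map-only job):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class SelectClauseJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "select-clause");
        job.setJarByClass(SelectClauseJob.class);
        job.setMapperClass(SelectClauseMapper.class);
        job.setNumReduceTasks(0);  // map-only job

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // The framework cannot infer these from the Mapper's generic
        // parameters, so they must be declared explicitly.
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}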

It is not possible to infer the types at runtime from the class definitions of the Mapper and Reducer since Java Generics uses Type Erasure.
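
To illustrate type erasure outside of Hadoop, here is a minimal standalone sketch (not from the original post): two lists with different type arguments share the same runtime class, so the type arguments cannot be recovered from the instances at runtime.

import java.util.ArrayList;
import java.util.List;

public class TypeErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        // Both print "class java.util.ArrayList": the type arguments
        // <String> and <Integer> were erased during compilation.
        System.out.println(strings.getClass());
        System.out.println(numbers.getClass());
        System.out.println(strings.getClass() == numbers.getClass()); // true
    }
}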

Suppose this is the mapper class:

public static class SelectClauseMapper
    extends Mapper<LongWritable, Text, NullWritable, Text> {

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
        // Skip the header line; for data rows, emit the selected
        // columns joined by commas as a single Text value.
        if (!AirlineDataUtils.isHeader(value)) {
            StringBuilder output = AirlineDataUtils.mergeStringArray(
                AirlineDataUtils.getSelectResultsPerRow(value),
                ",");
            context.write(NullWritable.get(), new Text(output.toString()));
        }
    }
}

Can someone explain, using the example above, why the output types cannot be determined at runtime?

0 Answers:

There are no answers yet.