Reading a Parquet file in Hadoop MapReduce

Time: 2017-11-29 12:19:51

Tags: java hadoop parquet

I am new to Hadoop and I need to read a Parquet file during the map phase of a MapReduce job. I found the following code snippet on Cloudera:

public static class MyMap extends
  Mapper<LongWritable, Group, NullWritable, Text> {

  @Override
  public void map(LongWritable key, Group value, Context context) throws IOException, InterruptedException {
      NullWritable outKey = NullWritable.get();
      String outputRecord = "";
      // Get the schema and field values of the record
      String inputRecord = value.toString();
      // Process the value, create an output record
      // ...
      context.write(outKey, new Text(outputRecord));
  }
}
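As an aside, instead of calling value.toString(), the fields of each Group can be read by name with typed getters. Below is a minimal sketch of such a mapper, assuming a Parquet schema with an int32 column id and a binary (UTF8) column name; both field names are made up for illustration, and depending on the parquet-mr version Group and ExampleInputFormat live under parquet.* or org.apache.parquet.*.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.parquet.example.data.Group;

public class FieldAccessMapper extends
  Mapper<LongWritable, Group, NullWritable, Text> {

  @Override
  public void map(LongWritable key, Group value, Context context)
      throws IOException, InterruptedException {
    // Read typed values by field name and value index (0 for non-repeated fields).
    int id = value.getInteger("id", 0);        // hypothetical int32 column
    String name = value.getString("name", 0);  // hypothetical binary (UTF8) column
    context.write(NullWritable.get(), new Text(id + "\t" + name));
  }
}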

Job configuration:

public int run(String[] args) throws Exception {

  Job job = new Job(getConf());

  job.setJarByClass(getClass());
  job.setJobName(getClass().getName());
  job.setMapOutputKeyClass(LongWritable.class);
  job.setMapOutputValueClass(Text.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(Text.class);
  job.setMapperClass(MyMap.class);
  job.setNumReduceTasks(0);

  job.setInputFormatClass(ExampleInputFormat.class);
  job.setOutputFormatClass(TextOutputFormat.class);

  FileInputFormat.setInputPaths(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));

  job.waitForCompletion(true);
  return 0;
}
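Incidentally, a run(String[]) method with this signature is normally the Tool.run contract, so the job would be launched through ToolRunner from main. A rough sketch, where ParquetReadDriver is a made-up name for the enclosing driver class:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ParquetReadDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // The job setup shown above would go here.
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner parses generic Hadoop options (-D, -libjars, ...) before calling run().
    System.exit(ToolRunner.run(new ParquetReadDriver(), args));
  }
}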

The question is whether I can use my own type for the key and value instead, and how I would implement that. What I mean is a POJO that represents a single record from the Parquet file.
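To make the question concrete, the kind of POJO meant here would be something along these lines (class and field names are purely illustrative, not taken from the original post):

// A plain Java object that a single Parquet record would be mapped into.
// Class and field names are illustrative only.
public class ParquetRecord {
  private final int id;
  private final String name;

  public ParquetRecord(int id, String name) {
    this.id = id;
    this.name = name;
  }

  public int getId() { return id; }
  public String getName() { return name; }
}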

0 Answers:

There are no answers yet.