Reading ORC files with MapReduce

Posted: 2016-09-18 17:34:07

Tags: hadoop hive orc

I am trying to read SNAPPY-compressed ORC files through MapReduce. My only goal is to use the identity Mapper to merge small files, but I am getting a NullPointerException. I can see from the logs that the schema is being inferred on the input side — do I also need to set the schema for the output files from the mapper?

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.orc.mapreduce.OrcInputFormat;
import org.apache.orc.mapreduce.OrcOutputFormat;

public class Test {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();

        Job job = new Job(conf, "test");

        job.setJarByClass(Test.class);
        job.setMapperClass(Mapper.class);
        conf.set("orc.compress", "SNAPPY");
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Writable.class);
        job.setInputFormatClass(OrcInputFormat.class);
        job.setOutputFormatClass(OrcOutputFormat.class);
        job.setNumReduceTasks(0);

        String source = args[0];
        String target = args[1];

        FileInputFormat.setInputPath(job, new Path(source));
        FileOutputFormat.setOutputPath(job, new Path(target));

        boolean result = job.waitForCompletion(true);

        System.exit(result ? 0 : 1);
    }
}

Error: java.lang.NullPointerException
    at org.apache.orc.impl.WriterImpl.&lt;init&gt;(WriterImpl.java:178)
    at org.apache.orc.OrcFile.createWriter(OrcFile.java:559)
    at org.apache.orc.mapreduce.OrcOutputFormat.getRecordWriter(OrcOutputFormat.java:55)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.&lt;init&gt;(MapTask.java:644)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
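The trace shows the NPE is raised while `OrcOutputFormat.getRecordWriter` constructs the ORC `WriterImpl`, which suggests the writer was handed a null schema: unlike the input side, `org.apache.orc.mapreduce.OrcOutputFormat` does not infer a schema and reads it from the `orc.mapred.output.schema` configuration property instead. A minimal sketch of setting it before the `Job` is created (the `struct<...>` schema string and job name here are placeholders — substitute the actual schema of your ORC files, e.g. as reported by `hive --orcfiledump`):

```java
// Hedged sketch: configure the ORC output schema explicitly.
// Both properties must be set on the Configuration BEFORE the Job is
// created, because the Job takes a copy of the Configuration.
Configuration conf = new Configuration();
// Placeholder schema -- replace with your files' real schema.
conf.set("orc.mapred.output.schema", "struct<id:int,name:string>");
conf.set("orc.compress", "SNAPPY");
Job job = Job.getInstance(conf, "orc-compact");
```

Note that in the question's code `conf.set("orc.compress", "SNAPPY")` is called after `new Job(conf, ...)`, so that setting may never reach the job's copied configuration either. This is a configuration fragment that depends on the Hadoop and ORC jars being on the classpath, so it is shown as-is rather than as a runnable program.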

0 answers:

No answers yet