I am trying to read ORC files compressed with SNAPPY through MapReduce. My only goal is to use the identity mapper to merge small files. However, when I do this I get a NullPointerException. I can see from the logs that the schema is being inferred — do I also need to set the schema of the output file from the mapper?
public class Test {
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
Job job = new Job(conf, "test");
job.setJarByClass(Test.class);
job.setMapperClass(Mapper.class);
conf.set("orc.compress", "SNAPPY");
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Writable.class);
job.setInputFormatClass(OrcInputFormat.class);
job.setOutputFormatClass(OrcOutputFormat.class);
job.setNumReduceTasks(0);
String source = args[0];
String target = args[1];
FileInputFormat.addInputPath(job, new Path(source));
FileOutputFormat.setOutputPath(job, new Path(target));
boolean result = job.waitForCompletion(true);
System.exit(result ? 0 : 1);
    }
}
Error: java.lang.NullPointerException
    at org.apache.orc.impl.WriterImpl.&lt;init&gt;(WriterImpl.java:178)
    at org.apache.orc.OrcFile.createWriter(OrcFile.java:559)
    at org.apache.orc.mapreduce.OrcOutputFormat.getRecordWriter(OrcOutputFormat.java:55)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.&lt;init&gt;(MapTask.java:644)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
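In case it matters: the NPE inside WriterImpl looks consistent with the writer receiving a null schema, since OrcOutputFormat reads its schema from configuration rather than inferring it from the input. Below is a sketch of what I understand setting the output schema to look like; the struct string is a hypothetical placeholder, not the real schema of my files, and I set it before constructing the Job because Job copies the Configuration at creation time.

```java
Configuration conf = new Configuration();

// Placeholder schema for illustration only; my real files have a
// different schema that I would read from the ORC footer.
TypeDescription schema = TypeDescription.fromString("struct<id:int,name:string>");

// OrcOutputFormat looks up the output schema under this key.
conf.set("orc.mapred.output.schema", schema.toString());
conf.set("orc.compress", "SNAPPY");

// Configuration changes after this point are not seen by the job.
Job job = Job.getInstance(conf, "test");
```

Is this the right way to propagate the schema, or is there a way to have it carried over from the input files automatically?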