I have a path filter that looks like this:
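import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class AvroFileInclusionFilter extends Configured implements PathFilter {
    Configuration conf;

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    @Override
    public boolean accept(Path path) {
        System.out.println("FileInclusion: " + conf.get("fileInclusion"));
        return true;
    }
}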
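I explicitly set the fileInclusion property on the configuration. For some reason, the configuration the path filter sees is not the one I set on the job, which is configured as follows: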
Job job = Job.getInstance(getConf(), "Stock Updater");
job.getConfiguration().set("outputPath", opts.outputPath);
String[] inputPaths = findPathsForDays(job.getConfiguration(),
        new Path(opts.inputPath),
        findDaysToQuery(job.getConfiguration(), opts.updatefile)).toArray(new String[]{});
job.getConfiguration().set("fileInclusion", "hello");
AvroKeyValueInputFormat.addInputPath(job, new Path(opts.inputPath));
job.getConfiguration().set("mapred.input.pathFilter.class", AvroFileInclusionFilter.class.getName());
job.setInputFormatClass(AvroKeyValueInputFormat.class);
LazyOutputFormat.setOutputFormatClass(job, AvroKeyValueOutputFormat.class);
AvroKeyValueOutputFormat.setOutputPath(job, new Path(opts.outputPath));
job.addCacheFile(new Path(opts.updatefile).toUri());
AvroKeyValueOutputFormat.setCompressOutput(job, true);
job.getConfiguration().set(AvroJob.CONF_OUTPUT_CODEC, snappyCodec().toString());
AvroJob.setInputKeySchema(job, DateKey.SCHEMA$);
AvroJob.setInputValueSchema(job, StockUpdated.SCHEMA$);
AvroJob.setMapOutputKeySchema(job, DateKey.SCHEMA$);
AvroJob.setMapOutputValueSchema(job, StockUpdated.SCHEMA$);
AvroJob.setOutputKeySchema(job, DateKey.SCHEMA$);
AvroJob.setOutputValueSchema(job, StockUpdated.SCHEMA$);
job.setMapperClass(StockUpdaterMapper.class);
job.setReducerClass(StockUpdaterReducer.class);
AvroMultipleOutputs.addNamedOutput(job, "output", AvroKeyValueOutputFormat.class,
        DateKey.SCHEMA$, StockUpdated.SCHEMA$);
job.setJarByClass(getClass());
boolean success = job.waitForCompletion(true);
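conf.get("fileInclusion") always returns null, and I can't figure out why. I have been working on this for a long time and I'm at the end of my rope. Why is the configuration different? I submit the job with "hadoop jar" and with "yarn jar".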
Answer 0 (score: 0)
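Instead of creating the Job object by passing getConf() as the argument, try building the Configuration first and creating the Job from it: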
Configuration conf = new Configuration();
conf.set("outputPath", opts.outputPath);
conf.set("mapred.input.pathFilter.class", AvroFileInclusionFilter.class.getName());
..
..
// After setting all required keys on the Configuration object, create the Job from it.
// Job.getInstance replaces the deprecated new Job(conf, name) constructor.
Job job = Job.getInstance(conf, "Stock Updater");
Answer 1 (score: 0)
The PathFilter should "implement Configurable" rather than "extend Configured".
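To make the suggestion concrete, here is a minimal, dependency-free sketch of the pattern. Everything in it is a stand-in written for illustration: `Conf`, `Configurable`, `PathFilter`, and `newInstance` only imitate the shape of Hadoop's `Configuration`, `Configurable`, `PathFilter`, and `ReflectionUtils.newInstance`, and the "path contains the configured value" rule is a hypothetical replacement for the always-true `accept` in the question. It shows the filter implementing the configurable interface directly and the factory injecting the configuration after instantiation.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigurableFilterSketch {

    /** Stand-in for Hadoop's Configuration (assumption: string keys and values only). */
    public static class Conf {
        private final Map<String, String> props = new HashMap<>();
        public void set(String key, String value) { props.put(key, value); }
        public String get(String key) { return props.get(key); }
    }

    /** Stand-in for Hadoop's Configurable interface. */
    public interface Configurable {
        void setConf(Conf conf);
        Conf getConf();
    }

    /** Stand-in for Hadoop's PathFilter; the path is a plain String here. */
    public interface PathFilter {
        boolean accept(String path);
    }

    /** The filter implements Configurable itself and owns its single conf field. */
    public static class InclusionFilter implements PathFilter, Configurable {
        private Conf conf;

        @Override
        public void setConf(Conf conf) { this.conf = conf; }

        @Override
        public Conf getConf() { return conf; }

        @Override
        public boolean accept(String path) {
            // Hypothetical rule: accept everything when the key is unset,
            // otherwise accept only paths containing the configured value.
            String wanted = (conf == null) ? null : conf.get("fileInclusion");
            return wanted == null || path.contains(wanted);
        }
    }

    /** Imitates a reflection-based factory: instantiate, then inject conf if Configurable. */
    public static <T> T newInstance(Class<T> cls, Conf conf) {
        try {
            T obj = cls.getDeclaredConstructor().newInstance();
            if (obj instanceof Configurable) {
                ((Configurable) obj).setConf(conf);
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        conf.set("fileInclusion", "hello");
        InclusionFilter filter = newInstance(InclusionFilter.class, conf);
        System.out.println("FileInclusion: " + filter.getConf().get("fileInclusion"));
        System.out.println("accept /data/hello.avro: " + filter.accept("/data/hello.avro"));
    }
}
```

In a real job the filter would implement `org.apache.hadoop.conf.Configurable` and `org.apache.hadoop.fs.PathFilter` instead of these stand-ins. Keeping exactly one `conf` field, set only through `setConf`, avoids having two copies of the configuration in the same object.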