Hadoop PathFilter configuration is null

Time: 2014-04-08 05:40:49

Tags: hadoop

I have a path filter that looks like this:

public class AvroFileInclusionFilter extends Configured implements PathFilter {
  Configuration conf;

  @Override
  public void setConf(Configuration conf) {
      this.conf = conf;
  }

  @Override
  public boolean accept(Path path) {

      System.out.println("FileInclusion: " + conf.get("fileInclusion"));

      return true;
  }
}

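I explicitly set the fileInclusion property on the configuration. For some reason, the configuration the path filter sees is not the one I set on the job, which is built like this: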

    Job job = Job.getInstance(getConf(), "Stock Updater");

    job.getConfiguration().set("outputPath", opts.outputPath);

    String[] inputPaths = findPathsForDays(job.getConfiguration(),
            new Path(opts.inputPath), findDaysToQuery(job.getConfiguration(),
                    opts.updatefile)).toArray(new String[]{});
    job.getConfiguration().set("fileInclusion", "hello`");

    AvroKeyValueInputFormat.addInputPath(job, new Path(opts.inputPath));
    job.getConfiguration().set("mapred.input.pathFilter.class", AvroFileInclusionFilter.class.getName());

    job.setInputFormatClass(AvroKeyValueInputFormat.class);

    LazyOutputFormat.setOutputFormatClass(job, AvroKeyValueOutputFormat.class);
    AvroKeyValueOutputFormat.setOutputPath(job, new Path(opts.outputPath));

    job.addCacheFile(new Path(opts.updatefile).toUri());

    AvroKeyValueOutputFormat.setCompressOutput(job, true);
    job.getConfiguration().set(AvroJob.CONF_OUTPUT_CODEC, snappyCodec().toString());

    AvroJob.setInputKeySchema(job, DateKey.SCHEMA$);
    AvroJob.setInputValueSchema(job, StockUpdated.SCHEMA$);
    AvroJob.setMapOutputKeySchema(job, DateKey.SCHEMA$);
    AvroJob.setMapOutputValueSchema(job, StockUpdated.SCHEMA$);
    AvroJob.setOutputKeySchema(job, DateKey.SCHEMA$);
    AvroJob.setOutputValueSchema(job, StockUpdated.SCHEMA$);

    job.setMapperClass(StockUpdaterMapper.class);
    job.setReducerClass(StockUpdaterReducer.class);

    AvroMultipleOutputs.addNamedOutput(job, "output", AvroKeyValueOutputFormat.class,
            DateKey.SCHEMA$, StockUpdated.SCHEMA$);

    job.setJarByClass(getClass());

    boolean success = job.waitForCompletion(true);

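conf.get("fileInclusion") always returns null, and I can't figure out why. I've been working on this for a long time and I'm at the end of my rope. Why are the two configurations different? I submit the job with both "hadoop jar" and "yarn jar".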

2 answers:

Answer 0: (score: 0)

Instead of creating the Job object by passing getConf() as the argument, try the following:

Configuration conf = new Configuration();
conf.set("outputPath", opts.outputPath);
conf.set("mapred.input.pathFilter.class", AvroFileInclusionFilter.class.getName());
..
..
// Once the required key/value pairs are set on the Configuration, create the Job from it
Job job = new Job(conf, "Stock Updater");
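
One point that may help when weighing this suggestion: the Hadoop Job documentation states that a Job makes its own copy of the Configuration it is given. Values set on the original object after the Job exists therefore do not reach the job, while values set through job.getConfiguration() do. The question's code already writes through job.getConfiguration(), so ordering alone may not explain the null. The sketch below models only that copy behavior; Conf and JobStub are hypothetical stand-ins, not Hadoop classes, so the example runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

final class JobConfCopySketch {

    /** Hypothetical stand-in for Configuration: a copyable string map. */
    static final class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        /** Copy constructor: the new Conf gets its own independent map. */
        Conf(Conf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key);
        }
    }

    /** Hypothetical stand-in for Job: copies the Conf when it is created. */
    static final class JobStub {
        private final Conf conf;

        JobStub(Conf source) {
            this.conf = new Conf(source);
        }

        Conf getConfiguration() {
            return conf;
        }
    }

    public static void main(String[] args) {
        Conf original = new Conf();
        original.set("outputPath", "/tmp/out");        // set before creation: copied into the job
        JobStub job = new JobStub(original);
        original.set("fileInclusion", "hello");        // set on the original afterwards: job never sees it
        job.getConfiguration().set("viaJob", "yes");   // set through the job: lands in the job's copy

        System.out.println(job.getConfiguration().get("outputPath"));
        System.out.println(job.getConfiguration().get("fileInclusion"));
        System.out.println(job.getConfiguration().get("viaJob"));
    }
}
```

Either style works as long as every property is written to the object the job actually reads: the Configuration before the Job is created, or job.getConfiguration() afterwards.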

Answer 1: (score: 0)

The PathFilter should "implement Configurable" rather than "extend Configured".
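
To make this suggestion concrete: as far as I know, FileInputFormat builds the filter with ReflectionUtils.newInstance(filterClass, conf), which calls setConf(conf) on any object that implements Configurable. The sketch below imitates that instantiate-then-inject step with stand-in types so it runs without Hadoop. Conf, the local Configurable and PathFilter interfaces, the newInstance helper, and the substring rule inside accept are all assumptions made for illustration, not the real Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;

final class PathFilterConfSketch {

    /** Hypothetical stand-in for org.apache.hadoop.conf.Configuration. */
    static final class Conf {
        private final Map<String, String> props = new HashMap<>();

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key);
        }
    }

    /** Hypothetical stand-in for org.apache.hadoop.conf.Configurable. */
    interface Configurable {
        void setConf(Conf conf);

        Conf getConf();
    }

    /** Hypothetical stand-in for org.apache.hadoop.fs.PathFilter (paths are plain strings here). */
    interface PathFilter {
        boolean accept(String path);
    }

    /** The filter implements Configurable directly and owns its single conf field. */
    static final class AvroFileInclusionFilter implements PathFilter, Configurable {
        private Conf conf;

        @Override
        public void setConf(Conf conf) {
            this.conf = conf;
        }

        @Override
        public Conf getConf() {
            return conf;
        }

        @Override
        public boolean accept(String path) {
            // Illustrative rule only: accept everything when the property is absent,
            // otherwise require the path to contain the configured substring.
            String inclusion = (conf == null) ? null : conf.get("fileInclusion");
            return inclusion == null || path.contains(inclusion);
        }
    }

    /** Imitates the framework: create the object reflectively, then inject the conf if it is Configurable. */
    static <T> T newInstance(Class<T> cls, Conf conf) {
        try {
            T instance = cls.getDeclaredConstructor().newInstance();
            if (instance instanceof Configurable) {
                ((Configurable) instance).setConf(conf);
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        conf.set("fileInclusion", "2014-04");
        AvroFileInclusionFilter filter = newInstance(AvroFileInclusionFilter.class, conf);
        System.out.println("FileInclusion: " + filter.getConf().get("fileInclusion"));
        System.out.println(filter.accept("/data/2014-04/part.avro"));
    }
}
```

In a real job the class would implement org.apache.hadoop.conf.Configurable and org.apache.hadoop.fs.PathFilter instead of these stand-ins, and it could be registered with FileInputFormat.setInputPathFilter(job, AvroFileInclusionFilter.class) rather than by setting the property name by hand.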