Question

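I have a Hadoop (2.2.0) map-reduce job that reads text from a specified path (say INPUT_PATH) and does some processing on it. I don't want to hard-code the input path, because it comes from another source that changes every week.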

I believe Hadoop should have a way to specify an XML properties file when the job is run from the command line. How should I do that?

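One approach I thought of is to set an environment variable that points to the location of the properties file, read that variable in the code, and then load the properties file. This could work, since the value of the environment variable can change every week without any code change. But it feels like an ugly way to load and override properties.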

Please let me know the least hacky way to do this.

Answer 1

There is no built-in way to read a configuration file for the input/output paths.

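One approach I would suggest is to implement a Java M/R driver that does the following:

Read the configuration (XML / properties / anything), which may be generated or updated by some other process
Set the job properties from it
Submit the job with the hadoop command (passing the configuration file as an argument)

Something like this: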

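import java.io.FileInputStream;
import java.util.Properties;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// TestMapper and TestReducer are your own Mapper/Reducer classes (not shown here).
public class SampleMRDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {

            // args[0] is the local path of the XML properties file.
            // loadFromXML closes the stream once it returns.
            Properties prop = new Properties();
            prop.loadFromXML(new FileInputStream(args[0]));

            Job job = Job.getInstance(getConf(), "Test Job");

            job.setJarByClass(SampleMRDriver.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            job.setMapperClass(TestMapper.class);
            job.setReducerClass(TestReducer.class);

            // getProperty returns a String; Properties.get returns Object,
            // which new Path(...) does not accept.
            FileInputFormat.setInputPaths(job, new Path(prop.getProperty("input_path")));
            FileOutputFormat.setOutputPath(job, new Path(prop.getProperty("output_path")));

            boolean success = job.waitForCompletion(true);
            return success ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // Run this driver (not an unrelated class) and pass its exit code to the shell.
            System.exit(ToolRunner.run(new SampleMRDriver(), args));
        }
}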
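
For reference, here is a minimal sketch of what the properties file and the loading step could look like, written without any Hadoop dependency so it can be tried on its own. The class name JobProps, the overrides map, and the sample paths are assumptions made for illustration; only the input_path and output_path keys come from the driver above. The XML layout is the one java.util.Properties.loadFromXML expects.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper, not part of Hadoop or of the driver above. */
final class JobProps {

    // Example file content in the format Properties.loadFromXML expects.
    // The two paths are made-up placeholders.
    static final String SAMPLE_XML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
        + "<properties>\n"
        + "  <entry key=\"input_path\">/data/in/week-01</entry>\n"
        + "  <entry key=\"output_path\">/data/out/week-01</entry>\n"
        + "</properties>\n";

    /**
     * Parses the XML, applies the overrides on top of the file values,
     * and fails early if a required key is missing or blank.
     */
    static Properties fromXml(String xml, Map<String, String> overrides) {
        Properties props = new Properties();
        try {
            props.loadFromXML(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            // Covers malformed XML as well (InvalidPropertiesFormatException).
            throw new UncheckedIOException(e);
        }

        // Values supplied at launch time win over values from the file.
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            props.setProperty(e.getKey(), e.getValue());
        }

        for (String key : new String[] {"input_path", "output_path"}) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalArgumentException("Missing required property: " + key);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // Override only the input path, as a weekly run might do.
        Properties p = fromXml(
            SAMPLE_XML, Collections.singletonMap("input_path", "/data/in/week-02"));
        System.out.println(p.getProperty("input_path"));
        System.out.println(p.getProperty("output_path"));
    }
}
```

Because the driver implements Tool and is started through ToolRunner, generic options such as -D key=value given before the application arguments should end up in getConf(). Under that assumption, one possible source for the overrides is getConf().get("input_path", prop.getProperty("input_path")), which would let a single run override the file without editing it. Treat this as one option, not as the only way to wire it up.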

Specifying job properties and overriding properties in a Hadoop job
