Hadoop MapReduce - reducer is not running

Date: 2016-03-16 15:43:46

Tags: java hadoop mapreduce reducers

I am writing a custom bulk-load MapReduce job for HBase and I am having a problem with the reducer. At first I thought I had simply written the reducer badly, but after throwing a runtime exception inside the reducer and seeing the job still complete normally, I realized the reducer was not running at all. So far, none of the usual answers to this kind of question seem to apply:

  1. My configuration sets the map output classes and the final output classes separately
  2. My reducer and mapper methods are overridden (annotated with @Override)
  3. My reducer takes an Iterable, and its input types are (writable, Put), so...
  4. Here is my code:

    Driver


    public int run(String[] args) throws Exception {
        int result = 0;
        String outputPath = args[1];
        Configuration configuration = getConf();
        configuration.set("data.seperator", DATA_SEPERATOR);
        configuration.set("hbase.table.name", TABLE_NAME);
        configuration.set("COLUMN_FAMILY_1", COLUMN_FAMILY_1);
        Job job = new Job(configuration);
        job.setJarByClass(HBaseBulkLoadDriver.class);
        job.setJobName("Bulk Loading HBase Table::" + TABLE_NAME);
        job.setInputFormatClass(TextInputFormat.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapperClass(HBaseBulkLoadMapper.class);
        job.setReducerClass(HBaseBulkLoadReducer.class);
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Put.class);
        FileInputFormat.addInputPaths(job, args[0]);
        // Note: this deletes the path on the local filesystem, while the job
        // output path below is resolved against the default filesystem (HDFS).
        FileSystem.getLocal(getConf()).delete(new Path(outputPath), true);
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        job.setMapOutputValueClass(Put.class);
        job.setNumReduceTasks(1);
        HFileOutputFormat.configureIncrementalLoad(job, new HTable(configuration, TABLE_NAME));
        job.waitForCompletion(true);
        return result;
    }
    

    Mapper

    public class HBaseBulkLoadMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        private String hbaseTable;
        private String dataSeperator;
        private String columnFamily1;
        private ImmutableBytesWritable hbaseTableName;

        @Override
        public void setup(Context context) {
            Configuration configuration = context.getConfiguration();
            hbaseTable = configuration.get("hbase.table.name");
            dataSeperator = configuration.get("data.seperator");
            columnFamily1 = configuration.get("COLUMN_FAMILY_1");
            hbaseTableName = new ImmutableBytesWritable(Bytes.toBytes(hbaseTable));
        }

        @Override
        public void map(LongWritable key, Text value, Context context) {
            try {
                String[] values = value.toString().split(dataSeperator);
                String rowKey = values[0];
                Put put = new Put(Bytes.toBytes(rowKey));
                // BUNCH OF ADDS (column writes elided in the original post)
                context.write(new ImmutableBytesWritable(Bytes.toBytes(rowKey)), put);
            } catch (Exception exception) {
                exception.printStackTrace();
            }
        }
    }
    
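For illustration only, the elided BUNCH OF ADDS placeholder would typically be one Put.add call per column. A hypothetical sketch follows; the qualifiers ("name", "age") and the field positions are invented, not from the original post:

    // Hypothetical sketch of the elided "BUNCH OF ADDS" inside map();
    // qualifiers and value positions are invented for illustration.
    // Put.add(family, qualifier, value) is the pre-1.0 HBase client API.
    put.add(Bytes.toBytes(columnFamily1), Bytes.toBytes("name"), Bytes.toBytes(values[1]));
    put.add(Bytes.toBytes(columnFamily1), Bytes.toBytes("age"), Bytes.toBytes(values[2]));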

I know the reducer is a bit messy, but the fact is that it throws a RuntimeException in both the if and the else branch, and the bulk load still completes successfully, so I am fairly sure the reducer is not running. I am just not sure why. All three files are packaged by Maven in the same directory, FYI.

1 Answer:

Answer 0 (score: 0):

Figured out what was going wrong. configureIncrementalLoad sets the reducer class to PutSortReducer or KeyValueSortReducer depending on the map output value class, so if I want to use a custom reducer class I have to set it after calling configureIncrementalLoad. After doing that, I could see my reducer running. Just answering my own question so it may help people who run into the same problem.
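For context, this is roughly the reducer selection that HFileOutputFormat.configureIncrementalLoad performs internally, paraphrased from the HBase 0.9x-era source (approximate, not a verbatim copy):

    // Paraphrased sketch of configureIncrementalLoad's reducer selection;
    // details are approximate and version-dependent.
    Class<?> mapOutputValue = job.getMapOutputValueClass();
    if (KeyValue.class.equals(mapOutputValue)) {
        job.setReducerClass(KeyValueSortReducer.class);
    } else if (Put.class.equals(mapOutputValue)) {
        // The map output value here is Put, so this silently replaces the
        // HBaseBulkLoadReducer that the driver set earlier.
        job.setReducerClass(PutSortReducer.class);
    }

With that in mind, the fix is just a reordering: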

    HFileOutputFormat.configureIncrementalLoad(job, new HTable(configuration, TABLE_NAME));
    // Set the custom reducer AFTER configureIncrementalLoad so it is not replaced.
    job.setReducerClass(HBaseBulkLoadReducer.class);
    job.waitForCompletion(true);
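
One simple way to verify that a custom reducer really runs is a user-defined counter (a generic MapReduce technique sketched below, not something from the original answer):

    // Inside the custom reducer's reduce() method:
    context.getCounter("bulkload", "reduceCalls").increment(1);

    // In the driver, after the job finishes:
    job.waitForCompletion(true);
    long reduceCalls = job.getCounters()
            .findCounter("bulkload", "reduceCalls").getValue();
    System.out.println("custom reduce() invocations: " + reduceCalls);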