I'm trying to do a custom bulk load into HBase with MapReduce, and I'm running into a problem with the reducer. At first I thought I had written the reducer badly, but after throwing a runtime exception inside the reducer and seeing the job still complete, I realized the reducer wasn't running at all. As far as I can tell, none of the common answers to this problem apply to my case.
Here is my code:
Driver
...
public int run(String[] args) throws Exception {
    int result = 0;
    String outputPath = args[1];
    Configuration configuration = getConf();
    configuration.set("data.seperator", DATA_SEPERATOR);
    configuration.set("hbase.table.name", TABLE_NAME);
    configuration.set("COLUMN_FAMILY_1", COLUMN_FAMILY_1);
    Job job = new Job(configuration);
    job.setJarByClass(HBaseBulkLoadDriver.class);
    job.setJobName("Bulk Loading HBase Table::" + TABLE_NAME);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapperClass(HBaseBulkLoadMapper.class);
    job.setReducerClass(HBaseBulkLoadReducer.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(Put.class);
    FileInputFormat.addInputPaths(job, args[0]);
    FileSystem.getLocal(getConf()).delete(new Path(outputPath), true);
    FileOutputFormat.setOutputPath(job, new Path(outputPath));
    job.setMapOutputValueClass(Put.class);
    job.setNumReduceTasks(1);
    HFileOutputFormat.configureIncrementalLoad(job, new HTable(configuration, TABLE_NAME));
    job.waitForCompletion(true);
    return result;
}
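(Side note: one way to check which reducer class a configured job will actually run is to ask the Job object itself after all the setup calls; Job.getReducerClass() throws ClassNotFoundException, which run's throws Exception already covers. This line is a suggested debugging aid, not part of the original driver.)

// Debugging aid: print the reducer class the job is actually configured to run.
System.out.println("Reducer class: " + job.getReducerClass().getName());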
Mapper
public class HBaseBulkLoadMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    private String hbaseTable;
    private String dataSeperator;
    private String columnFamily1;
    private ImmutableBytesWritable hbaseTableName;

    public void setup(Context context) {
        Configuration configuration = context.getConfiguration();
        hbaseTable = configuration.get("hbase.table.name");
        dataSeperator = configuration.get("data.seperator");
        columnFamily1 = configuration.get("COLUMN_FAMILY_1");
        hbaseTableName = new ImmutableBytesWritable(Bytes.toBytes(hbaseTable));
    }

    @Override
    public void map(LongWritable key, Text value, Context context) {
        try {
            String[] values = value.toString().split(dataSeperator);
            String rowKey = values[0];
            Put put = new Put(Bytes.toBytes(rowKey));
            // BUNCH OF ADDS;
            context.write(new ImmutableBytesWritable(Bytes.toBytes(rowKey)), put);
        } catch (Exception exception) {
            exception.printStackTrace();
        }
    }
}
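The "BUNCH OF ADDS" placeholder stands for the calls that populate the Put, one per column. A minimal sketch of what those lines might look like, assuming one column per remaining input field; the qualifier names "col1" and "col2" are invented for illustration (Put.add is the 0.94-era API, matching the HTable and HFileOutputFormat calls above):

// Illustrative only: qualifier names "col1"/"col2" are made up for this sketch.
put.add(Bytes.toBytes(columnFamily1), Bytes.toBytes("col1"), Bytes.toBytes(values[1]));
put.add(Bytes.toBytes(columnFamily1), Bytes.toBytes("col2"), Bytes.toBytes(values[2]));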
I know the reducer is a bit of a mess, but the fact is that it throws a RuntimeException in both its if and else branches, as you can see, and the bulk load still succeeds, so I'm pretty sure the reducer is not running, and I'm not sure why. All three files are packaged by Maven in the same directory, FYI.
Answer 0 (score: 0)
Figured out what was wrong. configureIncrementalLoad sets the reducer class to PutSortReducer or KeyValueSortReducer depending on the map output value class, so if you want to use a custom reducer class you have to set it after the call to configureIncrementalLoad. After doing that, I could see my reducer running. Just answering my own question so it may help anyone who runs into the same problem.
HFileOutputFormat.configureIncrementalLoad(job, new HTable(configuration, TABLE_NAME));
// Set the custom reducer AFTER configureIncrementalLoad, which would otherwise
// silently replace it with PutSortReducer or KeyValueSortReducer.
job.setReducerClass(HBaseBulkLoadReducer.class);
job.waitForCompletion(true);
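Since the original reducer isn't shown, here is a minimal sketch of a custom reducer with the signature this job needs, modeled on HBase's own PutSortReducer (0.94-era API, matching the HTable and HFileOutputFormat calls in the driver). The body is illustrative, not the code from the question:

import java.io.IOException;
import java.util.List;
import java.util.TreeSet;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class HBaseBulkLoadReducer
        extends Reducer<ImmutableBytesWritable, Put, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void reduce(ImmutableBytesWritable row, Iterable<Put> puts, Context context)
            throws IOException, InterruptedException {
        // HFileOutputFormat requires a row's KeyValues to arrive in sorted
        // order, hence the TreeSet over KeyValue.COMPARATOR.
        TreeSet<KeyValue> sorted = new TreeSet<KeyValue>(KeyValue.COMPARATOR);
        for (Put put : puts) {
            for (List<KeyValue> kvs : put.getFamilyMap().values()) {
                sorted.addAll(kvs);
            }
        }
        for (KeyValue kv : sorted) {
            context.write(row, kv); // per-row custom logic would go here
        }
    }
}

Note the reducer emits KeyValue rather than Put, because HFileOutputFormat's record writer consumes KeyValues; that is also why configureIncrementalLoad installs PutSortReducer in the first place.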