MapReduce job to HBase throws IOException: Pass a Delete or a Put

Time: 2015-02-16 00:07:38

Tags: java hadoop mapreduce hbase elastic-map-reduce

I'm trying to output directly from my Mapper to an HBase table, using Hadoop 2.4.0 with HBase 0.94.18 on EMR.

When executing the code below, I get a nasty IOException: Pass a Delete or a Put

public class TestHBase {
  static class ImportMapper 
            extends Mapper<MyKey, MyValue, ImmutableBytesWritable, Writable> {
    private byte[] family = Bytes.toBytes("f");

    @Override
    public void map(MyKey key, MyValue value, Context context)
        throws IOException, InterruptedException {
      MyItem item = ...; // do some stuff with key/value and create item
      byte[] rowKey = Bytes.toBytes(item.getKey());
      Put put = new Put(rowKey);
      for (String attr : Arrays.asList("a1", "a2", "a3")) {
        byte[] qualifier = Bytes.toBytes(attr);
        put.add(family, qualifier, Bytes.toBytes(item.get(attr)));
      }
      context.write(new ImmutableBytesWritable(rowKey), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    String input = args[0];
    String table = "table";
    Job job = Job.getInstance(conf, "stuff");

    job.setJarByClass(ImportMapper.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    FileInputFormat.setInputDirRecursive(job, true);
    FileInputFormat.addInputPath(job, new Path(input));

    TableMapReduceUtil.initTableReducerJob(
            table,                  // output table
            null,                   // reducer class
            job);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Does anyone know what I'm doing wrong?

Stack trace

Error: java.io.IOException: Pass a Delete or a Put
	at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:125)
	at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:84)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
	at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:775)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143. Container exited with a non-zero exit code 143

2 Answers:

Answer 0 (score: 2):

It would be better if you could show the full stack trace, so that I could help you solve it more easily. I have not executed your code, but from what I've seen of it, the problem may be job.setNumReduceTasks(0);

With zero reduce tasks, your Mapper's output is expected to be written directly to Apache HBase, so every value it emits must be a Put or a Delete. You could increase setNumReduceTasks, or, if you check the API, find its default value and comment that line out.

Answer 1 (score: 0):

Thanks for adding the stack trace. Unfortunately, you didn't include the code that throws the exception, so I can't fully trace it for you. Instead, I did some searching and found a few things for you.

Your stack trace is similar to this other SO question: Pass a Delete or a Put error in hbase mapreduce

That person solved the problem by commenting out job.setNumReduceTasks(0);

There is a similar SO question with the same exception where the issue was never resolved; instead, it has comments discussing the problem:

"java.io.IOException: Pass a Delete or a Put" when reading HDFS and storing HBase


Here are some good examples of how to write working code with setNumReduceTasks at 0 and at 1 or higher.

"51.2. HBase MapReduce Read/Write Example: The following is an example of using HBase both as a source and as a sink with MapReduce. This example will simply copy data from one table to another."

Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleReadWrite");
job.setJarByClass(MyReadWriteJob.class);    // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
  sourceTable,      // input table
  scan,             // Scan instance to control CF and attribute selection
  MyMapper.class,   // mapper class
  null,             // mapper output key
  null,             // mapper output value
  job);
TableMapReduceUtil.initTableReducerJob(
  targetTable,      // output table
  null,             // reducer class
  job);
job.setNumReduceTasks(0);

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}

Here is one more example:

"51.4. HBase MapReduce Summary to HBase Example: The following example uses HBase as a MapReduce source and sink with a summarization step. This example will count the number of distinct instances of a value in a table and write those summarized counts to another table."

Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleSummary");
job.setJarByClass(MySummaryJob.class);     // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
  sourceTable,        // input table
  scan,               // Scan instance to control CF and attribute selection
  MyMapper.class,     // mapper class
  Text.class,         // mapper output key
  IntWritable.class,  // mapper output value
  job);
TableMapReduceUtil.initTableReducerJob(
  targetTable,        // output table
  MyTableReducer.class,    // reducer class
  job);
job.setNumReduceTasks(1);   // at least one, adjust as required

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}

http://hbase.apache.org/book.html#mapreduce.example

You seem to be following the first example more closely. I wanted to show that there are sometimes reasons to set the number of reduce tasks to zero.