I'm trying to write directly from my Mapper to an HBase table, using Hadoop 2.4.0 with HBase 0.94.18 on EMR.
When executing the code below, I get a nasty IOException: Pass a Delete or a Put.
public class TestHBase {

  static class ImportMapper
      extends Mapper<MyKey, MyValue, ImmutableBytesWritable, Writable> {

    private byte[] family = Bytes.toBytes("f");

    @Override
    public void map(MyKey key, MyValue value, Context context)
        throws IOException, InterruptedException {
      MyItem item = //do some stuff with key/value and create item
      byte[] rowKey = Bytes.toBytes(item.getKey());
      Put put = new Put(rowKey);
      for (String attr : Arrays.asList("a1", "a2", "a3")) {
        byte[] qualifier = Bytes.toBytes(attr);
        put.add(family, qualifier, Bytes.toBytes(item.get(attr)));
      }
      context.write(new ImmutableBytesWritable(rowKey), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    String input = args[0];
    String table = "table";

    Job job = Job.getInstance(conf, "stuff");
    job.setJarByClass(ImportMapper.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    FileInputFormat.setInputDirRecursive(job, true);
    FileInputFormat.addInputPath(job, new Path(input));

    TableMapReduceUtil.initTableReducerJob(
        table,  // output table
        null,   // reducer class
        job);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Does anyone know what I'm doing wrong?
Stack trace
Error: java.io.IOException: Pass a Delete or a Put
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:125)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:84)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:775)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Answer 0 (score: 2)
It would be better if you could show the full stack trace, so I can help you solve it more easily. I have not executed your code. From what I can see of your code, this could be the issue:
job.setNumReduceTasks(0);
With that setting, the Mapper's Put objects are expected to be written directly to Apache HBase. You could increase setNumReduceTasks, or, if you look at the API, find its default value and comment that line out.
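If you go the other way and keep a reduce phase, the wiring could look roughly like the sketch below. Setting the map output classes explicitly is my assumption here (your code does not set them), and I have not run this against your job:

// Keep a reduce phase so the Put objects are shuffled to the default
// (identity) reducer and written to the table from the reduce side.
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);   // a concrete class, so the values can be shuffled
TableMapReduceUtil.initTableReducerJob(
    "table",   // output table
    null,      // no explicit reducer class
    job);
job.setNumReduceTasks(1);  // at least one reduce task instead of 0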
Answer 1 (score: 0)
Thanks for adding the stack trace. Unfortunately you didn't include the code that throws the exception, so I can't fully trace it for you. Instead, I did some searching and found a few things for you.
Your stack trace is similar to that of another SO question: Pass a Delete or a Put error in hbase mapreduce
That one was solved by commenting out job.setNumReduceTasks(0);
There is a similar SO question with the same exception where that did not fix it; instead, the issue there was with annotations:
"java.io.IOException: Pass a Delete or a Put" when reading HDFS and storing HBase
Here are some good examples of how to write working code both with setNumReduceTasks set to 0 and with it set to 1 or more.
"51.2. HBase MapReduce Read/Write Example: The following is an example of using HBase both as a source and as a sink with MapReduce. This example will simply copy data from one table to another."
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleReadWrite");
job.setJarByClass(MyReadWriteJob.class);    // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);         // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);   // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
    sourceTable,      // input table
    scan,             // Scan instance to control CF and attribute selection
    MyMapper.class,   // mapper class
    null,             // mapper output key
    null,             // mapper output value
    job);
TableMapReduceUtil.initTableReducerJob(
    targetTable,      // output table
    null,             // reducer class
    job);
job.setNumReduceTasks(0);

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
And here is the example with one or more reduce tasks:
"51.4. HBase MapReduce Summary to HBase Example: The following example uses HBase as a MapReduce source and sink with a summarization step. This example will count the number of distinct instances of a value in a table and write those summarized counts to another table."
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleSummary");
job.setJarByClass(MySummaryJob.class);    // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);         // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);   // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
    sourceTable,        // input table
    scan,               // Scan instance to control CF and attribute selection
    MyMapper.class,     // mapper class
    Text.class,         // mapper output key
    IntWritable.class,  // mapper output value
    job);
TableMapReduceUtil.initTableReducerJob(
    targetTable,           // output table
    MyTableReducer.class,  // reducer class
    job);
job.setNumReduceTasks(1);   // at least one, adjust as required

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
http://hbase.apache.org/book.html#mapreduce.example
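The MyMapper and MyTableReducer classes referenced in those snippets aren't quoted above; the linked chapter has the full versions. As a rough sketch of the reduce side that actually emits the Put objects (this is condensed rather than copied verbatim from the guide, and the column family and qualifier names are placeholders):

public static class MyTableReducer
    extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

  private static final byte[] CF = Bytes.toBytes("cf");        // placeholder column family
  private static final byte[] COUNT = Bytes.toBytes("count");  // placeholder qualifier

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    // The reducer emits a Put, which is exactly what TableOutputFormat expects.
    Put put = new Put(Bytes.toBytes(key.toString()));
    put.add(CF, COUNT, Bytes.toBytes(sum));
    context.write(null, put);
  }
}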
You seem to be following the first example more closely. I wanted to show that there are sometimes reasons to set the number of reduce tasks to zero.
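For the map-only route your code is taking, the key point in the first example is that whatever the mapper writes as its output value must already be a Put (or Delete). Below is a minimal sketch of that shape, reusing the names from your code; the placeholder row key and column, and whether changing the declared output value type from Writable to Put is enough in your particular job, are my assumptions rather than something I have run:

static class ImportMapper
    extends Mapper<MyKey, MyValue, ImmutableBytesWritable, Put> {  // output value declared as Put

  @Override
  public void map(MyKey key, MyValue value, Context context)
      throws IOException, InterruptedException {
    // Placeholder row key and column for the sketch; build the real Put from your item.
    byte[] rowKey = Bytes.toBytes(key.toString());
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("f"), Bytes.toBytes("a1"), Bytes.toBytes(value.toString()));
    context.write(new ImmutableBytesWritable(rowKey), put);
  }
}

// Map-only wiring, as in the first example:
TableMapReduceUtil.initTableReducerJob("table", null, job);
job.setNumReduceTasks(0);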