Reducer complains about unordered input to the Bloom filter when attempting an HBase bulk load job

Date: 2014-08-11 17:37:16

Tags: java hadoop hbase

I'm doing a large-scale HBase import using a map-reduce job that I set up:

job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
job.setMapperClass(BulkMapper.class);

job.setOutputFormatClass(HFileOutputFormat.class);

FileInputFormat.setInputPaths(job, new Path(inputPath));
FileOutputFormat.setOutputPath(job, new Path(outputPath));
HFileOutputFormat.configureIncrementalLoad(job, hTable);  // Configures the partitioner and reducer from the table's regions; the job then writes out the HFiles, which should take 10 minutes or so
boolean suc = job.waitForCompletion(true);
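
(For completeness: once the job succeeds, the HFiles under outputPath still have to be handed to the region servers. This step isn't shown in the question; a minimal sketch, assuming the same job, outputPath, and hTable as above:)

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

// Bulk-load the generated HFiles into the table's regions.
if (suc) {
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(job.getConfiguration());
    loader.doBulkLoad(new Path(outputPath), hTable);
}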

It uses a mapper that I wrote myself, and HFileOutputFormat.configureIncrementalLoad sets up the reducer. I had already done a proof of concept with this setup, but when I ran it against a large dataset it died in the reducer with this error:

Error: java.io.IOException: Non-increasing Bloom keys: BLMX2014-02-03nullAdded after BLMX2014-02-03nullRemoved
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.appendGeneralBloomfilter(StoreFile.java:934)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:970)
    at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.write(HFileOutputFormat.java:168)
    at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.write(HFileOutputFormat.java:124)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:576)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
    at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce(PutSortReducer.java:78)
    at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce(PutSortReducer.java:43)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:645)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:405)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143

I thought Hadoop guaranteed that the reducer's input is sorted. If so, why am I running into this problem, and is there any way to avoid it?

1 Answer:

Answer 0 (score: 1):

Annoyingly, the problem turned out to be how I was keying my map output. I replaced what I had been using for output with the following, and it worked:

ImmutableBytesWritable hKey = new ImmutableBytesWritable(put.getRow());
context.write(hKey, put);

Basically, the key I had been using and the Put's actual row key were subtly different, so the shuffle sorted the records by one byte sequence while the HFile writer checked ordering on another, and the Puts reached the writer out of row-key order.
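
For illustration, here is a minimal sketch of a full mapper with the corrected keying. The input format, column family, and parsing below are assumptions, since the question never shows the mapper body:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical reconstruction of a mapper like BulkMapper.
public static class BulkMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    private static final byte[] CF = Bytes.toBytes("cf");  // assumed column family

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Assumed tab-separated input: rowKey, qualifier, value
        String[] fields = line.toString().split("\t");
        Put put = new Put(Bytes.toBytes(fields[0]));
        put.add(CF, Bytes.toBytes(fields[1]), Bytes.toBytes(fields[2]));
        // Derive the map output key from the Put itself, so the shuffle's
        // sort order matches the row-key order the HFile writer expects.
        ImmutableBytesWritable hKey = new ImmutableBytesWritable(put.getRow());
        context.write(hKey, put);
    }
}

The key point is that the partitioner and the shuffle sort only ever see the map output key, while StoreFile.Writer's Bloom filter only sees each Put's row key; the bulk load works only when those two byte sequences are identical.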