Question

I need to insert 400 million rows into an HBase table.

The schema looks like this

I generate the row key by simply concatenating two ints, and the value is System.nanoTime().
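To make the key scheme concrete, here is a small stdlib-only Java sketch. rowKey mirrors the concatenation in the mapper. The class name RowKeyDemo and the zero-padded variant paddedRowKey are illustrative assumptions, not part of the question. When i and j advance together, as they do in the loop, each key is unique. For arbitrary pairs, however, concatenation without a separator or fixed width is ambiguous.

```java
public class RowKeyDemo {
    // Same construction as the question: decimal i followed by decimal j, no separator.
    static String rowKey(int i, int j) {
        return Integer.toString(i).concat(Integer.toString(j));
    }

    // Hypothetical alternative: fixed-width, zero-padded fields (for non-negative ints),
    // so that different (i, j) pairs can never produce the same key.
    static String paddedRowKey(int i, int j) {
        return String.format("%010d%010d", i, j);
    }

    public static void main(String[] args) {
        // In the question's loop i and j advance together: "00", "11", ..., "1010", ...
        System.out.println(rowKey(10, 10));                                   // 1010
        // For arbitrary pairs the plain concatenation collides: both are "111".
        System.out.println(rowKey(1, 11).equals(rowKey(11, 1)));              // true
        // Zero-padding keeps the two fields apart.
        System.out.println(paddedRowKey(1, 11).equals(paddedRowKey(11, 1)));  // false
    }
}
```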

My mapper looks like this

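import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class DatasetMapper extends TableMapper<Text, LongWritable> {

    private static Configuration conf = HBaseConfiguration.create();

    public void map(Text key, LongWritable values, Context context) throws IOException {

        // instantiate HTable object that connects to the table name
        HTable htable = new HTable(conf, "temp"); // already created temp table
        htable.setAutoFlush(false);
        htable.setWriteBufferSize(1024 * 1024 * 12);

        // construct key
        int i = 0, j = 0;
        for (i = 0; i < 400000000; i++) {
            String rowkey = Integer.toString(i).concat(Integer.toString(j));
            Long value = Math.abs(System.nanoTime());
            Put put = new Put(Bytes.toBytes(rowkey));
            put.add(Bytes.toBytes("location"), Bytes.toBytes("longlat"), Bytes.toBytes(value));
            htable.put(put);
            j++;
            htable.flushCommits();
        }
    }
}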
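The setAutoFlush(false) and setWriteBufferSize(1024*1024*12) calls ask the HBase client to hold Puts locally and send them in batches once about 12 MB has accumulated. Calling flushCommits() after every put() sends each Put on its own, which gives up that batching. The stdlib-only sketch below models this with a hypothetical WriteBufferModel class. It is not the HBase API and simply counts flushes under each strategy. Row size and row count are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a client-side write buffer; NOT the HBase client API.
public class WriteBufferModel {
    private final long bufferLimitBytes;
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int flushCount = 0; // how many times buffered rows were sent to the server

    WriteBufferModel(long bufferLimitBytes) {
        this.bufferLimitBytes = bufferLimitBytes;
    }

    void put(byte[] row) {
        pending.add(row);
        pendingBytes += row.length;
        // Auto-flush only once the buffer is full.
        if (pendingBytes >= bufferLimitBytes) {
            flushCommits();
        }
    }

    void flushCommits() {
        if (pending.isEmpty()) {
            return; // nothing buffered, nothing to send
        }
        flushCount++;
        pending.clear();
        pendingBytes = 0;
    }

    // Writes `rows` rows of `rowBytes` bytes each and returns the number of flushes.
    static int simulate(int rows, int rowBytes, long bufferBytes, boolean flushEveryPut) {
        WriteBufferModel table = new WriteBufferModel(bufferBytes);
        for (int i = 0; i < rows; i++) {
            table.put(new byte[rowBytes]);
            if (flushEveryPut) {
                table.flushCommits(); // what the question's loop does
            }
        }
        table.flushCommits(); // one final flush for whatever is still buffered
        return table.flushCount;
    }

    public static void main(String[] args) {
        long buffer = 12L * 1024 * 1024; // 12 MB, as in the question
        System.out.println(simulate(100_000, 100, buffer, true));  // 100000
        System.out.println(simulate(100_000, 100, buffer, false)); // 1
    }
}
```

With 100,000 rows of 100 bytes (about 10 MB, under the 12 MB limit), flushing after each put sends 100,000 separate batches. Letting the buffer fill sends a single batch at the final flush.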

My job looks like this

Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "initdb");
job.setJarByClass(DatasetMapper.class);    // class that contains mapper

TableMapReduceUtil.initTableMapperJob(
    null,                   // input table
    null,                   // scan
    DatabaseMapper.class,   // mapper class
    null,                   // mapper output key
    null,                   // mapper output value
    job);
TableMapReduceUtil.initTableReducerJob(
    "temp",                 // output table
    null,                   // reducer class
    job);
job.setNumReduceTasks(0);

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}

The job runs but inserts 0 records. I know I'm making some mistake, but I can't find it because I'm new to HBase. Please help me.

Thanks

Answer 1

First of all, your mapper is named DatasetMapper, but in the job configuration you have specified DatabaseMapper. I wonder how it runs at all without any errors.

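Next, it looks like you have mixed up TableMapper and Mapper usage. The HBase TableMapper is an abstract class that extends the Hadoop Mapper and makes it convenient to read from HBase, while TableReducer helps you write back to HBase. You are trying to put data from the Mapper while also using a TableReducer. In fact, your mapper will never actually be called.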
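A related Java detail also keeps the question's map() from running. Hadoop's Mapper.run() calls map(KEYIN, VALUEIN, Context), and TableMapper fixes those input types to ImmutableBytesWritable and Result. A method declared as map(Text, LongWritable, Context) therefore overloads the inherited map() rather than overriding it, so the framework keeps calling the default implementation. The stdlib-only model below uses hypothetical stand-ins (RowKey, RowResult, BaseMapper) in place of the real Hadoop and HBase types. It shows the general overload-versus-override behaviour, not Hadoop itself.

```java
import java.util.ArrayList;
import java.util.List;

public class OverrideDemo {
    // Hypothetical stand-ins for the input key/value types TableMapper fixes.
    static class RowKey {}
    static class RowResult {}

    // Simplified model of a framework base class with a default map().
    static class BaseMapper {
        final List<String> calls = new ArrayList<>();

        protected void map(RowKey key, RowResult value) {
            calls.add("default map");
        }

        // The "framework" always dispatches through map(RowKey, RowResult).
        final void run(RowKey key, RowResult value) {
            map(key, value);
        }
    }

    // Like the question's mapper: different parameter types, so this is only an overload.
    static class OverloadingMapper extends BaseMapper {
        protected void map(String key, Long value) {
            calls.add("custom map");
        }
    }

    // Matching parameter types; @Override makes the compiler verify it really overrides.
    static class OverridingMapper extends BaseMapper {
        @Override
        protected void map(RowKey key, RowResult value) {
            calls.add("custom map");
        }
    }

    public static void main(String[] args) {
        BaseMapper a = new OverloadingMapper();
        a.run(new RowKey(), new RowResult());
        System.out.println(a.calls); // [default map]

        BaseMapper b = new OverridingMapper();
        b.run(new RowKey(), new RowResult());
        System.out.println(b.calls); // [custom map]
    }
}
```

Putting @Override on the question's map(Text, LongWritable, Context) would turn this silent mismatch into a compile error.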

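Either use a TableReducer to put the data, or use only a Mapper. If you really want to do it in the Mapper, use the TableOutputFormat class. See the example on page 301 of HBase: The Definitive Guide. Here is the Google Books link.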

HTH

P.S.: You may find these links helpful for learning HBase + MapReduce integration properly:

Link 1.

Link 2.

Bulk inserting data into HBase with MapReduce
