I have a simple map/reduce job that scans one HBase table and modifies another HBase table. The Hadoop job seems to complete successfully, but when I check the HBase table afterwards, the entry never appears there.
Here is the Hadoop program:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class HBaseInsertTest extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        String table = "duplicates";
        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);
        Job job = new Job(getConf(), "HBaseInsertTest");
        job.setJarByClass(HBaseInsertTest.class);
        TableMapReduceUtil.initTableMapperJob(table, scan, Mapper.class,
                /* mapper output key = */ null, /* mapper output value = */ null, job);
        TableMapReduceUtil.initTableReducerJob(/* output table = */ "tablecopy", /* reducer class = */ null, job);
        job.setNumReduceTasks(0);
        // Note that these are the default.
        job.setOutputFormatClass(NullOutputFormat.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    private static class Mapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);
        }

        @Override
        public void map(ImmutableBytesWritable row, Result columns, Context context) throws IOException {
            long id = 1260018L;
            try {
                Put put = new Put(Bytes.toBytes(id));
                put.add(Bytes.toBytes("mapping"), Bytes.toBytes("foo"), Bytes.toBytes("bar"));
                context.write(row, put);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        int res = ToolRunner.run(config, new HBaseInsertTest(), args);
        System.exit(res);
    }
}
From the HBase shell:
hbase(main):008:0> get 'tablecopy', '1260018', 'mapping'
COLUMN CELL
0 row(s) in 0.0100 seconds
I have simplified the program to try to demonstrate/isolate the problem. I am also fairly new to both Hadoop and HBase. I did verify that mapping is a column family that exists in the tablecopy table.
Answer 0 (score: 2)
I think the problem is that you were querying:
hbase(main):008:0> get 'tablecopy', '1260018', 'mapping'
Instead, you should have been querying:
hbase(main):008:0> get 'tablecopy', 1260018, 'mapping'
Because of the quotes, HBase thinks you are querying for a string key. Also, if you just ran a simple client job on your end to retrieve this key from HBase, it would correctly fetch the value if it were already present.
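For reference, a minimal client-side check along those lines might look like the sketch below. It assumes the old HTable client API used elsewhere in this question, and reuses the tablecopy table, the mapping:foo column, and the long row key 1260018L from the code above; the class name GetCheck is just a placeholder.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class GetCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "tablecopy");
        try {
            // The job wrote the row key as the 8 raw bytes of the long,
            // not as the string "1260018", so look it up the same way.
            Get get = new Get(Bytes.toBytes(1260018L));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("mapping"), Bytes.toBytes("foo"));
            System.out.println(value == null ? "not found" : Bytes.toString(value));
        } finally {
            table.close();
        }
    }
}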
Answer 1 (score: -1)
Your problem is the missing reducer. You need to create a class that extends TableReducer, which takes a Put as input and writes the Put to the target table using context.write(ImmutableBytesWritable key, Put put).
I would imagine it looking something like this:
public static class MyReducer extends TableReducer<ImmutableBytesWritable, Put, ImmutableBytesWritable> {
    public void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context)
            throws IOException, InterruptedException {
        for (Put record : values) {
            context.write(key, record);
        }
    }
}
Then you would modify the table reducer initializer to:
TableMapReduceUtil.initTableReducerJob("tablecopy", MyReducer.class, job);
That should do it (note that the job.setNumReduceTasks(0) and NullOutputFormat lines from the question would also have to go, or the reducer will never run). The other option is to keep going without a reducer, open an HTable object in the mapper, and write the Put to it directly, like so:
HTable table = new HTable(context.getConfiguration(), "output_table_name");
Put myPut = ...;
table.put(myPut);
table.close();
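Putting that together, a map-only variant might look like the following sketch. It assumes opening the table once in setup() and closing it in cleanup() rather than per record; the class name DirectWriteMapper is hypothetical, while the tablecopy target table and the Put contents mirror the question's mapper.
private static class DirectWriteMapper extends TableMapper<ImmutableBytesWritable, Put> {
    private HTable table;

    @Override
    protected void setup(Context context) throws IOException {
        // Open the target table once per mapper, not once per record.
        table = new HTable(context.getConfiguration(), "tablecopy");
    }

    @Override
    public void map(ImmutableBytesWritable row, Result columns, Context context) throws IOException {
        Put put = new Put(Bytes.toBytes(1260018L));
        put.add(Bytes.toBytes("mapping"), Bytes.toBytes("foo"), Bytes.toBytes("bar"));
        // Write straight to HBase, bypassing the job's (Null)OutputFormat.
        table.put(put);
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        table.close();
    }
}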
Hope this helps!