I want to scan an HBase table. My code is below.
public void start() throws IOException {
    SparkConf conf = new SparkConf().setAppName("Simple Application");
    JavaSparkContext sc = new JavaSparkContext(conf);
    Configuration hbaseConf = HBaseConfiguration.create();
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("0001"));
    scan.setStopRow(Bytes.toBytes("0004"));
    scan.addFamily(Bytes.toBytes("DATA"));
    scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("TIME"));
    ClientProtos.Scan proto = ProtobufUtil.toScan(scan);
    String scanStr = Base64.encodeBytes(proto.toByteArray());
    String tableName = "rdga_by_id";
    hbaseConf.set(TableInputFormat.INPUT_TABLE, tableName);
    hbaseConf.set(TableInputFormat.SCAN, scanStr);
    JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = sc.newAPIHadoopRDD(hbaseConf,
            TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
    System.out.println("here: " + hBaseRDD.count());
    PairFunction<Tuple2<ImmutableBytesWritable, Result>, Integer, Integer> pairFunc =
            new PairFunction<Tuple2<ImmutableBytesWritable, Result>, Integer, Integer>() {
                @Override
                public Tuple2<Integer, Integer> call(Tuple2<ImmutableBytesWritable, Result> immutableBytesWritableResultTuple2) throws Exception {
                    byte[] time = immutableBytesWritableResultTuple2._2().getValue(Bytes.toBytes("DATA"), Bytes.toBytes("TIME"));
                    byte[] id = /* I want to get Row Key here */
                    if (time != null && id != null) {
                        return new Tuple2<Integer, Integer>(byteArrToInteger(id), byteArrToInteger(time));
                    }
                    else {
                        return null;
                    }
                }
            };
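Now I want to get the row key of each result, but in the Scan I can only set the family and column. How can I get the row key? Is there a function or method like result.getRowkey() that I can use with JavaPairRDD? Or how should I set up the Scan so that the row key is kept in the results?
Thanks in advance!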
Answer 0 (score: 1)
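The result already contains your row. In fact, the row key is the ImmutableBytesWritable, that is, the first element of each tuple. You just need to convert it back to a String, like this: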
String rowKey = Bytes.toString(immutableBytesWritableResultTuple2._1().get());
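As a rough sketch of that conversion step: the bytes returned by _1().get() (or by getRow() on the Result in _2()) can be decoded in plain Java. How to decode them depends on how the keys were written, which the question does not show. The class and method names below are hypothetical. parseDecimal assumes text keys such as "0001", as Bytes.toBytes(String) would produce, and readBigEndianInt assumes keys written as 4-byte big-endian ints, as Bytes.toBytes(int) would produce.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of HBase or Spark.
class RowKeyDecoding {

    // Decodes a row key that was written from a String, e.g. "0001".
    public static String asString(byte[] rowKey) {
        return new String(rowKey, StandardCharsets.UTF_8);
    }

    // Possible stand-in for the question's byteArrToInteger, assuming the
    // key holds decimal digits as text ("0001" becomes 1).
    public static int parseDecimal(byte[] rowKey) {
        return Integer.parseInt(asString(rowKey));
    }

    // Alternative, assuming the key was written as a 4-byte big-endian int.
    public static int readBigEndianInt(byte[] rowKey) {
        if (rowKey.length != Integer.BYTES) {
            throw new IllegalArgumentException("expected 4 bytes, got " + rowKey.length);
        }
        return ByteBuffer.wrap(rowKey).getInt();
    }

    public static void main(String[] args) {
        byte[] textKey = "0001".getBytes(StandardCharsets.UTF_8);
        System.out.println(asString(textKey));
        System.out.println(parseDecimal(textKey));
        System.out.println(readBigEndianInt(new byte[] {0, 0, 0, 4}));
    }
}
```

Inside call(), whichever helper matches your key encoding could take the place of byteArrToInteger(id).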
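I am not sure which version of Spark you are using. In spark-core_2.10 version 1.2.0, the "newAPIHadoopRDD" method of the Scala SparkContext (unlike the one on JavaSparkContext) does not return a JavaPairRDD; the call gives you code like this: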
RDD<Tuple2<ImmutableBytesWritable, Result>> hBaseRDD = sc.newAPIHadoopRDD(hbaseConf,TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
However, "hBaseRDD" then provides a function to convert it to a JavaRDD if necessary:
hBaseRDD.toJavaRDD();
You can then call the ".mapToPair" method on it with the function you defined.