I have a Spark Streaming job in which I am doing some aggregations, and now I want to insert those records into HBase. But it is not a typical insert; I want to do an UPSERT: if the rowkey already exists, the column value should become sum(newValue + oldValue). Can anyone share pseudocode in Java showing how I can achieve this?
Answer 0 (score: 1)
Something like this...
byte[] rowKey = null;             // Provided
Table table = null;               // Provided
long newValue = 1000;             // Provided
byte[] FAMILY = new byte[]{0};    // Defined
byte[] QUALIFIER = new byte[]{1}; // Defined

try {
    // Read the current value for this rowkey, if one exists, and add it to the new value
    Get get = new Get(rowKey);
    Result result = table.get(get);
    if (!result.isEmpty()) {
        Cell cell = result.getColumnLatestCell(FAMILY, QUALIFIER);
        newValue += Bytes.toLong(cell.getValueArray(), cell.getValueOffset());
    }
    // Write the summed value back
    Put put = new Put(rowKey);
    put.addColumn(FAMILY, QUALIFIER, Bytes.toBytes(newValue));
    table.put(put);
} catch (Exception e) {
    // Handle Exceptions...
}
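One caveat with the get-then-put pattern above is that it is not atomic: two writers racing on the same rowkey can lose an update. If the column only ever stores a long counter, HBase's built-in Increment operation does the add server-side in a single atomic step. A minimal sketch, assuming the same rowKey/FAMILY/QUALIFIER as above and that the cell was written with Bytes.toBytes(long):

// Atomic alternative: the region server adds newValue to the stored long
Increment increment = new Increment(rowKey);
increment.addColumn(FAMILY, QUALIFIER, newValue);
table.increment(increment);

// or the one-call convenience form:
// long updated = table.incrementColumnValue(rowKey, FAMILY, QUALIFIER, newValue);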
We (Splice Machine [open source]) have some cool tutorials on storing data in HBase with Spark Streaming.
Check it out. Might be interesting.
Answer 1 (score: 0)
I found the approach below; pseudocode follows:
=========== For UPSERT (update and insert) ===========
public void HbaseUpsert(JavaRDD<Row> javaRDD) throws IOException, ServiceException {

    JavaPairRDD<ImmutableBytesWritable, Put> hbasePuts1 = javaRDD.mapToPair(
        new PairFunction<Row, ImmutableBytesWritable, Put>() {

            private static final long serialVersionUID = 1L;

            public Tuple2<ImmutableBytesWritable, Put> call(Row row) throws Exception {
                if (HbaseConfigurationReader.getInstance() != null) {
                    HTable table = new HTable(HbaseConfigurationReader.getInstance().initializeHbaseConfiguration(), "TEST");
                    try {
                        String Column1 = row.getString(1);
                        long Column2 = row.getLong(2);
                        // Read the existing value for this rowkey, if any, and add it to the new value
                        Get get = new Get(Bytes.toBytes(row.getString(0)));
                        Result result = table.get(get);
                        if (!result.isEmpty()) {
                            Cell cell = result.getColumnLatestCell(Bytes.toBytes("cf1"), Bytes.toBytes("Column2"));
                            Column2 += Bytes.toLong(cell.getValueArray(), cell.getValueOffset());
                        }
                        // Build the Put with the original string column and the summed numeric column
                        Put put = new Put(Bytes.toBytes(row.getString(0)));
                        put.add(Bytes.toBytes("cf1"), Bytes.toBytes("Column1"), Bytes.toBytes(Column1));
                        put.add(Bytes.toBytes("cf1"), Bytes.toBytes("Column2"), Bytes.toBytes(Column2));
                        return new Tuple2<ImmutableBytesWritable, Put>(new ImmutableBytesWritable(), put);
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        table.close();
                    }
                }
                return null;
            }
        });

    hbasePuts1.saveAsNewAPIHadoopDataset(HbaseConfigurationReader.initializeHbaseConfiguration());
}
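For context, a minimal sketch of how this method might be driven from the streaming job itself, assuming a JavaDStream<Row> named aggregatedStream coming out of the aggregation step and a wrapper class MyHbaseWriter holding HbaseUpsert (both names are placeholders, not from the original code):

// Push each micro-batch of aggregated rows through the upsert method
aggregatedStream.foreachRDD(new VoidFunction<JavaRDD<Row>>() {
    public void call(JavaRDD<Row> rdd) throws Exception {
        if (!rdd.isEmpty()) {                 // skip empty micro-batches
            new MyHbaseWriter().HbaseUpsert(rdd);
        }
    }
});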
=============== For configuration ===============

public class HbaseConfigurationReader implements Serializable {

    static Job newAPIJobConfiguration1 = null;
    private static Configuration conf = null;
    private static HTable table = null;
    private static HbaseConfigurationReader instance = null;
    private static Log logger = LogFactory.getLog(HbaseConfigurationReader.class);

    HbaseConfigurationReader() throws MasterNotRunningException, ZooKeeperConnectionException, ServiceException, IOException {
        initializeHbaseConfiguration();
    }

    public static HbaseConfigurationReader getInstance() throws MasterNotRunningException, ZooKeeperConnectionException, ServiceException, IOException {
        if (instance == null) {
            instance = new HbaseConfigurationReader();
        }
        return instance;
    }

    public static Configuration initializeHbaseConfiguration() throws MasterNotRunningException, ZooKeeperConnectionException, ServiceException, IOException {
        if (conf == null) {
            conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "localhost");
            conf.set("hbase.zookeeper.property.clientPort", "2181");
            HBaseAdmin.checkHBaseAvailable(conf);
            table = new HTable(conf, "TEST");
            conf.set(org.apache.hadoop.hbase.mapreduce.TableInputFormat.INPUT_TABLE, "TEST");
            try {
                newAPIJobConfiguration1 = Job.getInstance(conf);
                newAPIJobConfiguration1.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "TEST");
                newAPIJobConfiguration1.setOutputFormatClass(org.apache.hadoop.hbase.mapreduce.TableOutputFormat.class);
            } catch (IOException e) {
                e.printStackTrace();
            }
        } else {
            logger.info("Configuration comes null");
        }
        return newAPIJobConfiguration1.getConfiguration();
    }
}