HBase Upsert with Spark

Posted: 2016-08-09 11:52:37

Tags: hadoop apache-spark hbase

I have a Spark Streaming job in which I perform some aggregations. I now want to insert those records into HBase, but it is not a typical insert: I want an UPSERT, so that if the rowkey already exists, the column value should become sum(newValue + oldValue). Can anyone share pseudocode in Java showing how to achieve this?

2 Answers:

Answer 0 (score: 1):

Something like this...

byte[] rowKey = null; // Provided
Table table = null; // Provided
long newValue = 1000; // Provided
byte[] FAMILY = new byte[]{0}; // Defined
byte[] QUALIFIER = new byte[]{1}; // Defined

try {
    // Read the current value for the rowkey, if one exists
    Get get = new Get(rowKey);
    Result result = table.get(get);
    if (!result.isEmpty()) {
        Cell cell = result.getColumnLatestCell(FAMILY, QUALIFIER);
        newValue += Bytes.toLong(cell.getValueArray(), cell.getValueOffset());
    }
    // Write back the summed value (insert when the row is new, overwrite otherwise)
    Put put = new Put(rowKey);
    put.addColumn(FAMILY, QUALIFIER, Bytes.toBytes(newValue));
    table.put(put);
} catch (Exception e) {
    // Handle Exceptions...
}
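Note that the get-then-put above is not atomic: two concurrent writers can read the same old value and one update can be lost. When the column only ever holds a long sum, HBase's built-in Increment operation performs the read-add-write atomically on the server side. A minimal sketch, reusing the rowKey, table, FAMILY and QUALIFIER variables from the snippet above:

Increment increment = new Increment(rowKey);
// Adds newValue to the stored long, creating the cell if the row or column does not exist yet
increment.addColumn(FAMILY, QUALIFIER, newValue);
Result incremented = table.increment(increment);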

We (Splice Machine [open source]) have some cool tutorials on storing data in HBase with Spark Streaming.

Check it out. It might be interesting.

Answer 1 (score: 0):

I found the following approach; here it is as pseudocode:

=========== For UPSERT (Update and Insert) ===========

public void HbaseUpsert(JavaRDD<Row> javaRDD) throws IOException, ServiceException {

    JavaPairRDD<ImmutableBytesWritable, Put> hbasePuts1 = javaRDD.mapToPair(
        new PairFunction<Row, ImmutableBytesWritable, Put>() {

            private static final long serialVersionUID = 1L;

            public Tuple2<ImmutableBytesWritable, Put> call(Row row) throws Exception {
                if (HbaseConfigurationReader.getInstance() != null) {
                    HTable table = new HTable(HbaseConfigurationReader.getInstance().initializeHbaseConfiguration(), "TEST");
                    try {
                        String column1 = row.getString(1);
                        long column2 = row.getLong(2);
                        // Read the existing value for this rowkey (if any) and add it to the new value
                        Get get = new Get(Bytes.toBytes(row.getString(0)));
                        Result result = table.get(get);
                        if (!result.isEmpty()) {
                            Cell cell = result.getColumnLatestCell(Bytes.toBytes("cf1"), Bytes.toBytes("Column2"));
                            column2 += Bytes.toLong(cell.getValueArray(), cell.getValueOffset());
                        }
                        // Write back both columns; an existing row is overwritten with the summed value
                        Put put = new Put(Bytes.toBytes(row.getString(0)));
                        put.add(Bytes.toBytes("cf1"), Bytes.toBytes("Column1"), Bytes.toBytes(column1));
                        put.add(Bytes.toBytes("cf1"), Bytes.toBytes("Column2"), Bytes.toBytes(column2));
                        return new Tuple2<ImmutableBytesWritable, Put>(new ImmutableBytesWritable(), put);
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        table.close();
                    }
                }
                return null;
            }
        });

    hbasePuts1.saveAsNewAPIHadoopDataset(HbaseConfigurationReader.initializeHbaseConfiguration());
}
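For completeness, a minimal sketch of how this method might be wired into the Spark Streaming job (the DStream name aggregatedStream and the surrounding class are hypothetical; the original post does not show this part):

// Hypothetical wiring: run the upsert for every micro-batch of aggregated rows
aggregatedStream.foreachRDD(new VoidFunction<JavaRDD<Row>>() {
    @Override
    public void call(JavaRDD<Row> rdd) throws Exception {
        if (!rdd.isEmpty()) {
            HbaseUpsert(rdd); // the method defined above
        }
    }
});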

=============== For Configuration ===============

public class HbaseConfigurationReader implements Serializable {

static Job newAPIJobConfiguration1 = null;
private static Configuration conf = null;
private static HTable table = null;
private static HbaseConfigurationReader instance = null;

private static Log logger = LogFactory.getLog(HbaseConfigurationReader.class);

HbaseConfigurationReader() throws MasterNotRunningException, ZooKeeperConnectionException, ServiceException, IOException {
    initializeHbaseConfiguration();
}

public static HbaseConfigurationReader getInstance() throws MasterNotRunningException, ZooKeeperConnectionException, ServiceException, IOException {

if (instance == null) {
    instance = new HbaseConfigurationReader();
}

return instance;

}

public static Configuration initializeHbaseConfiguration() throws MasterNotRunningException, ZooKeeperConnectionException, ServiceException, IOException {
    if (conf == null) {
        conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        HBaseAdmin.checkHBaseAvailable(conf);
        table = new HTable(conf, "TEST");
        conf.set(org.apache.hadoop.hbase.mapreduce.TableInputFormat.INPUT_TABLE, "TEST");
        try {
            newAPIJobConfiguration1 = Job.getInstance(conf);
            newAPIJobConfiguration1.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "TEST");
            newAPIJobConfiguration1.setOutputFormatClass(org.apache.hadoop.hbase.mapreduce.TableOutputFormat.class);
        } catch (IOException e) {
            e.printStackTrace();
        }

    } else {
        logger.info("Configuration already initialized; reusing the existing Job configuration");
    }

    return newAPIJobConfiguration1.getConfiguration();
}
}
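As a side note, on HBase 1.0+ the HTable and HBaseAdmin constructors used above are deprecated. A rough sketch of the equivalent setup with the newer Connection API (keeping the table name and ZooKeeper settings from this answer) would be:

// Sketch using the HBase 1.0+ client API instead of the deprecated HTable constructor
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "localhost");
conf.set("hbase.zookeeper.property.clientPort", "2181");
Connection connection = ConnectionFactory.createConnection(conf);
Table table = connection.getTable(TableName.valueOf("TEST"));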