Generating a composite HBase rowkey with a Flume serializer

Date: 2016-06-08 07:38:00

Tags: hadoop hbase interceptor flume serializer

My GIS data looks like this:

'111, 2011-02-01 20:30:30, 116.50443, 40.00951'  
'111, 2011-02-01 20:30:31, 116.50443, 40.00951'  
'112, 2011-02-01 20:30:30, 116.58197, 40.06665'  
'112, 2011-02-01 20:30:31, 116.58197, 40.06665'  

The first column is driver_id, the second is timestamp, the third is longitude, and the fourth is latitude.

I am using Flume to ingest this kind of data. My sink is HBase (type: AsyncHBaseSink). By default, HBase takes the first column (e.g. 111) as the rowkey. I want to create a composite rowkey instead (say, a combination of the first two columns: 111_2011-02-01 20:30:30). I tried adding the required changes to AsyncHbaseLogEventSerializer.java, but they were not reflected.

Please suggest how I can do this.

1 Answer:

Answer 0 (score: 2)

The composite key should work in your AsyncHbaseSerializer.

Below is a sample snippet.

Declare at class level (initialize the list so getActions can reuse it; the original snippet declared it as null, which would throw a NullPointerException on the first puts.clear()):

    private final List<PutRequest> puts = new ArrayList<PutRequest>();
    /**
     * Method joinRowKeyContent (with EMPTY string separation).
     *
     * Joiner is the Google Guava class.
     *
     * @param objArray Object...
     * @return String
     */
    public static String joinRowKeyContent(Object... objArray) {
        return Joiner.on("").appendTo(new StringBuilder(), objArray).toString();
    }
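Since the asker wants an underscore between driver_id and timestamp (111_2011-02-01 20:30:30), the same joining can be sketched in plain Java with an explicit separator parameter. This is a hypothetical variant, not part of the Flume API; Guava's Joiner.on(separator) would work identically:

```java
// Hypothetical plain-Java variant of joinRowKeyContent with a
// configurable separator, so the key matches "driver_id_timestamp".
public final class RowKeyUtil {

    // Join the given parts with the separator; "_" yields 111_2011-02-01 20:30:30.
    public static String joinRowKeyContent(String separator, Object... parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinRowKeyContent("_", "111", "2011-02-01 20:30:30"));
    }
}
```

With an empty string as the separator this behaves like the snippet above; passing "_" produces the format asked for in the question.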

    /**
     * Method preParePutRequest.
     *
     * @param rowKeyBytes the composite rowkey as bytes
     * @param timestamp   the cell timestamp
     */
    private void preParePutRequest(final byte[] rowKeyBytes, final long timestamp) {
        LOG.debug("Processing ..." + Bytes.toString(rowKeyBytes));
        final PutRequest putreq = new PutRequest(table, rowKeyBytes, colFam,
                Bytes.toBytes("yourcolumn"), yourColumnAsByteArray, timestamp);
        puts.add(putreq);
    }
  

Your getActions method would look like this:

    @Override
    public List<PutRequest> getActions() {
        // Build the composite rowkey.
        final String rowKey = joinRowKeyContent(driver_id, timestamp, longitude, latitude);
        final byte[] rowKeyBytes = Bytes.toBytes(rowKey);

        // Prepare the put requests.
        puts.clear();
        preParePutRequest(rowKeyBytes, <timestamp>);
        return puts;
    }