How do I do a partial scan in HBase?

Time: 2016-11-06 05:34:50

Tags: hadoop hive hbase

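I have an HBase table that contains around 10 million records. I have three questions about HBase:

  1. How long will it take to scan 10 million records?
  2. Should I go for Hive HBase integration?
  3. How do I perform a partial range scan if I only add an identifier like FL01 to each row?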

    4294970043|1   column=cf:SegmentMultipleFundbDescription, timestamp=1478316937790, value=
    4294970043|1   column=cf:SegmentMultipleFundbDescription_languageId, timestamp=1478316937790, value=505074
    4294970043|1   column=cf:StatementTypeCode, timestamp=1478316937790, value=FTN
    4294970929|1   column=cf:FFAction, timestamp=1478316937790, value=I
    4294970929|1   column=cf:FileName, timestamp=1478316937790, value=Fundamental.FinancialLineItem.FinancialLineItem.ThirdPartyPrivate.FTN.1.2016-07-15-2108.Full
    4294970929|1   column=cf:FilePartition, timestamp=1478316937790, value=ThirdPartyPrivate
    4294970929|1   column=cf:FilePartitionLocation, timestamp=1478316937790, value=FTN
    4294970929|1   column=cf:FinancialConceptCodeGlobalSecondary, timestamp=1478316937790, value=
    4294970929|1   column=cf:FinancialConceptCodeGlobalSecondaryId, timestamp=1478316937790, value=
    4294970929|1   column=cf:FinancialConceptGlobal, timestamp=1478316937790, value=METL
    4294970929|1   column=cf:FinancialConceptGlobalId, timestamp=1478316937790, value=3015071

2 answers:

Answer 0 (score: 0)

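HBase will do a full table scan (FTS) unless you provide start and stop row keys. So if the identifier is part of the row key and the row key has a fixed format, you can set the start and stop row keys; if the identifier sits at a fixed position inside the key rather than at the front, try a fuzzy row filter (FuzzyRowFilter) instead. If the identifier is not part of the row key at all, HBase will do an FTS.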

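The time a scan takes depends on various factors, such as the row key size, the number of column families, the number of column qualifiers, and so on.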
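To illustrate why start and stop row keys avoid a full scan, here is a minimal sketch that does not use the HBase client at all. It relies only on the fact that HBase keeps rows sorted by row key bytes, and it uses a `TreeMap` as a stand-in for the table. The class `PrefixRangeSketch` and its methods `stopKeyForPrefix` and `prefixScan` are hypothetical names chosen for this example, and the sample keys assume the identifier is a prefix of the row key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.SortedMap;
import java.util.TreeMap;

final class PrefixRangeSketch {

    /**
     * Smallest key that sorts after every key starting with the prefix:
     * drop trailing 0xFF bytes, then increment the last remaining byte.
     * An empty result means "no upper bound" (scan to the end of the table).
     */
    static byte[] stopKeyForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    /** Returns only the rows whose key starts with the prefix (start inclusive, stop exclusive). */
    static SortedMap<String, String> prefixScan(TreeMap<String, String> table, String prefix) {
        // ISO-8859-1 maps each byte to one char, so String order matches unsigned byte order.
        byte[] stop = stopKeyForPrefix(prefix.getBytes(StandardCharsets.ISO_8859_1));
        if (stop.length == 0) {
            return table.tailMap(prefix);
        }
        return table.subMap(prefix, new String(stop, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // Stand-in for an HBase table: keys are kept in sorted order.
        TreeMap<String, String> table = new TreeMap<>();
        table.put("FL00999999", "a");
        table.put("FL01000000", "b");
        table.put("FL01999999", "c");
        table.put("FL02000000", "d");
        System.out.println(prefixScan(table, "FL01").keySet()); // prints [FL01000000, FL01999999]
    }
}
```

With a real table, the same pair of keys ("FL01" as the start row and "FL02" as the stop row) would be passed to the scan, so only the rows in that key range are read.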

Answer 1 (score: 0)

Assuming your keys are strings and the rows are returned as a list of maps, your range scan should look something like the code below.

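import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// oTbl is assumed to be an already opened HBase table held in a field of the enclosing class.
public List<Map<String,byte[]>> rangeFetch(String valueFrom, String valueTo, String[] columns, int maxrows) throws IOException {
    byte[] family = Bytes.toBytes("columnFamilyName");
    List<Map<String,byte[]>> rst = new ArrayList<Map<String,byte[]>>();
    Scan scn = new Scan();
    scn.setStartRow(Bytes.toBytes(valueFrom)); // start row is inclusive
    scn.setStopRow(Bytes.toBytes(valueTo));    // stop row is exclusive
    for (String colName : columns) {
        scn.addColumn(family, Bytes.toBytes(colName));
    }
    ResultScanner rsc = null;
    int rowCount = 0;
    try {
        rsc = oTbl.getScanner(scn);
        for (Result res = rsc.next(); res != null && rowCount < maxrows; res = rsc.next()) {
            Map<String,byte[]> row = new HashMap<String,byte[]>();
            for (String colName : columns) {
                // Latest value of the column, or null if the row does not have it
                byte[] value = res.getValue(family, Bytes.toBytes(colName));
                if (value != null) {
                    row.put(colName, value);
                }
            }
            rst.add(row);
            rowCount++; // without this increment, maxrows would never limit the result
        }
    } finally {
        if (rsc != null) rsc.close();
    }
    return rst;
}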

Then call it with:
List<Map<String,byte[]>> results = yourObj.rangeFetch("FL01"+"000000", "FL01"+"999999", new String[]{"column1","column2","column3"}, 10000);
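The call above seems to assume row keys made of the identifier followed by a six-digit, zero-padded number. The following sketch, again without the HBase client, shows why the padding matters for a range scan and what an exclusive stop row means for the last key. The class `RowKeySketch`, its methods, and the key layout are assumptions made for illustration; a `TreeSet` stands in for the sorted row keys of the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

final class RowKeySketch {

    /** Builds a key such as "FL01000042": identifier plus a six-digit, zero-padded sequence number. */
    static String rowKey(String identifier, int sequence) {
        if (sequence < 0 || sequence > 999_999) {
            throw new IllegalArgumentException("sequence out of range: " + sequence);
        }
        // Zero padding keeps lexicographic order equal to numeric order ("000009" < "000010").
        return String.format(Locale.ROOT, "%s%06d", identifier, sequence);
    }

    /** Keys in [startKey, stopKey): inclusive start, exclusive stop, like a scan's row range. */
    static List<String> range(TreeSet<String> keys, String startKey, String stopKey) {
        return new ArrayList<>(keys.subSet(startKey, stopKey));
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>();
        for (int n : new int[] {0, 9, 10, 999_999}) {
            keys.add(rowKey("FL01", n));
        }
        keys.add(rowKey("FL02", 0));

        // Stop key "FL01999999" is exclusive, so that row itself is not returned.
        System.out.println(range(keys, "FL01000000", "FL01999999").size()); // prints 3
        // Stopping at the next identifier covers every FL01 row.
        System.out.println(range(keys, "FL01000000", "FL02").size());       // prints 4
    }
}
```

If a row with the key "FL01999999" can exist, passing the next identifier (here "FL02") as the stop value is one way to include it.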