HBase reverse scan

Time: 2019-07-04 16:27:25

Tags: java hbase

My row keys are stored in the format trade<date><index>:

trade1907030001
trade1907030002
trade1907040001
trade1907040002
trade1907050001
trade1907050002

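What is the correct way to implement a "reverse" scan that iterates over all trades for a given day, or from a specific row to the end of the day, or even between two exact trades?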

Scan scan = new Scan();
scan.setReversed(true);
scan.setStartRow(Bytes.unsignedCopyAndIncrement(Bytes.toBytes(trade + day)));
scan.setStopRow(Bytes.toBytes(trade + day));

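Keep in mind that, according to the documentation, the start row is inclusive and the stop row is exclusive, so we would miss the oldest trade of the day. If the start row is itself an actual trade row, we cannot simply increment the key, or we would also pull in the next trade. The logic starts to become conditional. How can I make it work reliably in the different cases?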
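The inclusive-start, exclusive-stop rule can be checked without a cluster. The sketch below is not HBase code: it models the table as a sorted set holding the six row keys above and mimics a reversed scan over it. The class and method names are made up for illustration, and String ordering stands in for HBase's byte ordering, which gives the same result for ASCII keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for an HBase table: a sorted set of row keys.
class ReverseScanSketch {

    // The six row keys from the question.
    static TreeSet<String> sampleRows() {
        return new TreeSet<>(Arrays.asList(
                "trade1907030001", "trade1907030002",
                "trade1907040001", "trade1907040002",
                "trade1907050001", "trade1907050002"));
    }

    // Mimics a reversed scan: walk from the highest key downward and keep
    // rows between startRow (inclusive) and stopRow (exclusive).
    static List<String> reverseScan(TreeSet<String> rows, String startRow, String stopRow) {
        List<String> result = new ArrayList<>();
        for (String row : rows.descendingSet()) {
            if (row.compareTo(startRow) <= 0 && row.compareTo(stopRow) > 0) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = sampleRows();

        // Day prefix as stop row, incremented prefix as start row.
        // prints [trade1907040002, trade1907040001]
        System.out.println(reverseScan(rows, "trade190705", "trade190704"));

        // Exact trade keys as bounds: the stop row itself is excluded.
        // prints [trade1907040002]
        System.out.println(reverseScan(rows, "trade1907040002", "trade1907040001"));
    }
}
```

In this model the prefix bounds return the whole day, because every full key sorts after its own prefix. The oldest trade only goes missing when the stop row is an exact trade key.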

2 answers:

Answer 0 (score: 1)

You can use:

Scan scan = new Scan();
scan.setReversed(true);
scan.setRowPrefixFilter(Bytes.toBytes(trade + day));

It automatically ensures that the first and last trades are not missed.

Source: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setRowPrefixFilter-byte:A-

Answer 1 (score: 0)

This is how scans actually behave (tested in hbase shell v1.2.0-cdh5.13.3):

trade171018B00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020
trade171020S00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020
trade171113B00001                                          column=inp:data_as_of_date, timestamp=1511993729979, value=20171114
trade171114S00001                                          column=inp:data_as_of_date, timestamp=1511993729979, value=20171114

scan 'namespace:table', {COLUMNS=>['inp:data_as_of_date'], STARTROW=>'trade171018B00001', ENDROW=>'trade171113B00001'}
ROW                                                                  COLUMN+CELL
trade171018B00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020
trade171020S00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020

scan 'namespace:table', {COLUMNS=>['inp:data_as_of_date'], STARTROW=>'trade171113B00001', ENDROW=>'trade171018B00001', REVERSED=>true}
ROW                                                                  COLUMN+CELL
trade171113B00001                                          column=inp:data_as_of_date, timestamp=1511993729979, value=20171114
trade171020S00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020

scan 'namespace:table', {COLUMNS=>['inp:data_as_of_date'], STARTROW=>'trade171018', ENDROW=>'trade171113'}
ROW                                                                  COLUMN+CELL
trade171018B00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020
trade171020S00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020

scan 'namespace:table', {COLUMNS=>['inp:data_as_of_date'], STARTROW=>'trade171113', ENDROW=>'trade171018', REVERSED=>true}
ROW                                                                  COLUMN+CELL
trade171020S00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020
trade171018B00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020

scan 'namespace:table', {COLUMNS=>['inp:data_as_of_date'], ROWPREFIXFILTER=>'trade171113'}
ROW                                                                  COLUMN+CELL
trade171113B00001                                          column=inp:data_as_of_date, timestamp=1511993729979, value=20171114

scan 'namespace:table', {COLUMNS=>['inp:data_as_of_date'], ROWPREFIXFILTER=>'trade171113', REVERSED=>true}
ROW                                                                  COLUMN+CELL
0 row(s) in 0.2300 seconds
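
One plausible reading of the empty result above: assuming the row prefix filter does nothing more than set the start row to the prefix and the stop row to the incremented prefix (which is what the 1.x Scan sources appear to do), a reversed scan is handed a start row that sorts below its stop row, so nothing qualifies. The sketch below models that assumption in plain Java. It is not HBase code, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model: a sorted set of row keys plus the start/stop rows
// that a row prefix filter is ASSUMED to set.
class PrefixReverseSketch {

    // The four distinct row keys from the shell session.
    static TreeSet<String> sampleRows() {
        return new TreeSet<>(Arrays.asList(
                "trade171018B00001", "trade171020S00001",
                "trade171113B00001", "trade171114S00001"));
    }

    // Assumed prefix-filter behaviour: {start, stop} = {prefix, prefix with
    // its last character incremented}.
    static String[] prefixBounds(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        String incremented = prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
        return new String[] { prefix, incremented };
    }

    // Reversed scan: rows from startRow (inclusive) down to stopRow (exclusive).
    static List<String> reverseScan(TreeSet<String> rows, String startRow, String stopRow) {
        List<String> result = new ArrayList<>();
        for (String row : rows.descendingSet()) {
            if (row.compareTo(startRow) <= 0 && row.compareTo(stopRow) > 0) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = sampleRows();
        String[] bounds = prefixBounds("trade171113");

        // Bounds as the filter is assumed to set them: start < stop.
        // prints []
        System.out.println(reverseScan(rows, bounds[0], bounds[1]));

        // Same bounds swapped by hand for the reversed direction.
        // prints [trade171113B00001]
        System.out.println(reverseScan(rows, bounds[1], bounds[0]));
    }
}
```

If that assumption holds, setting the start and stop rows explicitly is the safer route for reversed scans on this version.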

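If the start and stop rows are shorter than the table's row keys (that is, they are prefixes), the following works as expected: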

Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes(trade + day));
scan.setStopRow(Bytes.unsignedCopyAndIncrement(Bytes.toBytes(trade + day)));

Scan scan = new Scan();
scan.setReversed(true);
scan.setStartRow(Bytes.unsignedCopyAndIncrement(Bytes.toBytes(trade + day)));
scan.setStopRow(Bytes.toBytes(trade + day));

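If the start and stop rows can be the same length as the table's row keys (that is, they may be exact keys), the following works as expected: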

Scan scan = new Scan();
scan.setStartRow(createKey("S", productSymbolId, YYMMDD.print(fromDate)));
scan.setStopRow(createNextKey("S", productSymbolId, YYMMDD.print(toDate)));

Scan scan = new Scan();
scan.setReversed(true);
scan.setStartRow(createKeyBeforeNext("A", stripSpaces(accountId), YYMMDD.print(toDate)));
scan.setStopRow(createKeyBefore("A", stripSpaces(accountId), YYMMDD.print(fromDate)));

where

key === 54686973697361746573746b6579
next === 54686973697361746573746b657a
before === 54686973697361746573746b6578ffffffffffffffffff
beforeNext === 54686973697361746573746b6579ffffffffffffffffff
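
The four values above can be reproduced with the JDK alone. The sketch below uses simplified plain-Java stand-ins for Bytes.unsignedCopyAndIncrement and for the closest-row-before logic; the class and method names are hypothetical, and the sample key is the ASCII string "Thisisatestkey", which is what the hex in the spec decodes to.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical plain-Java stand-ins for the HBase helpers used in this answer.
class KeyBoundsSketch {

    // Adds 1 to the key read as an unsigned big-endian number.
    // Unlike the HBase helper, this sketch rejects empty or all-0xff keys.
    static byte[] next(byte[] key) {
        byte[] copy = Arrays.copyOf(key, key.length);
        for (int i = copy.length - 1; i >= 0; i--) {
            if (copy[i] != (byte) 0xff) {
                copy[i]++;
                return copy;
            }
            copy[i] = 0; // carry into the next more significant byte
        }
        throw new IllegalArgumentException("key is empty or all 0xff bytes");
    }

    // Closest row before `row`: drop a trailing 0x00, otherwise decrement
    // the last byte and append nine 0xff bytes.
    static byte[] before(byte[] row) {
        if (row.length == 0) {
            byte[] max = new byte[9];
            Arrays.fill(max, (byte) 0xff);
            return max;
        }
        if (row[row.length - 1] == 0) {
            return Arrays.copyOf(row, row.length - 1);
        }
        byte[] out = Arrays.copyOf(row, row.length + 9);
        out[row.length - 1]--;
        Arrays.fill(out, row.length, out.length, (byte) 0xff);
        return out;
    }

    // Lowercase hex rendering of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] key = "Thisisatestkey".getBytes(StandardCharsets.US_ASCII);
        System.out.println("key        === " + hex(key));
        System.out.println("next       === " + hex(next(key)));
        System.out.println("before     === " + hex(before(key)));
        System.out.println("beforeNext === " + hex(before(next(key))));
    }
}
```

In unsigned byte order these sort as before < key < beforeNext < next, which is why beforeNext can serve as an inclusive upper bound and before as an exclusive lower bound in a reversed scan.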

Implementation

/**
 * <h4>usage</h4>
 * 
 * <pre>
 * Scan scan = new Scan();
 * scan.setStartRow(createKey("S", productSymbolId, YYMMDD.print(fromDate)));
 * scan.setStopRow(createNextKey("S", productSymbolId, YYMMDD.print(toDate)));
 *
 * Scan scan = new Scan();
 * scan.setReversed(true);
 * scan.setStartRow(createKeyBeforeNext("A", stripSpaces(accountId), YYMMDD.print(toDate)));
 * scan.setStopRow(createKeyBefore("A", stripSpaces(accountId), YYMMDD.print(fromDate)));
 * </pre>
 * 
 * <h4>spec</h4>
 * 
 * <pre>
 * key === 54686973697361746573746b6579
 * next === 54686973697361746573746b657a
 * before === 54686973697361746573746b6578ffffffffffffffffff
 * beforeNext === 54686973697361746573746b6579ffffffffffffffffff
 * </pre>
 * 
 * @see #createKeyBefore(String...)
 * @see #createKeyBeforeNext(String...)
 * @see #createNextKey(String...)
 */
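// Similar to Bytes.add(byte[] a, byte[] b, byte[] c), but for any number of parts.
// The unqualified calls in this snippet presumably rely on static imports of
// Bytes.toBytes, Bytes.unsignedCopyAndIncrement and System.arraycopy.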
public static byte[] createKey(String... parts) {
    byte[][] bytes = new byte[parts.length][];
    int size = 0;
    for (int i = 0; i < parts.length; i++) {
        bytes[i] = toBytes(parts[i]);
        size += bytes[i].length;
    }
    byte[] result = new byte[size];
    for (int i = 0, j = 0; i < bytes.length; i++) {
        arraycopy(bytes[i], 0, result, j, bytes[i].length);
        j += bytes[i].length;
    }
    return result;
}

/**
 * Create the next row
 * 
 * <pre>
 * key === 54686973697361746573746b6579
 * next === 54686973697361746573746b657a
 * </pre>
 * 
 * @see #createKey(String...)
 */
public static byte[] createNextKey(String... parts) {
    return unsignedCopyAndIncrement(createKey(parts));
}

/**
 * Create the closest row before
 * 
 * <pre>
 * key === 54686973697361746573746b6579
 * before === 54686973697361746573746b6578ffffffffffffffffff
 * </pre>
 * 
 * @see #createKey(String...)
 */
public static byte[] createKeyBefore(String... parts) {
    return createClosestRowBefore(createKey(parts));
}

/**
 * Create the closest row before the next row
 * 
 * <pre>
 * key === 54686973697361746573746b6579
 * beforeNext === 54686973697361746573746b6579ffffffffffffffffff
 * </pre>
 * 
 * @see #createKey(String...)
 */
public static byte[] createKeyBeforeNext(String... parts) {
    return createClosestRowBefore(createNextKey(parts));
}

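// Nine 0xff bytes, matching the ffffffffffffffffff suffix in the spec above.
private static final byte[] MAX_BYTE_ARRAY = Bytes.createMaxByteArray(9);

// Taken from the HBase sources: ClientScanner.createClosestRowBefore(byte[] row)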
private static byte[] createClosestRowBefore(byte[] row) {
    if (row == null)
        throw new IllegalArgumentException("The passed row is empty");
    if (Bytes.equals(row, HConstants.EMPTY_BYTE_ARRAY))
        return MAX_BYTE_ARRAY;
    if (row[row.length - 1] == 0)
        return Arrays.copyOf(row, row.length - 1);
    byte[] closestFrontRow = Arrays.copyOf(row, row.length);
    closestFrontRow[row.length - 1] = (byte) ((closestFrontRow[row.length - 1] & 0xff) - 1);
    closestFrontRow = Bytes.add(closestFrontRow, MAX_BYTE_ARRAY);
    return closestFrontRow;
}
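
To check that these bounds cover the "between two exact trades" case from the question, with both ends included, here is a plain-Java sketch that replays the idea against the question's sample keys. It is only a model: the class and method names are hypothetical, and next and before are simplified versions of the helpers above that assume the last byte is neither 0x00 nor 0xff.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a reversed scan between two exact row keys.
class InclusiveReverseRangeSketch {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    // Simplified next key: assumes the last byte is not 0xff.
    static byte[] next(byte[] row) {
        byte[] out = Arrays.copyOf(row, row.length);
        out[out.length - 1]++;
        return out;
    }

    // Simplified closest row before: assumes a non-empty row whose last
    // byte is not 0x00; decrements it and appends nine 0xff bytes.
    static byte[] before(byte[] row) {
        byte[] out = Arrays.copyOf(row, row.length + 9);
        out[row.length - 1]--;
        Arrays.fill(out, row.length, out.length, (byte) 0xff);
        return out;
    }

    // Reversed scan from `to` down to `from`, both inclusive, over keys
    // that are already sorted ascending.
    static List<String> between(List<String> sortedKeys, String from, String to) {
        byte[] start = before(next(to.getBytes(StandardCharsets.US_ASCII))); // inclusive
        byte[] stop = before(from.getBytes(StandardCharsets.US_ASCII));      // exclusive
        List<String> hits = new ArrayList<>();
        for (int i = sortedKeys.size() - 1; i >= 0; i--) {
            byte[] row = sortedKeys.get(i).getBytes(StandardCharsets.US_ASCII);
            if (compare(row, start) <= 0 && compare(row, stop) > 0) {
                hits.add(sortedKeys.get(i));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "trade1907030001", "trade1907030002",
                "trade1907040001", "trade1907040002",
                "trade1907050001", "trade1907050002");
        // prints [trade1907050001, trade1907040002, trade1907040001]
        System.out.println(between(keys, "trade1907040001", "trade1907050001"));
    }
}
```

Both endpoints come back because the start row sorts just after the newest wanted key and the stop row sorts just before the oldest one.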