Scan with filter using the HBase shell

Date: 2011-08-31 11:10:55

Tags: nosql hbase

Does anyone know how to scan records based on some scan filter, i.e.:

column:something = "somevalue"

Similar to this, but from the HBase shell?

6 answers:

Answer 0 (score: 47)

Try this. It's a bit ugly, but it works for me.

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
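# Keep the opening parenthesis on the same line as .new so Ruby parses
# the arguments as part of the constructor call.
scan 't1', { COLUMNS => 'family:qualifier', FILTER =>
    SingleColumnValueFilter.new(
        Bytes.toBytes('family'),
        Bytes.toBytes('qualifier'),
        CompareFilter::CompareOp.valueOf('EQUAL'),
        SubstringComparator.new('somevalue'))
}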

The HBase shell will include whatever you have in ~/.irbrc, so you can put something like this in there (I'm no Ruby expert, improvements are welcome):

# imports like above
# Scan `table`, returning the columns in `cols`, for rows whose
# family:qualifier value contains `substr`.
def scan_substr(table,family,qualifier,substr,*cols)
    scan table, { COLUMNS => cols, FILTER =>
        SingleColumnValueFilter.new(
            Bytes.toBytes(family), Bytes.toBytes(qualifier),
            CompareFilter::CompareOp.valueOf('EQUAL'),
            SubstringComparator.new(substr)) }
end

Then you can just say in the shell:

scan_substr 't1', 'family', 'qualifier', 'somevalue', 'family:qualifier'
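
A possible variation on the helper above, sketched under assumptions: the shell already writes columns as 'family:qualifier', so a small function could split that combined form into the two parts the filter constructor needs. split_column is a hypothetical name and not part of HBase. It is plain Ruby and can be tried outside the shell.

```ruby
# Hypothetical helper (not part of HBase): splits 'family:qualifier'
# into [family, qualifier]. A missing qualifier, as in 'family:' or
# 'family', yields an empty string for the qualifier.
def split_column(column)
  family, qualifier = column.split(':', 2)
  [family, qualifier.to_s]
end

p split_column('family:qualifier')
```

With this in ~/.irbrc, one could write family, qualifier = split_column('family:qualifier') and pass both values to scan_substr.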

Answer 1 (score: 29)

scan 'test', {COLUMNS => ['F'], FILTER => \
"(SingleColumnValueFilter('F','u',=,'regexstring:http:.*pdf',true,true)) AND \
(SingleColumnValueFilter('F','s',=,'binary:2',true,true))"}

More information can be found here. Note that multiple examples are in the attached Filter Language.docx file.
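
Long filter strings like the one above are easy to mistype, so here is a minimal sketch of building them from smaller pieces. The helper names scvf and all_of are hypothetical and not part of HBase. The last two arguments are assumed to be the filter-if-missing and latest-version-only flags, as in the filter language syntax for SingleColumnValueFilter.

```ruby
# Hypothetical helper (not part of HBase): builds one
# SingleColumnValueFilter clause in the shell's filter-language syntax.
def scvf(family, qualifier, op, comparator, filter_if_missing = true, latest_version_only = true)
  "SingleColumnValueFilter('#{family}','#{qualifier}',#{op},'#{comparator}'," \
    "#{filter_if_missing},#{latest_version_only})"
end

# Hypothetical helper: wraps each clause in parentheses and joins with AND.
def all_of(*clauses)
  clauses.map { |c| "(#{c})" }.join(' AND ')
end

filter_string = all_of(
  scvf('F', 'u', '=', 'regexstring:http:.*pdf'),
  scvf('F', 's', '=', 'binary:2')
)
puts filter_string
# In the HBase shell: scan 'test', {COLUMNS => ['F'], FILTER => filter_string}
```

The resulting string is the same filter expression as in the scan above, written on a single line.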

Answer 2 (score: 8)

Use the FILTER parameter of scan, as shown in the usage help:

hbase(main):002:0> scan

ERROR: wrong number of arguments (0 for 1)

Here is some help for this command:
Scan a table; pass table name and optionally a dictionary of scanner
specifications.  Scanner specifications may include one or more of:
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
or COLUMNS. If no columns are specified, all columns will be scanned.
To scan all members of a column family, leave the qualifier empty as in
'col_family:'.

Some examples:

  hbase> scan '.META.'
  hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
  hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}

For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false).  By
default it is enabled.  Examples:

  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

Answer 3 (score: 5)

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

Scan scan = new Scan();

// If you have multiple SingleColumnValueFilters, choose whether a row must
// pass all of them (MUST_PASS_ALL) or at least one (MUST_PASS_ONE).
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);

SingleColumnValueFilter filter_by_name = new SingleColumnValueFilter(
                   Bytes.toBytes("SOME COLUMN FAMILY"),
                   Bytes.toBytes("SOME COLUMN NAME"),
                   CompareOp.EQUAL,
                   Bytes.toBytes("SOME VALUE"));

// Use this if you don't want rows that are missing the column.
// Adding the column filter alone does not keep such rows out of the
// result set; they are included unless you add this statement.
filter_by_name.setFilterIfMissing(true);

list.addFilter(filter_by_name);

scan.setFilter(list);
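
The effect of setFilterIfMissing can be illustrated without a cluster. The following is a plain-Ruby model, not HBase code: rows are represented as hashes keyed by 'family:qualifier', and passes? and select_rows are hypothetical names. It only mirrors the behaviour described in the comments above, where rows lacking the column are kept unless the flag is set.

```ruby
# Plain-Ruby model (no HBase involved) of a single-column value check.
# A row that lacks the column passes unless filter_if_missing is true.
def passes?(row, column, expected, filter_if_missing)
  return !filter_if_missing unless row.key?(column)
  row[column] == expected
end

# Keep only the rows that pass the check.
def select_rows(rows, column, expected, filter_if_missing)
  rows.select { |row| passes?(row, column, expected, filter_if_missing) }
end

rows = [
  { 'cf:name' => 'alice' },
  { 'cf:name' => 'bob' },
  { 'cf:age'  => '30' } # this row has no cf:name column
]

p select_rows(rows, 'cf:name', 'alice', false).length # flag off: row without cf:name is kept
p select_rows(rows, 'cf:name', 'alice', true).length  # flag on: only the matching row remains
```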

Answer 4 (score: 5)

One of the filters is ValueFilter, which can be used to filter on all column values.

binary is one of the comparators used within the filter. You can use different comparators within the filter based on what you want to do.

You can refer to the following URL: http://www.hadooptpoint.com/filters-in-hbase-shell/. It provides good examples of how to use different filters in the HBase shell.
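
As an illustration of ValueFilter with different comparators, here is a minimal sketch that builds the filter string in plain Ruby. The helper name value_filter, the table name 't1', and the sample values are hypothetical. The comparator names are those of the shell's filter language (binary, binaryprefix, regexstring, substring).

```ruby
# Hypothetical helper (not part of HBase): builds a ValueFilter expression
# in filter-language syntax, e.g. ValueFilter(=,'binary:somevalue').
def value_filter(op, comparator_type, value)
  known = %w[binary binaryprefix regexstring substring]
  unless known.include?(comparator_type)
    raise ArgumentError, "unknown comparator: #{comparator_type}"
  end
  "ValueFilter(#{op},'#{comparator_type}:#{value}')"
end

puts value_filter('=', 'binary', 'somevalue')
puts value_filter('=', 'substring', 'some')
# In the HBase shell: scan 't1', {FILTER => value_filter('=', 'binary', 'somevalue')}
```

Unlike SingleColumnValueFilter, this filter is not tied to one column, so it applies the comparison to the values of all columns returned by the scan.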

Answer 5 (score: 0)

Add setFilterIfMissing(true) to the filter used in the query

import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.BinaryComparator
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.Filter

# setFilterIfMissing is declared void in the Java API, so chaining it inside
# the FILTER value would pass nil. Call it on a variable first instead.
filter = SingleColumnValueFilter.new(Bytes.toBytes('account'),
      Bytes.toBytes('ACCOUNT_NUMBER'), CompareFilter::CompareOp.valueOf('EQUAL'),
      BinaryComparator.new(Bytes.toBytes('0003000587')))
filter.setFilterIfMissing(true)

scan 'test:test8', { FILTER => filter }