Question

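I'm new to Hadoop and HBase and am trying to learn them and evaluate whether they fit my use case. Being new to Java (I'm basically a Perl/Unix and DB developer), I'm looking for a solution in the HBase shell wherever possible.

I have an HBase table (schema below) in which I'm trying to keep historical data that can be used for audit and analytics.

Assume the basic structure is as follows: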

    rowkey 'cf1:id', 'cf1:price', 'cf1:user', 'cf1:timestamp'

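Here, rowkey is the instrument (or any other object). id identifies which column holds the latest data: the first entry has 1 as its value, and it is incremented from there. user is the user who updated the data.

For example, the data initially looks like this: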

    hbase(main):009:0> scan 'price_history'
    ROW  COLUMN+CELL                                                                                                                 
    row1        column=cf1:id, timestamp=1389020633920, value=1
    row1        column=cf1:pr, timestamp=1389020654614, value=109.45
    row1        column=cf1:us, timestamp=1389020668338, value=feed
    row2        column=cf1:id, timestamp=1389020687334, value=1
    row2        column=cf1:pr, timestamp=1389020697880, value=1345.65
    row2        column=cf1:us, timestamp=1389020708403, value=feed

Now suppose row2 (instrument 2) is updated with a new price on the same day:

    hbase(main):003:0> scan 'price_history'
    ROW                   COLUMN+CELL                        
    row1                 column=cf1:id, timestamp=1389020633920, value=1
    row1                 column=cf1:pr, timestamp=1389020654614, value=109.45
    row1                 column=cf1:us, timestamp=1389020668338, value=feed
    row2                 column=cf1:id, timestamp=1389020859674, value=2
    row2                 column=cf1:pr, timestamp=1389020697880, value=1345.65
    row2                 column=cf1:pr1, timestamp=1389020869856, value=200
    row2                 column=cf1:us, timestamp=1389020708403, value=feed
    row2                 column=cf1:us1, timestamp=1389020881601, value=user1

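Notice that id has changed to 2 to indicate that the second set of data is the latest, and that the new values were added as new columns.

What I want to know is:

    1) Can I fetch only the value of the id column? That is, the output should be just 1 or 2, without any of the other attributes.
    2) Based on that output I will fetch the remaining data, but can I also search by row key and get that row as output? For example, "give me the row whose key is row1" (I may have a list of keys: row1, row2, ... rowN).

Please help in the HBase shell where possible (other solutions are welcome too).

Also, if any architect can suggest a better way to model the table for tracking price changes/versions, that would be welcome as well.

Thanks.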

Answer 1

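It would be hard to do this in the shell without a lot of piping the output and grepping the results. The way the shell formats its output, breaking each row across several lines, makes that harder still. A more lightweight solution than writing Java is to write your scanner in Ruby. HBase ships with the JRuby jar, which lets you execute Ruby scripts.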

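    include Java
    java_import "org.apache.hadoop.hbase.HBaseConfiguration"
    java_import "org.apache.hadoop.hbase.client.HTable"
    java_import "org.apache.hadoop.hbase.client.Scan"
    java_import "org.apache.hadoop.hbase.util.Bytes"

    config = HBaseConfiguration.create()
    family = Bytes.toBytes("family-name")
    qualifier = Bytes.toBytes("qualifier")

    # Restrict the scan to the single column we care about.
    scan = Scan.new()
    scan.addColumn(family, qualifier)

    table = HTable.new(config, "table-name")
    scanner = table.getScanner(scan)
    scanner.each do |result|
      keyval = result.getColumnLatest(family, qualifier)
      # Values written with the shell's put are stored as strings; use
      # Bytes.toDouble instead only if the cell holds an 8-byte double.
      puts Bytes.toString(keyval.getValue())
    end
    scanner.close()
    table.close()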

This should get you pretty close, and you can add extra data to the output, such as the row key. To run it, just use hbase org.jruby.Main your_ruby_file.rb
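If you would still rather post-process the shell's output, as mentioned at the start of this answer, the following is a minimal plain-Ruby sketch of that approach. It assumes the scan output has been captured as text in the same layout shown in the question; `column_values` and `CELL_LINE` are hypothetical names introduced here, not part of HBase. It will not cope with qualifiers that contain commas or with values that the shell wraps onto a second line.

```ruby
# Hypothetical helper, not part of HBase: extract the value of one column
# per row from text captured from the shell's `scan` output.
# Assumes each cell line looks like:
#   row1   column=cf1:id, timestamp=1389020633920, value=1
CELL_LINE = /\A\s*(\S+)\s+column=([^,]+),\s*timestamp=(\d+),\s*value=(.*?)\s*\z/

def column_values(scan_output, column)
  values = {}
  scan_output.each_line do |line|
    match = CELL_LINE.match(line.chomp)
    next unless match # skip the header, prompts, and blank lines
    row, col, _timestamp, value = match.captures
    values[row] = value if col == column
  end
  values
end

sample = <<~OUT
  ROW                   COLUMN+CELL
  row1                 column=cf1:id, timestamp=1389020633920, value=1
  row1                 column=cf1:pr, timestamp=1389020654614, value=109.45
  row2                 column=cf1:id, timestamp=1389020859674, value=2
  row2                 column=cf1:pr1, timestamp=1389020869856, value=200
OUT

# Print "<row key> <value>" for the cf1:id column only.
column_values(sample, "cf1:id").each { |row, value| puts "#{row} #{value}" }
```

This keeps only the last value seen per row key, which is enough for a single column; the JRuby scanner above is still the more robust route because it never has to parse formatted text.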
