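My task is to get the timestamps of the data in an HBase table. If I run a scan on the table in the hbase shell, I can see the timestamp for a given row, for example: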
scan 'mytable', {LIMIT => 1}
ROW COLUMN+CELL
00001000715ce3d569ee256153d column=0:, timestamp=1326362691000, value=1320073315600x600
f31db629b
1 row(s) in 1.9800 seconds
If I try to load some data from this table in the grunt shell, I don't see the timestamp, only the value.
tableinput = LOAD 'hbase://imagestore-new'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('0: ', '-loadKey true')
AS (id:bytearray, thingy:chararray);
illustrate tableinput;
This gives me:
--------------------------------------------------------------------------------
| tableinput | id:bytearray | thingy:chararray |
--------------------------------------------------------------------------------
| | 0000bizrad8156b98bffa60d8968fba0f326 | {=1348461029160x130} |
--------------------------------------------------------------------------------
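There is a serious lack of information on how to use HBaseStorage in Pig, and I'm stuck; the only thing I could find is the API entry (http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html). I suspect there is a way to request the timestamp as an option when calling HBaseStorage, similar to '-loadKey true', but I don't know where to find that information. Please help!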
Answer 0: (score: 2)
You can't actually do this right now. Here is the current list of available options (you can see them in the javadoc for the HBaseStorage constructor):
/**
* Constructor. Construct a HBase Table LoadFunc and StoreFunc to load or store.
* @param columnList
* @param optString Loader options. Known options:<ul>
* <li>-loadKey=(true|false) Load the row key as the first column
* <li>-gt=minKeyVal
* <li>-lt=maxKeyVal
* <li>-gte=minKeyVal
* <li>-lte=maxKeyVal
* <li>-limit=numRowsPerRegion max number of rows to retrieve per region
* <li>-delim=char delimiter to use when parsing column names (default is space or comma)
* <li>-ignoreWhitespace=(true|false) ignore spaces when parsing column names (default true)
* <li>-caching=numRows number of rows to cache (faster scans, more memory).
* <li>-noWAL=(true|false) Sets the write ahead to false for faster loading.
* <li>-minTimestamp= Scan's timestamp for min timeRange
* <li>-maxTimestamp= Scan's timestamp for max timeRange
* <li>-timestamp= Scan's specified timestamp
* <li>-caster=(HBaseBinaryConverter|Utf8StorageConverter) Utf8StorageConverter is the default
* To be used with extreme caution, since this could result in data loss
* (see http://hbase.apache.org/book.html#perf.hbase.client.putwal).
* </ul>
* @throws ParseException
* @throws IOException
*/
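As you can see, restricting the scan by timestamp has already been added, but nobody has needed the timestamp to actually be returned yet. I don't think it would be hard to implement. Open a Jira? Maybe even post a patch? :)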
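To illustrate what the timestamp options in the list above do today, here is a minimal sketch. It reuses the LOAD statement from the question and adds -minTimestamp and -maxTimestamp. It assumes your Pig version's HBaseStorage includes those options, and that they are passed space-separated in the same way as '-loadKey true'. The two bounds are hypothetical epoch-millisecond values chosen to bracket the timestamp in the scan output. This only filters which cells are read; the timestamp itself still does not appear in the loaded tuples.

```pig
-- Sketch only: restrict the scan to cells written inside a time window.
-- Table name, column spec and schema are copied from the question.
-- The two bounds are hypothetical values in epoch milliseconds.
tableinput = LOAD 'hbase://imagestore-new'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        '0: ',
        '-loadKey true -minTimestamp 1326362000000 -maxTimestamp 1326363000000')
    AS (id:bytearray, thingy:chararray);

-- Still shows only the row key and the value, not the timestamp.
illustrate tableinput;
```

If you need the timestamp values themselves, that would still require the change to HBaseStorage suggested above.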