I created an HBase table:
create 'user_data_table','personal_data','professional_data';
I then inserted a few records into the table:
put 'user_data_table','user1','personal_data:Location','IL'
put 'user_data_table','user1','personal_data:FName','Deb'
put 'user_data_table','user1','personal_data:LName','D'
put 'user_data_table','user1','professional_data:dept','IT'
put 'user_data_table','user1','professional_data:salary','2000'
put 'user_data_table','user2','personal_data:FName','CH'
put 'user_data_table','user2','personal_data:LName','AK'
put 'user_data_table','user2','professional_data:dept','IT'
put 'user_data_table','user2','professional_data:salary','80000'
I then took a snapshot:
snapshot 'user_data_table', 'snapshot-day-1'
After that, I inserted/updated the following records:
put 'user_data_table','user1','personal_data:Location','VA'
put 'user_data_table','user1','professional_data:salary','3000'
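When I reference the snapshot from my Hive table, I do not get the old data. Instead, every query returns the latest data, and I don't understand why. The command I used to create the Hive table that references the HBase snapshot is shown below.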
CREATE EXTERNAL TABLE if not exists hbase_user_data_snapshot1_table(key string, Location string,FName string,LName string, dept string,salary string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary",
"hive.hbase.snapshot.name"="snapshot-day-1")
TBLPROPERTIES ("hbase.table.name" = "user_data_table");
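Answer 0 (score: 0)
A snapshot means that (1) no information will be deleted from the existing HFiles, and (2) the contents of those HFiles can be rebuilt on demand (hiding anything that has been appended since).
But HIVE-6584 says...
Bypassing the online region server API provides a nice performance boost for the full scan
...so maybe they chose to "bypass" the point-in-time recovery part and simply use the snapshot as a back door for direct access to the HFiles, including anything appended since the snapshot was created. Maybe.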
Answer 1 (score: 0)
The DDL is wrong. The correct way is as follows.
CREATE EXTERNAL TABLE if not exists hbase_user_data_snapshot2_table(key string, Location string,FName string,LName string, dept string,salary string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary")
TBLPROPERTIES ("hive.hbase.snapshot.name"="snapshot-day-2");
Note the TBLPROPERTIES: we reference the snapshot name rather than the table.
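To check which properties actually ended up on the Hive table after running a DDL like the one above, something along these lines may help. It is only a sketch: it assumes the table `hbase_user_data_snapshot2_table` was created as shown, and it only lists the stored properties; it does not prove that the storage handler honors them.

```sql
-- List every table property recorded in the metastore for this table.
SHOW TBLPROPERTIES hbase_user_data_snapshot2_table;

-- Show only the snapshot-related property, if it was set.
SHOW TBLPROPERTIES hbase_user_data_snapshot2_table("hive.hbase.snapshot.name");
```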
Answer 2 (score: 0)
You need to set the Hive variable before running the SELECT.
CREATE EXTERNAL TABLE if not exists hbase_user_data_snapshot2_table(key string, Location string,FName string,LName string, dept string,salary string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary")
TBLPROPERTIES ("hbase.table.name"="xxx");
-- a table may have many snapshots, so it makes sense to
-- choose the snapshot right before the SELECT
-- and if your snapshot files are stored in a special path, use
-- SET hive.hbase.snapshot.restoredir=xxxx; to configure it
SET hive.hbase.snapshot.name=snapshot-day-2;
select * from hbase_user_data_snapshot2_table;
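Applied to the data in the question, the session-level approach might look like the sketch below. It assumes the Hive table `hbase_user_data_snapshot1_table` from the question exists and is mapped to `user_data_table` via `hbase.table.name`, and that `snapshot-day-1` was taken before the two updates to `user1`.

```sql
-- Pick the snapshot for this session before querying.
SET hive.hbase.snapshot.name=snapshot-day-1;

-- Read user1 through the Hive table from the question.
SELECT *
FROM hbase_user_data_snapshot1_table
WHERE key = 'user1';
-- If the snapshot is honored, this should show the pre-update values
-- (Location 'IL', salary '2000') rather than 'VA' and '3000'.
```

Running the same SELECT in a fresh session without the SET should read the live table, which gives a quick way to compare the two results.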