I'm just a beginner with HBase. I want to migrate an RDBMS table to HBase.
The table schema in the RDBMS looks something like this:
Field            Type              Collation          Null  Key  Default  Extra           Privileges                       Comment
---------------  ----------------  -----------------  ----  ---  -------  --------------  -------------------------------  -------
id               int(16) unsigned  (NULL)             NO    PRI  (NULL)   auto_increment  select,insert,update,references
user_id          varchar(64)       latin1_swedish_ci  NO    MUL  (NULL)                   select,insert,update,references
type_id          int(11)           (NULL)             NO         (NULL)                   select,insert,update,references
application_id   int(16) unsigned  (NULL)             YES   MUL  (NULL)                   select,insert,update,references
title            varchar(128)      latin1_swedish_ci  YES        (NULL)                   select,insert,update,references
body             text              latin1_swedish_ci  YES        (NULL)                   select,insert,update,references
posted_time      datetime          (NULL)             YES        (NULL)                   select,insert,update,references
template_params  text              latin1_swedish_ci  YES        (NULL)                   select,insert,update,references
count            int(11)           (NULL)             YES        (NULL)                   select,insert,update,references
reference_id     int(16)           (NULL)             YES        (NULL)                   select,insert,update,references
viewer_id        varchar(64)       latin1_swedish_ci  YES        (NULL)                   select,insert,update,references
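Here body and template_params hold JSON data stored as text. Now I want to create a schema in HBase for this table.
The operations performed on this data are:
1. Activity retrieval for a user_id
2. Activity retrieval for a viewer_id
3. Activity retrieval for a particular type_id, or for a particular type_id and user_id
4. Retrieval of activity made after time t
What schema would be suitable for this?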
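Answer 0 (score: 0)
4. Retrieval of activity made after time t
This won't be a problem: HBase stores a timestamp with every cell, so you can query for all entries written after time t. By default that timestamp is the write time, so if it should match posted_time you would need to set it explicitly when inserting.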
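One way to do the "after time t" filtering on the client side is sketched below. It assumes the rows come from HappyBase's `Table.scan(include_timestamp=True)`, which yields each cell as a `(value, timestamp)` pair. The helper name `rows_after`, the `cf` column family, and the sample data are made up for illustration; a server-side time-range filter would likely be more efficient on a large table.

```python
def rows_after(scan_results, t_ms):
    """Yield (row_key, cells) for rows that have at least one cell newer than t_ms.

    scan_results is any iterable of (row_key, {column: (value, timestamp)}),
    which is the shape produced by table.scan(include_timestamp=True).
    Timestamps are assumed to be milliseconds since the epoch.
    """
    for row_key, data in scan_results:
        # Keep only the cells whose timestamp is strictly after t_ms.
        newer = {col: (value, ts) for col, (value, ts) in data.items() if ts > t_ms}
        if newer:
            yield row_key, newer


# Stand-in for table.scan(include_timestamp=True); no HBase connection needed.
sample = [
    (b'alice', {b'cf:type_id': (b'3', 1000)}),
    (b'bob', {b'cf:type_id': (b'7', 2000)}),
]
recent = list(rows_after(sample, 1500))
```

With a real table, the call would look like `rows_after(user.scan(include_timestamp=True), t_ms)`.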
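For 1, 2, and 3, are you looking for fast access? If so, I'd suggest creating three separate tables to store the data, one keyed for each query. Yes, that is redundant, but the lookups will be fast.
You could code this with the HappyBase Python library along these lines: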
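import happybase

# Assumes an HBase Thrift server on localhost and three existing tables,
# each with a column family named 'cf' (the family name is an assumption).
# IDs are passed as bytes, e.g. b'42'.
conn = happybase.Connection()
user = conn.table('user')
viewer = conn.table('viewer')
type_user = conn.table('type_user')

def insert(user_id, viewer_id, type_id):
    # Write the same activity to all three tables, each keyed for one query.
    user.put(user_id, {b'cf:viewer_id': viewer_id, b'cf:type_id': type_id})
    viewer.put(viewer_id, {b'cf:user_id': user_id, b'cf:type_id': type_id})
    type_user.put(type_id + user_id, {b'cf:viewer_id': viewer_id})

def get_user(user_id):
    return user.row(user_id)

def get_viewer(viewer_id):
    return viewer.row(viewer_id)

def get_type_user(type_id, user_id=b''):
    if not user_id:
        rowkey = type_id
    else:
        rowkey = type_id + user_id
    # Note that we use a scan here so that a bare type_id prefix matches
    # every row for that type, whatever the user_id.
    return type_user.scan(row_prefix=rowkey)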
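One caveat with building the row key as a plain `type_id + user_id` concatenation is that a prefix scan can match the wrong rows: type 1 with user "23" and type 12 with user "3" both produce "123". A minimal sketch of one possible fix is a fixed-width, delimited key. The helper name `type_user_key`, the 11-digit width (taken from the `int(11)` column), and the `|` separator are all assumptions for illustration, and type_id is assumed to be non-negative.

```python
def type_user_key(type_id, user_id=""):
    """Build a row key such as b'00000000001|23'.

    Zero-padding type_id to a fixed width means a prefix for one type
    can never match the rows of another type. With no user_id, the
    result is a prefix covering every user for that type.
    """
    prefix = f"{int(type_id):011d}|"
    return (prefix + user_id).encode("utf-8")


key_a = type_user_key(1, "23")    # type 1, user "23"
key_b = type_user_key(12, "3")    # type 12, user "3"
prefix_1 = type_user_key(1)       # scan prefix for all of type 1
```

The prefix can then be passed to `type_user.scan(row_prefix=prefix_1)`, provided the rows were written with the same key function.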