Question

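I want to create a new HBase table from my PySpark code, to be used for storing data, if it does not already exist in the namespace/HBase. Can anyone help me accomplish this?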

Answer 1

I think the simplest way is to use happybase. You can find the details in the happybase documentation. Here is an example.

hbase(main):001:0> list
TABLE                                                                                                                                                            
emp                                                                                                                                                              
1 row(s) in 0.7750 seconds

=> ["emp"]

There is only one table. I will create a new table named my_table from Spark.

>>> import happybase
>>> host = 'your host'
>>> connection = happybase.Connection(host=host)  # port not specified, so the default (9090) is used
>>> connection.create_table(
...     'my_table',
...     {'col1': dict(),  # an empty dict uses the default column family options; pass options here to customize
...      'col2': dict(),
...      'col3': dict()
...     }
... )

Then check in the HBase shell:

hbase(main):002:0> list
TABLE                                                                                                                                                            
emp                                                                                                                                                              
my_table                                                                                                                                                         
2 row(s) in 0.0660 seconds

=> ["emp", "my_table"]

The new table has been created. You can also read a table from Spark through happybase.

>>> import happybase
>>> host = 'your host'
>>> connection = happybase.Connection(host=host)  # port not specified, so the default (9090) is used
>>> table = connection.table('emp')
>>> table.row('1')
{b'personal data:name': b'raju'}
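
The question asks to create the table only when it does not already exist, so here is a minimal sketch of that check. It relies on `Connection.tables()` and `Connection.create_table()` from happybase. The helper names `create_table_if_missing` and `connect_and_create` are hypothetical and only for illustration, and the host is a placeholder. A table inside a namespace is addressed in HBase as `namespace:table`; this sketch assumes the namespace itself already exists.

```python
def create_table_if_missing(connection, table_name, families):
    """Create table_name unless it is already listed; return True if created.

    `connection` is expected to behave like happybase.Connection, i.e. to
    provide tables() and create_table(name, families).
    """
    # Under Python 3 happybase returns table names as bytes, so normalize to str.
    existing = {
        name.decode("utf-8") if isinstance(name, bytes) else name
        for name in connection.tables()
    }
    if table_name in existing:
        return False
    connection.create_table(table_name, families)
    return True


def connect_and_create(host):
    """Hypothetical usage: open a connection and create my_table if needed."""
    import happybase  # imported lazily so the helper above has no hard dependency

    connection = happybase.Connection(host=host)
    try:
        return create_table_if_missing(
            connection,
            "my_table",
            {"col1": dict(), "col2": dict(), "col3": dict()},
        )
    finally:
        connection.close()
```

Calling `connect_and_create('your host')` on the Spark driver would create `my_table` on the first run and leave it untouched on later runs. The check and the create are two separate calls, so two jobs starting at the same moment could still race.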
