I am able to create an external table in Hive on top of HBase. Now I need to create an external table with variable columns, meaning the columns in the HBase table are not fixed for a given table: there is no fixed set of columns, and new columns can be created dynamically when data is inserted. What is the right way to handle this situation?
In summary: how do I create an external table in Hive when the HBase table does not have a fixed number of columns?
Thanks in advance.
Answer 0 (score: 5)
Create the table in the HBase shell
create 'hbase_2_hive_names', 'id', 'name', 'age'
Load the data into HBase (the input file must be on HDFS)
export HADOOP_CLASSPATH=$(/usr/local/hbase/bin/hbase classpath)
$HADOOP_HOME/bin/hadoop jar /usr/local/hbase/hbase-0.94.1.jar importtsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,id:id,name:fn,name:ln,age:age \
  hbase_2_hive_names /var/data/samples/names.tsv
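With -Dimporttsv.columns=HBASE_ROW_KEY,id:id,name:fn,name:ln,age:age, each line of names.tsv needs five tab-separated fields in that order (row key, id, first name, last name, age). A hypothetical sample, with made-up values for illustration:

1	1	John	Smith	34
2	2	Jane	Doe	28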
Create the external table in the Hive shell
CREATE EXTERNAL TABLE hbase_hive_names(hbid INT, id INT, fn STRING, ln STRING, age INT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,id:id,name:fn,name:ln,age:age")
TBLPROPERTIES("hbase.table.name" = "hbase_2_hive_names");
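Once the external table exists, the HBase rows can be queried straight from Hive, for example:

SELECT * FROM hbase_hive_names LIMIT 10;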
Answer 1 (score: 0)
Step 1: Log in to the HBase shell
hbase shell
Step 2: Create the HBase table
hbase(main):001:0> create 'hbase_emp_table', [{NAME => 'per', COMPRESSION => 'SNAPPY'}, {NAME => 'prof', COMPRESSION => 'SNAPPY'} ]
Created table hbase_emp_table
Took 1.5417 seconds
=> Hbase::Table - hbase_emp_table
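Note that COMPRESSION => 'SNAPPY' assumes the Snappy codec is installed on the cluster; if it is not, the same table can be created without compression:

create 'hbase_emp_table', 'per', 'prof'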
Step 3: Describe the HBase table:
hbase(main):002:0> describe 'hbase_emp_table'
Table hbase_emp_table is ENABLED
hbase_emp_table
COLUMN FAMILIES DESCRIPTION
{NAME => 'per', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'SNAPPY', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'prof', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'SNAPPY', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
2 row(s)
Took 0.1846 seconds
Step 4: Insert data into the HBase table
put 'hbase_emp_table','1','per:name','Ranga Reddy'
put 'hbase_emp_table','1','per:age','32'
put 'hbase_emp_table','1','prof:des','Senior Software Engineer'
put 'hbase_emp_table','1','prof:sal','50000'
put 'hbase_emp_table','2','per:name','Nishanth Reddy'
put 'hbase_emp_table','2','per:age','3'
put 'hbase_emp_table','2','prof:des','Software Engineer'
put 'hbase_emp_table','2','prof:sal','80000'
Step 5: Check the HBase table data
hbase(main):012:0> scan 'hbase_emp_table'
ROW COLUMN+CELL
1 column=per:age, timestamp=1606304606241, value=32
1 column=per:name, timestamp=1606304606204, value=Ranga Reddy
1 column=prof:des, timestamp=1606304606269, value=Senior Software Engineer
1 column=prof:sal, timestamp=1606304606301, value=50000
2 column=per:age, timestamp=1606304606362, value=3
2 column=per:name, timestamp=1606304606338, value=Nishanth Reddy
2 column=prof:des, timestamp=1606304606387, value=Software Engineer
2 column=prof:sal, timestamp=1606304608374, value=80000
2 row(s)
Took 0.0513 seconds
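Coming back to the variable-column part of the question: HBase itself does not require qualifiers to be declared up front, so a cell with a brand-new qualifier can be written at any time, for example (a hypothetical extra column):

put 'hbase_emp_table','1','prof:team','Platform'

The Hive mapping created in the next step, however, only exposes the qualifiers it explicitly lists; an unmapped qualifier like prof:team stays invisible to Hive unless the whole column family is mapped to a MAP column (see the sketch after Step 8).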
Step 6: Log in to the Hive shell using hive or beeline
hive
Step 7: Create the Hive table
CREATE EXTERNAL TABLE IF NOT EXISTS hive_emp_table(id INT, name STRING, age SMALLINT, designation STRING, salary BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,per:name,per:age,prof:des,prof:sal")
TBLPROPERTIES("hbase.table.name" = "hbase_emp_table");
Step 8: Select the Hive table data
hive> select * from hive_emp_table;
INFO : OK
+--------------------+----------------------+---------------------+-----------------------------+------------------------+
| hive_emp_table.id | hive_emp_table.name | hive_emp_table.age | hive_emp_table.designation | hive_emp_table.salary |
+--------------------+----------------------+---------------------+-----------------------------+------------------------+
| 1 | Ranga Reddy | 32 | Senior Software Engineer | 50000 |
| 2 | Nishanth Reddy | 3 | Software Engineer | 80000 |
+--------------------+----------------------+---------------------+-----------------------------+------------------------+
2 rows selected (17.401 seconds)
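For the variable-column case in the question, the HBase storage handler can also map an entire column family into a single Hive MAP column, so qualifiers that are created dynamically at insert time show up as map keys instead of requiring new Hive columns. A minimal sketch against the same HBase table (the Hive table name is made up for illustration):

CREATE EXTERNAL TABLE IF NOT EXISTS hive_emp_dynamic(id INT, name STRING, prof MAP<STRING,STRING>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,per:name,prof:")
TBLPROPERTIES("hbase.table.name" = "hbase_emp_table");

Any qualifier written under the prof family is then reachable as a map key, for example:

SELECT id, name, prof['des'], prof['sal'] FROM hive_emp_dynamic;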