Using the command:
describe formatted my_table partition (my_partition)
we can list the metadata, including the HDFS location, of partition my_partition of my_table. But how can we get a two-column output:
Partition | Location
that lists all partitions of my_table along with their HDFS locations?
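(For reference, the brute-force route is one describe formatted call per partition, driven by show partitions. A minimal Python sketch of that loop, assuming only that the hive CLI is on the PATH; the answers below avoid the per-partition calls:)

import subprocess

def hive(query):
    # Run one HiveQL statement through the CLI and return stdout lines.
    result = subprocess.run(['hive', '-e', query],
                            capture_output=True, text=True)
    return result.stdout.splitlines()

# One 'describe formatted' call per partition: correct, but slow.
for part in hive('show partitions my_table'):
    part = part.strip()
    if not part:
        continue
    # part looks like dt=2017-06-10/type=A -> spec dt='2017-06-10', type='A'
    spec = ', '.join("{}='{}'".format(*kv.split('=', 1))
                     for kv in part.split('/'))
    for line in hive('describe formatted my_table partition ({})'.format(spec)):
        if line.strip().startswith('Location:'):
            print(part, '|', line.split()[-1])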
Answer 0 (score: 3)
Query the metastore.
Hive:
create table mytable (i int) partitioned by (dt date,type varchar(10))
;
alter table mytable add
partition (dt=date '2017-06-10',type='A')
partition (dt=date '2017-06-11',type='A')
partition (dt=date '2017-06-12',type='A')
partition (dt=date '2017-06-10',type='B')
partition (dt=date '2017-06-11',type='B')
partition (dt=date '2017-06-12',type='B')
;
Metastore (MySQL):
select p.part_name
      ,s.location
from metastore.DBS as d
join metastore.TBLS as t on t.db_id = d.db_id
join metastore.PARTITIONS as p on p.tbl_id = t.tbl_id
join metastore.SDS as s on s.sd_id = p.sd_id
where d.name = 'default'
  and t.tbl_name = 'mytable'
;
+----------------------+----------------------------------------------------------------------------------+
| part_name | location |
+----------------------+----------------------------------------------------------------------------------+
| dt=2017-06-10/type=A | hdfs://quickstart.cloudera:8020/user/hive/warehouse/mytable/dt=2017-06-10/type=A |
| dt=2017-06-11/type=A | hdfs://quickstart.cloudera:8020/user/hive/warehouse/mytable/dt=2017-06-11/type=A |
| dt=2017-06-12/type=A | hdfs://quickstart.cloudera:8020/user/hive/warehouse/mytable/dt=2017-06-12/type=A |
| dt=2017-06-10/type=B | hdfs://quickstart.cloudera:8020/user/hive/warehouse/mytable/dt=2017-06-10/type=B |
| dt=2017-06-11/type=B | hdfs://quickstart.cloudera:8020/user/hive/warehouse/mytable/dt=2017-06-11/type=B |
| dt=2017-06-12/type=B | hdfs://quickstart.cloudera:8020/user/hive/warehouse/mytable/dt=2017-06-12/type=B |
+----------------------+----------------------------------------------------------------------------------+
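The same join can be run programmatically. A minimal Python sketch using PyMySQL; the host, user, and password are placeholders, and direct read access to the metastore database is an assumption:

import pymysql  # pip install pymysql

# Placeholders: point these at your Hive metastore database.
conn = pymysql.connect(host='metastore-db.example.com', user='hive',
                       password='***', database='metastore')

SQL = """
select p.PART_NAME, s.LOCATION
from DBS d
join TBLS t on t.DB_ID = d.DB_ID
join PARTITIONS p on p.TBL_ID = t.TBL_ID
join SDS s on s.SD_ID = p.SD_ID
where d.NAME = %s and t.TBL_NAME = %s
"""

with conn.cursor() as cur:
    cur.execute(SQL, ('default', 'mytable'))
    for part_name, location in cur.fetchall():
        print(part_name, '|', location)

conn.close()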
Answer 1 (score: 0)
If you don't need the information in a nice tabular format, and you don't have access to the HMS database, you can run explain extended:
explain extended select * from default.mytable;
Then you can extract the essential information, the partition values and the location, from that output (a short parsing sketch follows the sample run below):
root@ubuntu:/home/sathya# hive -e "explain extended select * from default.mytable;" | grep location
OK
location hdfs://localhost:9000/user/hive/warehouse/mytable/dt=2017-06-10/type=A
location hdfs://localhost:9000/user/hive/warehouse/mytable
location hdfs://localhost:9000/user/hive/warehouse/mytable/dt=2017-06-10/type=B
location hdfs://localhost:9000/user/hive/warehouse/mytable
location hdfs://localhost:9000/user/hive/warehouse/mytable/dt=2017-06-11/type=A
location hdfs://localhost:9000/user/hive/warehouse/mytable
location hdfs://localhost:9000/user/hive/warehouse/mytable/dt=2017-06-11/type=B
location hdfs://localhost:9000/user/hive/warehouse/mytable
location hdfs://localhost:9000/user/hive/warehouse/mytable/dt=2017-06-12/type=A
location hdfs://localhost:9000/user/hive/warehouse/mytable
location hdfs://localhost:9000/user/hive/warehouse/mytable/dt=2017-06-12/type=B
location hdfs://localhost:9000/user/hive/warehouse/mytable
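A small sketch to deduplicate that grep output and keep only the partition-level paths; it assumes the same hive CLI and the default.mytable table from above, and relies on partition directories containing key=value segments while the bare table directory does not:

import subprocess

plan = subprocess.run(
    ['hive', '-e', 'explain extended select * from default.mytable;'],
    capture_output=True, text=True).stdout

seen = set()
for line in plan.splitlines():
    line = line.strip()
    if line.startswith('location'):
        path = line.split()[-1]
        # Keep partition directories (key=value segments); skip the table
        # directory, which the plan repeats once per partition.
        if '=' in path and path not in seen:
            seen.add(path)
            print(path.split('/mytable/', 1)[-1], '|', path)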
Answer 2 (score: 0)
From my point of view, the best solution is to get this information from the Hive metastore via the Thrift protocol.
If you write your code in Python, you can use the hmsclient library:
Hive CLI:
hive> create table test_table_with_partitions(f1 string, f2 int) partitioned by (dt string);
OK
Time taken: 0.127 seconds
hive> alter table test_table_with_partitions add partition(dt=20210504) partition(dt=20210505);
OK
Time taken: 0.152 seconds
Python shell:
>>> from hmsclient import hmsclient
>>> client = hmsclient.HMSClient(host='hive.metastore.location', port=9083)
>>> with client as c:
... all_partitions = c.get_partitions(db_name='default',
... tbl_name='test_table_with_partitions',
... max_parts=24 * 365 * 3)
...
>>> print([{'dt': part.values[0], 'location': part.sd.location} for part in all_partitions])
[{'dt': '20210504',
'location': 'hdfs://hdfs.master.host:8020/user/hive/warehouse/test_table_with_partitions/dt=20210504'},
{'dt': '20210505',
'location': 'hdfs://hdfs.master.host:8020/user/hive/warehouse/test_table_with_partitions/dt=20210505'}]
If you have Airflow installed together with the apache.hive extra, it is very easy to create the hmsclient using data from Airflow Connections:
hive_hook = HiveMetastoreHook()
with hive_hook.metastore as hive_client:
... your code goes here ...
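To make the pattern concrete, a minimal sketch of what the body could look like, reusing the hmsclient calls shown above; the Airflow 2.x provider import path and a configured metastore connection are assumptions:

from airflow.providers.apache.hive.hooks.hive import HiveMetastoreHook

hive_hook = HiveMetastoreHook()  # uses the 'metastore_default' connection
with hive_hook.metastore as hive_client:
    # hive_hook.metastore is an hmsclient client, so the same calls apply.
    parts = hive_client.get_partitions(db_name='default',
                                       tbl_name='test_table_with_partitions',
                                       max_parts=1000)
    print([(p.values, p.sd.location) for p in parts])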