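Hive: partitioning by part of an integer column

Date: 2017-03-22 11:27:33

Tags: hive

I want to create an external Hive table partitioned by record type and date (year, month, day). One complication is that the date in my data files is a single integer of the form yyyymmddhhmmss rather than the required date format yyyy-mm-dd hh:mm:ss. Can I specify 3 new partition columns based on that single data value? Something like the example below (which does not work):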

create external table cdrs (
record_id int, 
record_detail tinyint,
datetime_start int
)
partitioned by (record_type int, createyear=datetime_start(0,3) int, createmonth=datetime_start(4,5) int, createday=datetime_start(6,7) int)
row format delimited 
fields terminated by '|' 
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1");

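1 answer:

Answer 0 (score: 0)

If you want to be able to use MSCK REPAIR TABLE to add the partitions for you based on the directory structure, you should use the following conventions: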

  • The nesting of the directories should match the order of the partition columns.
  • Each directory name should be {partition column name}={value}
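
As a minimal sketch of how that convention is used, assuming the cdrs table already exists and its location already contains directories named in the {partition column name}={value} form, the partitions could be registered and then checked like this:

```sql
-- Scan the table location and register any partition directories
-- that are not yet known to the metastore.
msck repair table cdrs;

-- List the partitions Hive now knows about, to confirm the result.
show partitions cdrs;
```

Directories that do not follow the naming convention are not picked up this way and would have to be added manually.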

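If you intend to add the partitions manually, the directory structure has no meaning: any set of partition values can be coupled with any directory. For example: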

alter table cdrs
add if not exists partition (record_type='TYP123',createdate=date '2017-03-22')
location 'hdfs://nameservice1/tmp/sbx_unleashed.db/2017MAR22_OF_TYPE_123';

Assuming this directory structure:

.../sbx_unleashed.db/record_type=.../createyear=.../createmonth=.../createday=.../

e.g.

.../sbx_unleashed.db/record_type=TYP123/createyear=2017/createmonth=03/createday=22/

the table definition would be:

-- record_type is a string here because the example value TYP123 is not numeric.
-- datetime_start is a bigint because a 14-digit yyyymmddhhmmss value overflows int.
create external table cdrs
(
   record_id      int
  ,record_detail  tinyint
  ,datetime_start bigint
)
partitioned by (record_type string, createyear int, createmonth tinyint, createday tinyint)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1")
;
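
The question asks whether the partition values can be derived from datetime_start itself. A table definition cannot do that, but one possible alternative, which is a different technique from the directory convention described in this answer, is a dynamic-partition insert from a staging table. The sketch below is only an illustration under assumptions: cdrs_raw and cdrs_by_day are hypothetical table names, and record_type is assumed to be available as a column in the raw data. The target omits the skip.header/skip.footer properties because files written by Hive contain no header or footer lines.

```sql
-- Hypothetical staging table holding the raw, unpartitioned rows.
create table if not exists cdrs_raw
(
   record_id      int
  ,record_detail  tinyint
  ,datetime_start bigint
  ,record_type    string
)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as textfile;

-- Hypothetical partitioned target with the year/month/day layout.
create table if not exists cdrs_by_day
(
   record_id      int
  ,record_detail  tinyint
  ,datetime_start bigint
)
partitioned by (record_type string, createyear int, createmonth tinyint, createday tinyint)
stored as textfile;

-- Let Hive create partitions from the values in the select list.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Dynamic partition columns come last in the select list,
-- in the same order as in the partitioned by clause.
-- substr is 1-based: characters 1-4 are the year, 5-6 the month, 7-8 the day.
insert into table cdrs_by_day partition (record_type, createyear, createmonth, createday)
select
   record_id
  ,record_detail
  ,datetime_start
  ,record_type
  ,cast(substr(cast(datetime_start as string), 1, 4) as int)
  ,cast(substr(cast(datetime_start as string), 5, 2) as tinyint)
  ,cast(substr(cast(datetime_start as string), 7, 2) as tinyint)
from cdrs_raw;
```

With this approach Hive writes the partition directories itself, so neither MSCK REPAIR TABLE nor manual ALTER TABLE statements are needed for the rows loaded this way.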

Alternatively, assuming this directory structure:

.../sbx_unleashed.db/record_type=.../createdate=.../

e.g.

.../sbx_unleashed.db/record_type=TYP123/createdate=2017-03-22/

the table definition would be:

-- record_type is a string here because the example value TYP123 is not numeric.
-- datetime_start is a bigint because a 14-digit yyyymmddhhmmss value overflows int.
create external table cdrs
(
   record_id      int
  ,record_detail  tinyint
  ,datetime_start bigint
)
partitioned by (record_type string, createdate date)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1")
;
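
Once the partitions are registered, a query can filter on the partition columns so that only the matching directories are read. The following is a rough sketch for the createdate layout, using the example values from this answer. It also derives a date from datetime_start for comparison with the partition value, assuming datetime_start is stored in a type wide enough for 14 digits (for example bigint); the alias derived_date is made up for illustration.

```sql
-- Read a single partition and compare its createdate with the date
-- embedded in the first 8 digits of datetime_start (yyyymmdd).
select
   record_id
  ,createdate
  ,cast(concat_ws('-'
       ,substr(cast(datetime_start as string), 1, 4)
       ,substr(cast(datetime_start as string), 5, 2)
       ,substr(cast(datetime_start as string), 7, 2)) as date) as derived_date
from cdrs
where record_type = 'TYP123'
  and createdate = date '2017-03-22';
```

Rows where derived_date differs from createdate would indicate that a file was placed in the wrong partition directory, since Hive does not check the file contents against the directory name.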