Question

I have a Hive external table partitioned by year, month, day, and hour.

PARTITIONED BY ( 
  `year` int, 
  `month` int, 
  `day` int, 
  `hour` int)
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.SequenceFileInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
    'hdfs://path/to/data'

The data lives in directories such as:

2014/05/10/07/00

2014/05/10/07/01

...

2014/05/10/07/22

2014/05/10/07/23

I get results when I select data with the following query:

Select * from my_table where year=2014 and month="05" and day="07" and hour="03"

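But I would like to be able to query without putting quotes around values that have a leading zero. Currently, neither of the following two queries works: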

Select * from my_table where year=2014 and month=05 and day=07 and hour=03
Select * from my_table where year=2014 and month=5 and day=7 and hour=3

How can I support this (without renaming the directories so that single-digit values lose their zero prefix)?

Thanks,

Guy

Answer 1

Before I get to the answer: this does involve changing the directory names, but it makes the queries much simpler.

Our partitions have a similar structure, but instead of names like 2014/05/10/07/22, we use something like 2014/201405/20140510/07/20140510.22. Basically, the partitions are:

 PARTITIONED BY 
  (
  years bigint,
  months bigint,
  days bigint,
  hours float
  )

Now for the advantage of doing it this way.

The query mentioned in the question:

Select * from my_table where year=2014 and month=05 and day=07 and hour=03

becomes, with the new partitioning:

Select * from my_table where hours = 20140507.03

You can also query other days and months directly, without explicitly specifying the month and year.
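
As a rough sketch of what such queries might look like, assuming the partition columns defined above (years, months, days, hours) and the table name my_table from the question; the specific dates are only illustrative:

```sql
-- Sketch only: assumes the partition columns years, months, days, hours
-- shown above and the table name my_table taken from the question.

-- All data for a single day; no separate year or month predicate is needed
-- because the day value already encodes both.
SELECT * FROM my_table WHERE days = 20140507;

-- All data for a single month.
SELECT * FROM my_table WHERE months = 201405;

-- A range of days; this also works across a month boundary because
-- yyyyMMdd values sort in calendar order.
SELECT * FROM my_table WHERE days BETWEEN 20140428 AND 20140507;
```

Because each partition value carries the full date prefix, a single predicate identifies the partition, and range filters stay simple integer comparisons.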
