Question

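I have several tables in Hive, and my query tries to retrieve the data from the past x days. When I use a literal date, Hive prunes partitions, but when I use a formula it does a full table scan.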

select *
from   f_event
where  date_key > 20160101;

Scanned partitions:

s3://...key=20160102 [f]
s3://...key=20160103 [f]
s3://...key=20160104 [f]

If I use a formula instead, say to get the last 4 weeks of data:

Select count(*)
From    f_event f
Where  date_key  > from_unixtime(unix_timestamp()-4*7*60*60*24, 'yyyyMMdd');

This scans every partition in the table.

Environment: Hadoop 2.6.0, EMR, Hive on S3, Hive 1.0.0

Answer 1

Hive does not trigger partition pruning when the filter expression contains a non-deterministic function such as unix_timestamp().

The discussion gives a good reason:

Imagine a situation like this:

WHERE partition_column = f(unix_timestamp()) AND ordinary_column = f(unix_timestamp())

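The right-hand side of the predicate has to be evaluated at map time, whereas you are assuming the left-hand side is evaluated at compile time. That means you have two different values of unix_timestamp() floating around, which can only end badly.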
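A common workaround (an assumption here, not something the answer above prescribes) is to evaluate the date once outside Hive and embed it as a literal, so the filter looks exactly like the date_key > 20160101 query that was pruned. The minimal Python sketch below assumes date_key is an integer in yyyyMMdd form, as in the question; cutoff_key and build_query are hypothetical helper names, and how the resulting string is submitted (for example with hive -e) is left to your setup.

```python
from datetime import date, timedelta


def cutoff_key(days_back, today=None):
    """Return the yyyyMMdd integer for the day `days_back` days before `today`.

    The value is computed once on the client, so Hive only ever sees a
    constant in the partition filter.
    """
    if today is None:
        today = date.today()
    return int((today - timedelta(days=days_back)).strftime("%Y%m%d"))


def build_query(days_back, today=None):
    # Hypothetical helper: embeds the cutoff as a plain integer literal,
    # mirroring the hand-written query whose partitions were pruned.
    return (
        "SELECT count(*) FROM f_event f "
        f"WHERE date_key > {cutoff_key(days_back, today)}"
    )
```

For example, build_query(28, date(2016, 1, 29)) yields a query ending in date_key > 20160101. The Hive CLI can also do the substitution itself: compute the value in whatever launches the job, pass it with --hivevar cutoff=20160101, and write the filter as date_key > ${hivevar:cutoff}. Either way, nothing non-deterministic is left in the partition predicate.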

Hive partition pruning on computed column
