Question

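I have a Hive table bucketed on userId. My select query includes userId in the where clause, but Hive still does a full table scan. hive.enforce.bucketing is set to true. Why can't Hive take advantage of bucketing in this case, and is there any configuration that enables it?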

Table structure

userId int,
name int,
address String,
cell int,
......
......
......
......
CLUSTERED BY  (userId) SORTED BY (userId) INTO 20 BUCKETS

Select query

select cell from <table> where userId=<userId>

Answer 1

select cell from <table> TABLESAMPLE(BUCKET <n> OUT OF 20 ON userId) usertable where userId = <userId>
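The query above leaves <n> open. As a rough sketch of how it might be derived: with Hive's legacy bucketing, an int column is assumed to hash to its own value, the sign bit is masked off, and the result is taken modulo the bucket count. TABLESAMPLE numbers buckets from 1, so 1 is added. The helper names, the table name "users", and the userId 12345 below are made up for illustration, and this does not apply if the table uses a different hashing scheme (for example the Murmur-based bucketing version in newer Hive releases).

```python
def tablesample_bucket(user_id: int, num_buckets: int = 20) -> int:
    """Return the 1-based bucket number to use in TABLESAMPLE.

    Assumption: legacy Hive bucketing, where an int hashes to itself and
    the bucket index is (hash & Integer.MAX_VALUE) % num_buckets.
    """
    zero_based = (user_id & 0x7FFFFFFF) % num_buckets
    # TABLESAMPLE(BUCKET x OUT OF y) counts buckets starting at 1.
    return zero_based + 1


def build_query(table: str, user_id: int, num_buckets: int = 20) -> str:
    """Fill in the answer's query template for one userId (hypothetical helper)."""
    n = tablesample_bucket(user_id, num_buckets)
    return (
        f"select cell from {table} "
        f"TABLESAMPLE(BUCKET {n} OUT OF {num_buckets} ON userId) usertable "
        f"where userId = {int(user_id)}"
    )


# Example with placeholder values: 12345 % 20 = 5, so the bucket number is 6.
print(build_query("users", 12345))
```

It is worth checking the computed bucket against a plain query on a few known userIds before relying on it, since a wrong bucket number silently returns no rows.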