Does the Hive optimizer consider view definitions when optimizing queries on views?

Time: 2017-07-06 07:40:43

Tags: sql hadoop hive query-optimization

I have this schema (given as the DDL of the tables and views):

hive> create table t_realtime(cust_id int, name string, status string, active_flag int);

hive> create table t_hdfs(cust_id int, name string, status string, active_flag int);

hive> create view t_inactive as select * from t_hdfs where active_flag=0;

hive> create view t_view as select * from t_realtime union all select * from t_inactive;

If I fire a query as follows:

hive> select * from t_view where active_flag = 1;

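Ideally, this query should never access the t_inactive view or t_hdfs, because the view definition of t_inactive itself has active_flag = 0 while the query predicate has active_flag = 1. By default, however, it does not eliminate the t_inactive part of this union view.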
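To spell out the expectation, here is a hand-expanded sketch of what the query amounts to once the outer predicate is pushed into both branches of the union. It is an illustration written by hand under that assumption, not actual optimizer output.

```sql
-- Hand expansion of: select * from t_view where active_flag = 1
-- Branch 1 keeps the predicate as is.
-- Branch 2 is t_inactive, i.e. t_hdfs filtered by active_flag = 0; combined
-- with the outer predicate the condition can never be true, so the branch
-- always yields zero rows and could be dropped without scanning t_hdfs.
select * from t_realtime where active_flag = 1
union all
select * from t_hdfs where active_flag = 0 and active_flag = 1;
```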

Is there a way to achieve this for such Hive queries? Perhaps some Hive optimizer parameter or hint?

1 Answer:

Answer 0: (score: 1)

hive> explain extended select * from t_view where active_flag = 1;
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: t_realtime
          properties:
            insideView TRUE
          GatherStats: false
          Filter Operator
            isSamplingPred: false
            predicate: (active_flag = 1) (type: boolean)
            Select Operator
              expressions: cust_id (type: int), name (type: string), status (type: string), 1 (type: int)
              outputColumnNames: _col0, _col1, _col2, _col3
              ListSink

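This was tested on yesterday's master (d68630b6ed25884a76030a9073cd864032ab85c2). As you can see, it scans only t_realtime and pushes down the predicate active_flag = 1. Whether your particular installation does the same depends on the version you are using. This area is being worked on not only in Hive but also in Calcite (which Hive uses).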
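To check this on your own installation, you could inspect the plan with the relevant optimizer settings switched on. The sketch below assumes that cost-based optimization (`hive.cbo.enable`) and predicate pushdown (`hive.optimize.ppd`) are the switches that matter; both are existing Hive properties, but whether they are enough to prune the t_inactive branch on a given release is an assumption to verify against the plan.

```sql
-- Assumption: these two settings are the relevant ones; defaults vary by release.
set hive.cbo.enable=true;
set hive.optimize.ppd=true;

-- Inspect the plan and look at which tables get a TableScan.
explain select * from t_view where active_flag = 1;
```

If the resulting plan still contains a TableScan with alias t_hdfs, the branch was not eliminated on that version.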