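I am creating a view in Hive that unions two tables, each holding a large amount of data. Is there a way to pass a filter parameter to the view so that the filter is also applied to the underlying tables? I have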
CREATE VIEW abc
AS
SELECT * FROM
(SELECT * FROM table_a
UNION
SELECT * FROM table_b) temp;
If I run something like SELECT * FROM abc WHERE day='2018-10-22'
it should return the union for the selected day only, i.e. the equivalent of
SELECT * FROM table_a WHERE day='2018-10-22' UNION
SELECT * FROM table_b WHERE day='2018-10-22'
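How can I create a view that does this?
Answer 0: (score: 1)
You do not need to add the filter explicitly for optimization purposes. The query optimizer can push the predicate down into each branch of the union. Take a look at this: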
CREATE TABLE `t5`(`a` string);
CREATE TABLE `t6`(`a` string);
CREATE VIEW v1
AS
SELECT * FROM
(
SELECT * FROM t5
UNION ALL
SELECT * from t6
) temp;
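Below is the EXPLAIN output for the query select * from v1 where a = "b"
As you can see, there are 2 separate table scans, and the predicate is applied in each of them. It would be quite disappointing if Hive fetched all the data and only filtered at the end :)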
Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: t5
            filterExpr: (a = 'b') (type: boolean)
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Filter Operator
              predicate: (a = 'b') (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Select Operator
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Select Operator
                    expressions: 'b' (type: string)
                    outputColumnNames: _col0
                    Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          TableScan
            alias: t6
            filterExpr: (a = 'b') (type: boolean)
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Filter Operator
              predicate: (a = 'b') (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Select Operator
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Select Operator
                    expressions: 'b' (type: string)
                    outputColumnNames: _col0
                    Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
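
To run the same check against the view from the question, something along these lines should work. This is only a sketch: it assumes table_a and table_b have matching column lists that include a day column (names taken from the question), and it uses UNION ALL as in the example above, so swap in UNION if duplicate rows must be removed. hive.optimize.ppd is the Hive setting that controls predicate pushdown; it is enabled by default and is set here only to make the assumption visible.

```sql
-- Predicate pushdown is on by default; set explicitly only for clarity.
SET hive.optimize.ppd=true;

-- View over the two tables from the question (assumes matching schemas).
CREATE VIEW abc AS
SELECT * FROM (
  SELECT * FROM table_a
  UNION ALL
  SELECT * FROM table_b
) temp;

-- Inspect the plan instead of running the query:
-- each TableScan (table_a and table_b) should carry its own filter on day.
EXPLAIN
SELECT * FROM abc WHERE day = '2018-10-22';
```

If the plan shows a Filter Operator on day under each TableScan, as in the t5/t6 output above, the filter is being applied per table before the union. If day happens to be a partition column, the same filter should also limit which partitions are read, but that depends on how the tables were defined.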