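I have a query that needs tuning. With help from many great people on Stack Overflow, I made a few of the suggested changes and they did work; but I really want to understand the explain plan in Hive and try to tune the query myself.
Query: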
CREATE TABLE admin.FctPrfitAmt_rpt AS
SELECT * FROM admin.FctPrfitAmt t2
WHERE NOT EXISTS (SELECT 1 FROM admin.FctPrfitAmt_incr t3 WHERE t2.scenario_id = t3.scenario_id)
UNION ALL
SELECT * FROM admin.FctPrfitAmt_incr
Explain plan:
STAGE DEPENDENCIES:
Stage-10 is a root stage
Stage-15 depends on stages: Stage-1, Stage-10, Stage-16 , consists of Stage-18, Stage-2
Stage-18 has a backup stage: Stage-2
Stage-14 depends on stages: Stage-18
Stage-3 depends on stages: Stage-2, Stage-14
Stage-9 depends on stages: Stage-3 , consists of Stage-6, Stage-5, Stage-7
Stage-6
Stage-0 depends on stages: Stage-6, Stage-5, Stage-8
Stage-20 depends on stages: Stage-0
Stage-4 depends on stages: Stage-20
Stage-5
Stage-7
Stage-8 depends on stages: Stage-7
Stage-2
Stage-11 is a root stage
Stage-12 depends on stages: Stage-11
Stage-17 depends on stages: Stage-12 , consists of Stage-19, Stage-1
Stage-19 has a backup stage: Stage-1
Stage-16 depends on stages: Stage-19
Stage-1
STAGE PLANS:
Stage: Stage-10
Map Reduce
Map Operator Tree:
TableScan
alias: t3
Statistics: Num rows: 39076287 Data size: 2283385713 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: scenario_id (type: bigint)
outputColumnNames: scenario_id
Statistics: Num rows: 39076287 Data size: 2283385713 Basic stats: COMPLETE Column stats: NONE
Group By Operator
keys: scenario_id (type: bigint)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 39076287 Data size: 2283385713 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: bigint)
sort order: +
Map-reduce partition columns: _col0 (type: bigint)
Statistics: Num rows: 39076287 Data size: 2283385713 Basic stats: COMPLETE Column stats: NONE
Reduce Operator Tree:
Group By Operator
keys: KEY._col0 (type: bigint)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 19538143 Data size: 1141692827 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-15
Conditional Operator
Stage: Stage-18
Map Reduce Local Work
Alias -> Map Local Tables:
reconcile-subquery1:t1-subquery1:$INTNAME1
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
reconcile-subquery1:t1-subquery1:$INTNAME1
TableScan
HashTable Sink Operator
keys:
0 _col0 (type: bigint)
1 _col0 (type: bigint)
Stage: Stage-14
Map Reduce
Map Operator Tree:
TableScan
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col0 (type: bigint)
1 _col0 (type: bigint)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col11
Statistics: Num rows: 715121683 Data size: 39113453068 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: _col11 is null (type: boolean)
Statistics: Num rows: 357560841 Data size: 19556726506 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: bigint), _col3 (type: int), _col4 (type: double), _col5 (type: decimal(25,13)), _col6 (type: decimal(25,13)), _col7 (type: decimal(25,13))
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
Statistics: Num rows: 357560841 Data size: 19556726506 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Local Work:
Map Reduce Local Work
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
Union
Statistics: Num rows: 396637128 Data size: 21840112219 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 396637128 Data size: 21840112219 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: admin.FctPrfitAmt_reporting_k_benchmark
TableScan
alias: FctPrfitAmt_incr
Statistics: Num rows: 39076287 Data size: 2283385713 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: scenario_id (type: bigint), facility_id (type: bigint), process_id (type: bigint), mp_surrogate_id (type: int), units (type: double), raw_amount (type: decimal(25,13)), allocation_percent (type: decimal(25,13)), capacity_allocation_percent (type: decimal(25,13))
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
Statistics: Num rows: 39076287 Data size: 2283385713 Basic stats: COMPLETE Column stats: NONE
Union
Statistics: Num rows: 396637128 Data size: 21840112219 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 396637128 Data size: 21840112219 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: admin.FctPrfitAmt_reporting_k_benchmark
Stage: Stage-9
Conditional Operator
Stage: Stage-6
Move Operator
files:
hdfs directory: true
destination: hdfs://nameservice1/admin/.hive-staging_hive_2017-04-24_04-17-27_639_6500987676644679103-777/-ext-10001
Stage: Stage-0
Move Operator
files:
hdfs directory: true
destination: hdfs://nameservice1/admin/FctPrfitAmt_reporting_k_benchmark
Stage: Stage-20
Create Table Operator:
Create Table
columns: scenario_id bigint, facility_id bigint, process_id bigint, mp_surrogate_id int, units double, raw_amount decimal(25,13), allocation_percent decimal(25,13), capacity_allocation_percent decimal(25,13)
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
serde name: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: admin.FctPrfitAmt_reporting_k_benchmark
Stage: Stage-4
Stats-Aggr Operator
Stage: Stage-5
Map Reduce
Map Operator Tree:
TableScan
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: admin.FctPrfitAmt_reporting_k_benchmark
Stage: Stage-7
Map Reduce
Map Operator Tree:
TableScan
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: admin.FctPrfitAmt_reporting_k_benchmark
Stage: Stage-8
Move Operator
files:
hdfs directory: true
destination: hdfs://nameservice1/admin/.hive-staging_hive_2017-04-24_04-17-27_639_6500987676644679103-777/-ext-10001
Stage: Stage-2
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
key expressions: _col0 (type: bigint)
sort order: +
Map-reduce partition columns: _col0 (type: bigint)
Statistics: Num rows: 650110607 Data size: 35557683837 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint), _col2 (type: bigint), _col3 (type: int), _col4 (type: double), _col5 (type: decimal(25,13)), _col6 (type: decimal(25,13)), _col7 (type: decimal(25,13))
TableScan
Reduce Output Operator
key expressions: _col0 (type: bigint)
sort order: +
Map-reduce partition columns: _col0 (type: bigint)
Statistics: Num rows: 19538143 Data size: 1141692827 Basic stats: COMPLETE Column stats: NONE
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col0 (type: bigint)
1 _col0 (type: bigint)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col11
Statistics: Num rows: 715121683 Data size: 39113453068 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: _col11 is null (type: boolean)
Statistics: Num rows: 357560841 Data size: 19556726506 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: bigint), _col3 (type: int), _col4 (type: double), _col5 (type: decimal(25,13)), _col6 (type: decimal(25,13)), _col7 (type: decimal(25,13))
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
Statistics: Num rows: 357560841 Data size: 19556726506 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-11
Map Reduce
Map Operator Tree:
TableScan
alias: t3
filterExpr: scenario_id is null (type: boolean)
Statistics: Num rows: 39076287 Data size: 2283385713 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: scenario_id is null (type: boolean)
Statistics: Num rows: 19538143 Data size: 1141692827 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: null (type: bigint)
outputColumnNames: scenario_id
Statistics: Num rows: 19538143 Data size: 1141692827 Basic stats: COMPLETE Column stats: NONE
Group By Operator
keys: scenario_id (type: bigint)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 19538143 Data size: 1141692827 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: bigint)
sort order: +
Map-reduce partition columns: _col0 (type: bigint)
Statistics: Num rows: 19538143 Data size: 1141692827 Basic stats: COMPLETE Column stats: NONE
Reduce Operator Tree:
Group By Operator
keys: KEY._col0 (type: bigint)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 9769071 Data size: 570846384 Basic stats: COMPLETE Column stats: NONE
Select Operator
Statistics: Num rows: 9769071 Data size: 570846384 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: count()
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-12
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: bigint)
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (_col0 = 0) (type: boolean)
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: 0 (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
Group By Operator
keys: _col0 (type: bigint)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-17
Conditional Operator
Stage: Stage-19
Map Reduce Local Work
Alias -> Map Local Tables:
reconcile-subquery1:t1-subquery1:$INTNAME
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
reconcile-subquery1:t1-subquery1:$INTNAME
TableScan
HashTable Sink Operator
keys:
0
1
Stage: Stage-16
Map Reduce
Map Operator Tree:
TableScan
alias: t2
Statistics: Num rows: 591009630 Data size: 32325166424 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Semi Join 0 to 1
keys:
0
1
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
Statistics: Num rows: 650110607 Data size: 35557683837 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Local Work:
Map Reduce Local Work
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: t2
Statistics: Num rows: 591009630 Data size: 32325166424 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 591009630 Data size: 32325166424 Basic stats: COMPLETE Column stats: NONE
value expressions: scenario_id (type: bigint), facility_id (type: bigint), process_id (type: bigint), mp_surrogate_id (type: int), units (type: double), raw_amount (type: decimal(25,13)), allocation_percent (type: decimal(25,13)), capacity_allocation_percent (type: decimal(25,13))
TableScan
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
Reduce Operator Tree:
Join Operator
condition map:
Left Semi Join 0 to 1
keys:
0
1
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
Statistics: Num rows: 650110607 Data size: 35557683837 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
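One way to start reading a plan like the one above is to turn the STAGE DEPENDENCIES lines into a small dependency map before looking at the individual STAGE PLANS. The following is a minimal Python sketch, not a Hive tool: the function name `parse_stage_dependencies` and the dictionary layout are hypothetical, and it assumes the line shapes seen in this plan ("is a root stage", "depends on stages:", "consists of", "has a backup stage:"). As far as I understand it, "consists of" lists the candidate stages of a Conditional Operator, of which one is picked at run time, and a "backup stage" is the fallback used when the map-join local work cannot run.

```python
import re

# The first few lines of the STAGE DEPENDENCIES section, copied from the plan above.
STAGE_DEPENDENCIES = """\
Stage-10 is a root stage
Stage-15 depends on stages: Stage-1, Stage-10, Stage-16 , consists of Stage-18, Stage-2
Stage-18 has a backup stage: Stage-2
Stage-14 depends on stages: Stage-18
Stage-3 depends on stages: Stage-2, Stage-14
"""


def _stage_names(fragment):
    """Extract every 'Stage-N' token from a fragment of a dependency line."""
    return re.findall(r"Stage-\d+", fragment or "")


def parse_stage_dependencies(text):
    """Map each stage name to its root flag, prerequisites, candidates and backup."""
    stages = {}
    for line in text.splitlines():
        head = re.match(r"\s*(Stage-\d+)(.*)$", line)
        if head is None:
            continue  # not a stage line
        name, rest = head.group(1), head.group(2)
        # The 'depends on' list ends where an optional ', consists of' clause starts.
        depends = re.search(r"depends on stages:(.*?)(?:,\s*consists of|$)", rest)
        consists = re.search(r"consists of(.*)$", rest)
        backup = re.search(r"has a backup stage:(.*)$", rest)
        stages[name] = {
            "root": "is a root stage" in rest,
            "depends_on": _stage_names(depends.group(1)) if depends else [],
            "consists_of": _stage_names(consists.group(1)) if consists else [],
            "backup": _stage_names(backup.group(1)) if backup else [],
        }
    return stages


if __name__ == "__main__":
    for name, info in parse_stage_dependencies(STAGE_DEPENDENCIES).items():
        print(name, info)
```

Feeding the full STAGE DEPENDENCIES section through the same function would show, for example, that Stage-10 and Stage-11 are the two root stages, which is where the scans of t3 begin.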
Answer 0 (score: 0)
This looks like an incremental update.
We can use a full join instead, which generates only a single MapReduce job.
CREATE TABLE admin.FctPrfitAmt_rpt AS
SELECT
case when t3.scenario_id is null then t2.scenario_id else t3.scenario_id end as scenario_id ,
case when t3.scenario_id is null then t2.COL1 else t3.COL1 end as COL1 ,
case when t3.scenario_id is null then t2.COL2 else t3.COL2 end as COL2 ,
case when t3.scenario_id is null then t2.COL3 else t3.COL3 end as COL3 ,
........
FROM admin.FctPrfitAmt t2
full join admin.FctPrfitAmt_incr t3 on t2.scenario_id = t3.scenario_id
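To sanity-check that the full join returns the same rows as the original query, the two shapes can be simulated on a few toy rows. This is only a Python sketch with made-up data and hypothetical function names, not Hive code; it models each row as a `(scenario_id, units)` tuple and does not model NULL keys. The two shapes agree when scenario_id is unique in both tables. If a scenario_id repeats, the join pairs every matching row with every other, so the row counts can differ.

```python
def anti_join_union(base, incr):
    """Original query shape: base rows with no match in incr, UNION ALL incr."""
    incr_ids = {row[0] for row in incr}
    kept = [row for row in base if row[0] not in incr_ids]
    return kept + list(incr)


def full_join_prefer_incr(base, incr):
    """Full-join shape: join on scenario_id and take the incr row when it matched."""
    out = []
    matched = set()  # indexes of incr rows that joined to at least one base row
    for b in base:
        hits = [i for i, t in enumerate(incr) if t[0] == b[0]]
        if hits:
            matched.update(hits)
            out.extend(incr[i] for i in hits)  # CASE picks the t3 columns
        else:
            out.append(b)  # t3 side is NULL, so CASE picks the t2 columns
    # incr rows with no base match also appear in a full join
    out.extend(t for i, t in enumerate(incr) if i not in matched)
    return out


# Hypothetical toy data: scenario_id is unique in each table.
base = [(1, 10.0), (2, 20.0), (3, 30.0)]  # stands in for admin.FctPrfitAmt (t2)
incr = [(2, 21.0), (4, 40.0)]             # stands in for admin.FctPrfitAmt_incr (t3)
same_rows = sorted(anti_join_union(base, incr)) == sorted(full_join_prefer_incr(base, incr))

# Hypothetical toy data: scenario_id 2 appears twice in the base table.
dup_base = [(2, 20.0), (2, 22.0)]
dup_incr = [(2, 21.0)]
union_count = len(anti_join_union(dup_base, dup_incr))
join_count = len(full_join_prefer_incr(dup_base, dup_incr))

if __name__ == "__main__":
    print("unique keys, same rows:", same_rows)
    print("duplicate keys, row counts:", union_count, join_count)
```

So before switching to the full join it is worth checking whether scenario_id alone identifies a row in both tables; if it does not, the join key would need the remaining identifying columns as well.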