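I want to verify that my SMB join is actually being used. I can confirm a map join from the logs, but not an SMB join. I have also gone through the explain plan and could not find any hint there. Please help.
Answer 0 (score: 4)
You can run EXPLAIN EXTENDED on your query. So far I have only been able to produce an SMB map join with map-reduce as the execution engine. When Hive performs an SMB map join, "Sorted Merge Bucket Map Join Operator" appears under the stage plans in the EXPLAIN output.
Here is a snippet that produces an SMB map join with map-reduce on my setup: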
set hive.execution.engine=mr;
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
set hive.auto.convert.join=true;
drop table key_value_large;
drop table key_value_small;
create table key_value_large (
  key int,
  value string
)
partitioned by (ds string)
CLUSTERED BY (key) SORTED BY (key ASC) INTO 8 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
create table key_value_small (
  key int,
  value string
)
partitioned by (ds string)
CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
insert into table key_value_large partition(ds='2008-04-08') select key, value from key_value_large_src;
insert into table key_value_small partition(ds='2008-04-08') select key, value from key_value_small_src;
explain extended select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;
select count(*) from key_value_large a JOIN key_value_small b ON a.key = b.key;
Hope this helps someone.
Answer 1 (score: 2)
The following settings enable an SMB map (SMBM) join:
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
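With these settings in place, the tables must also qualify for an SMBM join: both tables should be bucketed on the same column with the same number of buckets, and the join must be on the bucketing column.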
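One way to check those table-level conditions before reading the plan is DESCRIBE FORMATTED, which reports a table's bucketing metadata. This is only a sketch: it assumes the two tables from answer 0 (key_value_large and key_value_small), which this answer does not use, so substitute your own table names and join column.

```sql
-- Sketch only: table names are borrowed from answer 0 as an assumption.
-- In each output, compare the "Num Buckets", "Bucket Columns" and
-- "Sort Columns" rows between the two tables.
describe formatted key_value_large;
describe formatted key_value_small;

-- Then look for "Sorted Merge Bucket Map Join Operator" in the plan.
explain select count(*) from key_value_large a join key_value_small b on a.key = b.key;
```

If the bucket or sort columns do not match the join key, the operator would not be expected to appear in the plan.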
EXPLAIN on the join query then shows the following output:
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: a
            Statistics: Num rows: 13 Data size: 1289 Basic stats: COMPLETE Column stats: NONE
            Sorted Merge Bucket Map Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {realm} {role} {lid} {mid} {sid} {insert_date}
                1 {realm} {role} {lid} {mid} {sid} {insert_date}
              keys:
                0 mid (type: string)
                1 mid (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col8, _col9, _col10, _col11, _col12, _col13
              Select Operator
                expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: date), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: date)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11
                File Output Operator
                  compressed: true
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
Time taken: 0.131 seconds, Fetched: 35 row(s)
As you can see, the "Sorted Merge Bucket Map Join Operator" entry in the output shows that an SMBM join will be performed.