I run a query with and without an SMB join and get different results. Please help me explain why.
SET hive.enforce.bucketing=true;
create table dbaproceduresbuckets (
owner string ,
object_name string ,
procedure_name string ,
object_id double ,
subprogram_id double ,
overload string ,
object_type string ,
aggregate string ,
pipelined string ,
impltypeowner string ,
impltypename string ,
parallel string ,
interface string ,
deterministic string ,
authid string )
CLUSTERED BY (object_id) SORTED BY (OBJECT_ID ASC) INTO 32 BUCKETS;
CREATE TABLE dbaobjectsbuckets1(
owner string,
object_name string,
subobject_name string,
object_id double,
data_object_id double,
object_type string,
created string,
last_ddl_time string,
timestamp string,
status string,
temporary string,
generated string,
secondary string,
namespace double,
edition_name string) CLUSTERED BY (object_id) SORTED BY (OBJECT_ID ASC) INTO 32 BUCKETS;
**** load the table;
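Note that only hive.enforce.bucketing was set before loading. If the rows were inserted without hive.enforce.sorting=true, each bucket file is bucketed but not actually sorted, even though the DDL declares SORTED BY. A load that satisfies the table definition would look roughly like this (the staging table name is hypothetical):

```sql
-- Hypothetical staging table holding the raw rows.
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;   -- not set in the session above; needed for SORTED BY to hold
INSERT OVERWRITE TABLE dbaproceduresbuckets
SELECT * FROM dbaprocedures_staging;
```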
0: jdbc:hive2://xxxxxx:10000> select count(*) from dbaobjectsbuckets1 a, dbaproceduresbuckets b
0: jdbc:hive2://xxxxxxxx:10000> where a.object_id = b.object_id;
INFO : Hadoop job information for Stage-2: number of mappers: 3; number of reducers: 1
INFO : 2016-06-13 15:56:00,381 Stage-2 map = 0%, reduce = 0%
INFO : 2016-06-13 15:56:55,818 Stage-2 map = 1%, reduce = 0%, Cumulative CPU 122.6 sec
INFO : 2016-06-13 15:57:47,124 Stage-2 map = 7%, reduce = 0%, Cumulative CPU 326.86 sec
..........
INFO : 2016-06-13 16:05:01,246 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 867.1 sec
INFO : MapReduce Total cumulative CPU time: 14 minutes 27 seconds 100 msec
INFO : Ended Job = job_1464280256859_0146
+--------+--+
|  _c0   |
+--------+--+
| 54876  |
+--------+--+
****
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;
0: jdbc:hive2://xxxxxxx:10000> select count(*) from dbaobjectsbuckets1 a, dbaproceduresbuckets b
0: jdbc:hive2://xxxxxxxx:10000> where a.object_id = b.object_id;
In the execution plan, I am seeing
| Sorted Merge Bucket Map Join Operator |
|   Condition map: |
|     Inner Join 0 to 1 |
|   keys: |
|     0 object_id (type: double) |
|     1 object_id (type: double) |
**** but the result is showing
INFO : Hadoop job information for Stage-1: number of mappers: 32; number of reducers: 1
......
INFO : MapReduce Total cumulative CPU time: 4 minutes 8 seconds 490 msec
INFO : Ended Job = job_1464280256859_0150
+------+--+
| _c0  |
+------+--+
|  2   |
+------+--+
My question is: why does the query return only 2 when I use the SMB join? It should be 54876.
Thanks!
Answer 0 (score: 1)
Use a SORT BY clause when inserting data into the sorted table,
or
set hive.enforce.sorting=true
before inserting data into the sorted table.
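Concretely, the two options above can be sketched as follows (the staging table name is hypothetical):

```sql
-- Option 1: let Hive enforce the table's SORTED BY clause at insert time.
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;
INSERT OVERWRITE TABLE dbaobjectsbuckets1
SELECT * FROM dbaobjects_staging;

-- Option 2: sort explicitly so each bucket file is ordered by the join key.
INSERT OVERWRITE TABLE dbaobjectsbuckets1
SELECT * FROM dbaobjects_staging
DISTRIBUTE BY object_id
SORT BY object_id;
```

Either way, the SMB join's assumption that every bucket file is sorted on object_id then holds, and the join should return the same count as the plain reduce-side join.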