I have a HiveQL query that looks like this:
create table JOINED as select TABLEA.* from TABLEA join TABLEB on
TABLEA.key=TABLEB.key where nvl(TABLEA.attr, 0)=nvl(TABLEB.attr, 0);
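However, this query does not select rows where TABLEA.key = TABLEB.key and one of the following holds:
TABLEA.attr is NULL and TABLEB.attr is NULL, or
TABLEA.attr = 0 and TABLEB.attr is NULL, or
TABLEA.attr is NULL and TABLEB.attr = 0.
None of these cases is returned. Why is that? Am I misunderstanding how NVL() is used?
I want attr to default to 0 whenever it is NULL. What is the correct query?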
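For anyone who wants to reproduce this, a minimal setup might look like the sketch below. It assumes both tables have int columns key and attr, and a Hive version that supports INSERT ... VALUES; the sample rows are made up for illustration, one per case described above.

```sql
-- Hypothetical sample data: one row per problem case.
create table TABLEA (key int, attr int);
create table TABLEB (key int, attr int);

-- key 1: both attr NULL; key 2: 0 vs NULL; key 3: NULL vs 0
insert into table TABLEA values (1, NULL), (2, 0), (3, NULL);
insert into table TABLEB values (1, NULL), (2, NULL), (3, 0);

-- The query from the question.
select TABLEA.* from TABLEA join TABLEB on
TABLEA.key=TABLEB.key where nvl(TABLEA.attr, 0)=nvl(TABLEB.attr, 0);
```

Since nvl(attr, 0) turns every NULL into 0, I would expect all three TABLEA rows to come back, but none of them do.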
Answer 0 (score: 0)
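Thanks, I have just reported a bug:
Incorrect results for INNER JOIN ON clause / WHERE involving NVL / COALESCE
If you check the execution plan, you will see an erroneous predicate, attr is not null, applied to both tables. That filter drops every row whose attr is NULL before NVL is evaluated, which is why none of your three cases is returned.
Selecting columns from both tables (e.g. select TABLEA.*, TABLEB.key) seems to prevent the problem.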
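As a rough sketch of that workaround applied to the original CTAS statement: the alias b_key is a name I made up, needed only because the new table cannot have two columns called key. I have not verified this on every Hive version, so it is worth running explain on it and confirming that the attr is not null predicate is gone.

```sql
-- Workaround sketch: also project a column from TABLEB.
-- b_key is a hypothetical alias to avoid a duplicate column name in JOINED.
create table JOINED as
select TABLEA.*, TABLEB.key as b_key
from TABLEA join TABLEB on TABLEA.key = TABLEB.key
where nvl(TABLEA.attr, 0) = nvl(TABLEB.attr, 0);

-- Check the plan: the Filter Operator predicates should no longer
-- contain "attr is not null" if the workaround takes effect.
explain
select TABLEA.*, TABLEB.key as b_key
from TABLEA join TABLEB on TABLEA.key = TABLEB.key
where nvl(TABLEA.attr, 0) = nvl(TABLEB.attr, 0);
```

If the extra column is unwanted, it can be dropped afterwards by selecting only the TABLEA columns from JOINED.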
explain
select TABLEA.* from TABLEA join TABLEB on
TABLEA.key=TABLEB.key where nvl(TABLEA.attr, 0)=nvl(TABLEB.attr, 0);
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        $hdt$_0:tablea
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        $hdt$_0:tablea
          TableScan
            alias: tablea
            Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (key is not null and attr is not null) (type: boolean)
              Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: key (type: int), attr (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
                HashTable Sink Operator
                  keys:
                    0 _col0 (type: int), NVL(_col1,0) (type: int)
                    1 _col0 (type: int), NVL(_col1,0) (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: tableb
            Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (key is not null and attr is not null) (type: boolean)
              Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: key (type: int), attr (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col0 (type: int), NVL(_col1,0) (type: int)
                    1 _col0 (type: int), NVL(_col1,0) (type: int)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 1 Data size: 15 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 15 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink