HiveQL join query - NVL not working in WHERE clause

Time: 2017-05-08 16:39:50

Tags: null hive hiveql nvl

I have a HiveQL query that looks like this:

create table JOINED as select TABLEA.* from TABLEA join TABLEB on
TABLEA.key=TABLEB.key where nvl(TABLEA.attr, 0)=nvl(TABLEB.attr, 0);

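However, this query does not select the rows where TABLEA.key=TABLEB.key and any of the following holds:

  1. TABLEA.attr=NULL and TABLEB.attr=NULL. (OR)
  2. TABLEA.attr=0 and TABLEB.attr=NULL. (OR)
  3. TABLEA.attr=NULL and TABLEB.attr=0.

None of the above cases are picked up. Why is this so? Am I misunderstanding how NVL() is used?

All I want is for the attr attribute to default to 0 when it is NULL. What would be the correct query?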
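To make the three cases concrete, here is a minimal reproduction sketch. The sample rows are hypothetical, the int column types are taken from the execution plan in the answer, and it assumes a Hive version that supports INSERT ... VALUES (0.14 or later).

```sql
-- Hypothetical sample data; int column types follow the explain output.
create table TABLEA (key int, attr int);
create table TABLEB (key int, attr int);

-- key 1: NULL vs NULL, key 2: 0 vs NULL, key 3: NULL vs 0
insert into table TABLEA values (1, NULL), (2, 0), (3, NULL);
insert into table TABLEB values (1, NULL), (2, NULL), (3, 0);

-- With NVL defaulting NULL to 0, all three keys are expected to match.
select TABLEA.* from TABLEA join TABLEB on
TABLEA.key=TABLEB.key where nvl(TABLEA.attr, 0)=nvl(TABLEB.attr, 0);
```

If the final select returns no rows on an affected version, that matches the behavior described in this question.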

1 Answer:

Answer 0: (score: 0)

Thanks, I have just reported a bug:
Incorrect results for INNER JOIN ON clause / WHERE involving NVL / COALESCE

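If you check the execution plan, you will see that an erroneous predicate, attr is not null, is applied to both tables. Because of that filter, rows with a NULL attr are dropped before NVL is ever evaluated. Selecting columns from both tables (e.g. select TABLEA.*,TABLEB.key) seems to prevent the problem.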

explain
select TABLEA.* from TABLEA join TABLEB on
TABLEA.key=TABLEB.key where nvl(TABLEA.attr, 0)=nvl(TABLEB.attr, 0);
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        $hdt$_0:tablea 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        $hdt$_0:tablea 
          TableScan
            alias: tablea
            Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (key is not null and attr is not null) (type: boolean)
              Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: key (type: int), attr (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
                HashTable Sink Operator
                  keys:
                    0 _col0 (type: int), NVL(_col1,0) (type: int)
                    1 _col0 (type: int), NVL(_col1,0) (type: int)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: tableb
            Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (key is not null and attr is not null) (type: boolean)
              Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: key (type: int), attr (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE
                Map Join Operator
                  condition map:
                       Inner Join 0 to 1
                  keys:
                    0 _col0 (type: int), NVL(_col1,0) (type: int)
                    1 _col0 (type: int), NVL(_col1,0) (type: int)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 1 Data size: 15 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 15 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
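
The workaround mentioned in the answer (selecting a column from TABLEB as well) could be sketched as follows. The alias b_key is a name chosen here for illustration; it avoids two output columns both named key if the query is used in a create table ... as select. Whether this avoids the bug on a given Hive version is not guaranteed, so it is worth re-running explain and checking that the attr is not null predicate is gone.

```sql
-- Workaround sketch: also select a column from TABLEB.
-- b_key is an illustrative alias, not part of the original query.
explain
select TABLEA.*, TABLEB.key as b_key from TABLEA join TABLEB on
TABLEA.key=TABLEB.key where nvl(TABLEA.attr, 0)=nvl(TABLEB.attr, 0);
```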