Hive内部连接错误的结果

时间:2017-10-20 04:38:16

标签: hadoop hive hiveql

两个表table1和表2

hive> select * from  table1 where  dt=20171020;
OK
a   1   1   p   10  20171020
b   2   2   q   10  20171020
c   3   3   r   10  20171020
d   4   4   r   10  20171020

hive> select * from  table2 where  dt=20171020;
OK
a   1   1   p   10  20171020
b   2   2   t   10  20171020
c   3   3   r   10  20171020

hive> select * from table1 t1
    > join table2 t2
    > on t1.c1=t2.c1
    > where
    > t1.dt=20171020 and t2.dt=20171020 and  
    > t1.c2 <> t2.c2 or t1.c3 <> t2.c3 or t1.c4 <> t2.c4  or t1.c5 <> t2.c5;

Result:
a   1   1   p   20  20171016    a   1   1   p   10  20171015
a   1   1   p   20  20171016    a   1   1   p   10  20171020
b   2   2   q   20  20171016    b   2   2   t   10  20171015
b   2   2   q   20  20171016    b   2   2   t   10  20171020
c   3   3   r   20  20171016    c   3   3   r   10  20171015
c   3   3   r   20  20171016    c   3   3   r   10  20171020
b   2   2   q   10  20171020    b   2   2   t   10  20171015
b   2   2   q   10  20171020    b   2   2   t   10  20171020
a   19  19  p   20  20171019    a   1   1   p   10  20171015
a   19  19  p   20  20171019    a   1   1   p   10  20171020

我想要关注行,因为此行已更改,hive如何加入上述代码?

 b  2   2   q   10  20171020

1 个答案:

答案 0 :(得分:0)

试试这个。你的加入也应该约会。

SELECT *
FROM   table1 t1
       JOIN table2 t2
         ON t1.c1 = t2.c1
            AND t1.dt = t2.dt
WHERE  t1.dt = 20171020
       AND ( t1.c2 <> t2.c2
              OR t1.c3 <> t2.c3
              OR t1.c4 <> t2.c4
              OR t1.c5 <> t2.c5 );