Question

Suppose there are two tables:

    table1.c1   table1.c2
1   1           A
2   1           B
3   1           C
4   2           A
5   2           B

and

    table2.c1   table2.c2
1   2           A
2   2           D
3   3           A
4   3           B

When I run this:

select distinct t1.c1, t2.c2
from
schema.table1 t1
join
schema.table2 t2
on (t1.c2 = t2.c2 
    and t1.c1 = t2.c1
    and t1.c1 = 2)

in Hive I get:

    t1.c1   t2.c2
1   2   A

This is the expected result, no problem there. But when I run this:

select distinct t1.c1, t2.c2
from
schema.table1 t1
left join
schema.table2 t2
on (t1.c2 = t2.c2 
    and t1.c1 = t2.c1
    and t1.c1 = 2)

I get:

    t1.c1   t2.c2
1   1       NULL
2   2       NULL
3   2       A

So the filters in the ON clause do not seem to work the way I expected. The conditions t1.c1 = t2.c1 and t1.c1 = 2 appear not to have been applied, and wherever the LEFT JOIN cannot find the key in the second table, t2.c2 is NULL.

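I think the answer must be somewhere in the docs (maybe in the 'Joins occur BEFORE WHERE CLAUSES' section?), but I still don't understand the difference.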

Why do the two queries give different results?
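A minimal reproduction of both queries, assuming SQLite (through Python's built-in `sqlite3` module) as a stand-in for Hive. This is an assumption: SQLite is not Hive, but it follows the same standard rules for conditions in the ON clause of an outer join. The `schema.` prefix is dropped because the in-memory database has no such schema.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (c1 INTEGER, c2 TEXT);
    CREATE TABLE table2 (c1 INTEGER, c2 TEXT);
    INSERT INTO table1 VALUES (1,'A'),(1,'B'),(1,'C'),(2,'A'),(2,'B');
    INSERT INTO table2 VALUES (2,'A'),(2,'D'),(3,'A'),(3,'B');
""")

# The ON clause from the question, unchanged.
ON_CLAUSE = "ON (t1.c2 = t2.c2 AND t1.c1 = t2.c1 AND t1.c1 = 2)"

def run(join_kind):
    """Run the question's SELECT DISTINCT with the given join keyword."""
    sql = f"""
        SELECT DISTINCT t1.c1, t2.c2
        FROM table1 t1 {join_kind} table2 t2 {ON_CLAUSE}
        ORDER BY t1.c1, t2.c2
    """
    return conn.execute(sql).fetchall()

inner_rows = run("JOIN")
left_rows = run("LEFT JOIN")
print(inner_rows)  # [(2, 'A')]
print(left_rows)   # [(1, None), (2, None), (2, 'A')]
```

SQLite sorts NULL before other values, so `(2, None)` comes before `(2, 'A')`; the set of rows matches the Hive output in the question.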

Answer 1

This is simply how a LEFT (OUTER) JOIN works:

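You specify some matching conditions in the ON clause. If a matching row is found in the "right" table, it is joined to the row from the "left" table. If there is no matching row, the "left" row is still returned, with all fields from the "right" table set to NULL. So the ON conditions never filter out any rows of the "left" table. In the terminology of the Hive documentation, the left table is the "preserved row table" and the right table is the "null supplying table".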

This is in contrast to an INNER JOIN, which returns only rows that have a matching partner in the other table. So there is no "preserved table" and no need for a "null supplying table".
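The two behaviours described above can be sketched as plain nested-loop joins over the question's rows. This is an illustrative sketch, not how Hive actually executes joins; the function names `left_join`, `inner_join` and `on_condition` are hypothetical, and `None` plays the role of SQL NULL.

```python
# Rows from the question as (c1, c2) tuples.
table1 = [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (2, "B")]
table2 = [(2, "A"), (2, "D"), (3, "A"), (3, "B")]

def on_condition(l, r):
    # The question's ON clause: t1.c2 = t2.c2 AND t1.c1 = t2.c1 AND t1.c1 = 2
    return l[1] == r[1] and l[0] == r[0] and l[0] == 2

def left_join(left, right, on):
    """LEFT OUTER JOIN: every left row is emitted at least once."""
    out = []
    for l in left:
        matches = [r for r in right if on(l, r)]
        if matches:
            out.extend(l + r for r in matches)
        else:
            # No partner: keep the preserved (left) row and null-supply the right side.
            out.append(l + (None, None))
    return out

def inner_join(left, right, on):
    """INNER JOIN: only left rows that have a partner survive."""
    return [l + r for l in left for r in right if on(l, r)]

left_result = left_join(table1, table2, on_condition)
inner_result = inner_join(table1, table2, on_condition)
print(left_result)
print(inner_result)  # [(2, 'A', 2, 'A')]
```

The rows with c1 = 1 fail `t1.c1 = 2`, yet they still appear in `left_result` with NULLs on the right: in a LEFT JOIN the ON condition only decides which right rows get attached, never which left rows are kept.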

Answer 2

A LEFT JOIN is supposed to behave differently from a FULL JOIN.

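The output of a LEFT JOIN contains all the data from the left table (the one written first of the two), with NULL values wherever the right table has no corresponding data. If you remove the DISTINCT from your query and run it, the output should clear up your confusion about how LEFT/RIGHT joins work.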

FULL JOIN output

t1.c1   t1.c2   t2.c2
2       a       a
2       a       d
2       b       a
2       b       d

LEFT JOIN output

t1.c1   t1.c2   t2.c2
 1      a       null
 1      b       null
 1      c       null
 2      a       a
 2      a       d
 2      b       a
 2      b       d
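The tables above are the answerer's own illustration. As a sketch of the suggestion to drop DISTINCT, here is the question's exact LEFT JOIN run on the question's data, again assuming SQLite via Python's `sqlite3` module as a stand-in for Hive.

```python
import sqlite3

# SQLite stands in for Hive (assumption: same LEFT OUTER JOIN semantics for ON).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (c1 INTEGER, c2 TEXT);
    CREATE TABLE table2 (c1 INTEGER, c2 TEXT);
    INSERT INTO table1 VALUES (1,'A'),(1,'B'),(1,'C'),(2,'A'),(2,'B');
    INSERT INTO table2 VALUES (2,'A'),(2,'D'),(3,'A'),(3,'B');
""")

# Same LEFT JOIN as in the question, without DISTINCT, also showing t1.c2.
rows = conn.execute("""
    SELECT t1.c1, t1.c2, t2.c2
    FROM table1 t1
    LEFT JOIN table2 t2
      ON (t1.c2 = t2.c2 AND t1.c1 = t2.c1 AND t1.c1 = 2)
    ORDER BY t1.c1, t1.c2
""").fetchall()

for row in rows:
    print(row)
# (1, 'A', None)
# (1, 'B', None)
# (1, 'C', None)
# (2, 'A', 'A')
# (2, 'B', None)
```

Every table1 row appears exactly once here because no left row matches more than one right row; DISTINCT then collapses the three `(1, NULL)` rows into one, which is why the question's output looks smaller.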

Answer 3

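Hive evidently treats join conditions differently in an INNER JOIN than in a LEFT JOIN. In an INNER JOIN you can put filter conditions in the ON clause, but in a LEFT JOIN the filter conditions on the main table (t1 in this case) have to go in a separate WHERE clause. If you try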

`select distinct t1.c1, t2.c2
from
schema.table1 t1
left join
schema.table2 t2
on (t1.c2 = t2.c2 
    and t1.c1 = t2.c1)
where t1.c1 = 2;`

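you should get the result you expected: only rows with t1.c1 = 2. On the question's data this still includes (2, NULL) next to (2, A), because the table1 row (2, B) has no partner in table2.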
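A side-by-side sketch of the two placements of `t1.c1 = 2`, assuming SQLite via Python's `sqlite3` module as a stand-in for Hive. The `BASE` template and its placeholders are hypothetical helpers for this comparison only.

```python
import sqlite3

# SQLite stands in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (c1 INTEGER, c2 TEXT);
    CREATE TABLE table2 (c1 INTEGER, c2 TEXT);
    INSERT INTO table1 VALUES (1,'A'),(1,'B'),(1,'C'),(2,'A'),(2,'B');
    INSERT INTO table2 VALUES (2,'A'),(2,'D'),(3,'A'),(3,'B');
""")

# Hypothetical template: the extra filter goes either into ON or into WHERE.
BASE = """
    SELECT DISTINCT t1.c1, t2.c2
    FROM table1 t1
    LEFT JOIN table2 t2
      ON (t1.c2 = t2.c2 AND t1.c1 = t2.c1 {extra_on})
    {where}
    ORDER BY t1.c1, t2.c2
"""

filter_in_on = conn.execute(BASE.format(extra_on="AND t1.c1 = 2", where="")).fetchall()
filter_in_where = conn.execute(BASE.format(extra_on="", where="WHERE t1.c1 = 2")).fetchall()

print(filter_in_on)     # [(1, None), (2, None), (2, 'A')]
print(filter_in_where)  # [(2, None), (2, 'A')]
```

In the ON clause the filter only limits which table2 rows can be attached; in the WHERE clause it runs after the join and removes the preserved t1 rows with c1 = 1.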

Hive: LEFT JOIN vs JOIN gives different results with a filter in the ON clause
