Impala / Hive query with multiple join conditions

Date: 2019-11-07 06:52:29

Tags: mysql sql apache-spark hive impala

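I need to group the results from tableA_index and then combine them with tableB to get the following results.

  • The count of tb2_c1 values that exist in both tableA_index and tableB

  • The count of tb2_c1 values that exist only in tableA_index

  • The count of tb2_c1 values that exist only in tableB

The final result should look like this: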

c3 | c1 | common_c1s | tableA_only_c1s | tableB_only_c1s 
   |    |            |                 |    
   |    |            |                 |

I tried the following solution in Impala, but for some reason it does not work.

   select 
      ures.c3 c3, ures.c1 c1, count(t1.tb2_c1) common_c1s, count(t2.tb2_c1) tableA_only_c1s, count(t3.tb2_c1) tableB_only_c1s 
      from (
        select c1, c2, c3 from tableA_0 
        UNION
        select c1, c2, c3 from tableA_1
        UNION
        select c1, c2, c3 from tableA_2 
        UNION
        select c1, c2, c3 from tableA_3
        UNION
        select c1, c2, c3 from tableA_4
        UNION
        select c1, c2, c3 from tableA_5 
        ) ures 
        INNER JOIN 
        ( select tb2_c1, tb2_c2 from tableB ) t1
        ON t1.tb2_c1 = ures.c2
        AND t1.tb2_c2 = ures.c3 
        LEFT SEMI JOIN
        ( select tb2_c1, tb2_c2 from tableB ) t2
        ON t2.tb2_c1 = ures.c2
        AND t2.tb2_c2 = ures.c3 
        LEFT ANTI JOIN
        ( select tb2_c1, tb2_c2 from tableB ) t3
        ON t3.tb2_c1 = ures.c2
        AND t3.tb2_c2 = ures.c3 
      GROUP BY c3, c1
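
A possible reason the attempt above fails: in Impala and Hive, LEFT SEMI JOIN and LEFT ANTI JOIN only expose columns from the left-hand side, so t2.tb2_c1 and t3.tb2_c1 cannot be referenced in the select list. Rows that exist only in tableB also cannot come out of a join that is driven from the tableA side. As a rough sketch of one alternative, the three counts can be computed with a single FULL OUTER JOIN and conditional aggregation. The aliases a and b and the marker columns in_a and in_b are made up for illustration. The sketch assumes the join keys are (c2, c3) against (tb2_c1, tb2_c2), as in the attempt above, and that c1 is simply NULL for tableB-only rows, since tableB has no c1 column.

```sql
-- Sketch only, not tested against the real tables.
-- Assumed join keys: (a.c2, a.c3) = (b.tb2_c1, b.tb2_c2), as in the original attempt.
-- in_a / in_b are hypothetical marker columns: after the FULL OUTER JOIN they are
-- NULL on the side that had no matching row.
SELECT
    COALESCE(a.c3, b.tb2_c2) AS c3,
    a.c1                     AS c1,  -- NULL for rows that exist only in tableB
    SUM(CASE WHEN a.in_a = 1     AND b.in_b = 1     THEN 1 ELSE 0 END) AS common_c1s,
    SUM(CASE WHEN a.in_a = 1     AND b.in_b IS NULL THEN 1 ELSE 0 END) AS tableA_only_c1s,
    SUM(CASE WHEN a.in_a IS NULL AND b.in_b = 1     THEN 1 ELSE 0 END) AS tableB_only_c1s
FROM (
    -- UNION (not UNION ALL) removes duplicate (c1, c2, c3) rows across the tableA_* tables
    SELECT c1, c2, c3, 1 AS in_a FROM tableA_0
    UNION
    SELECT c1, c2, c3, 1 AS in_a FROM tableA_1
    UNION
    SELECT c1, c2, c3, 1 AS in_a FROM tableA_2
    UNION
    SELECT c1, c2, c3, 1 AS in_a FROM tableA_3
    UNION
    SELECT c1, c2, c3, 1 AS in_a FROM tableA_4
    UNION
    SELECT c1, c2, c3, 1 AS in_a FROM tableA_5
) a
FULL OUTER JOIN (
    -- DISTINCT so that repeated key pairs in tableB are counted once
    SELECT DISTINCT tb2_c1, tb2_c2, 1 AS in_b FROM tableB
) b
    ON  b.tb2_c1 = a.c2
    AND b.tb2_c2 = a.c3
GROUP BY COALESCE(a.c3, b.tb2_c2), a.c1;
```

Whether duplicates should be collapsed is a judgment call: drop the DISTINCT (or switch to UNION ALL) if repeated rows are meant to be counted.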

0 answers:

No answers