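I need to group the results from tableA_index and then join them with tableB to get the following results.
The count of tb2_c1 values that exist in both tableA_index and tableB
The count of tb2_c1 values that exist only in tableA_index
The count of tb2_c1 values that exist only in tableB
The final result should look like this: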
c3 | c1 | common_c1s | tableA_only_c1s | tableB_only_c1s
| | | |
| | | |
I tried the following solution in Impala, but for some reason it does not work.
SELECT
    ures.c3 c3, ures.c1 c1, count(t1.tb2_c1) common_c1s, count(t2.tb2_c1) tableA_only_c1s, count(t3.tb2_c1) tableB_only_c1s
FROM (
    SELECT c1, c2, c3 FROM tableA_0
    UNION
    SELECT c1, c2, c3 FROM tableA_1
    UNION
    SELECT c1, c2, c3 FROM tableA_2
    UNION
    SELECT c1, c2, c3 FROM tableA_3
    UNION
    SELECT c1, c2, c3 FROM tableA_4
    UNION
    SELECT c1, c2, c3 FROM tableA_5
) ures
INNER JOIN
    ( SELECT tb2_c1, tb2_c2 FROM tableB ) t1
    ON t1.tb2_c1 = ures.c2
    AND t1.tb2_c2 = ures.c3
LEFT SEMI JOIN
    ( SELECT tb2_c1, tb2_c2 FROM tableB ) t2
    ON t2.tb2_c1 = ures.c2
    AND t2.tb2_c2 = ures.c3
LEFT ANTI JOIN
    ( SELECT tb2_c1, tb2_c2 FROM tableB ) t3
    ON t3.tb2_c1 = ures.c2
    AND t3.tb2_c2 = ures.c3
GROUP BY c3, c1
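
For reference, two things in the query above are likely to get in the way. A LEFT SEMI JOIN or LEFT ANTI JOIN only returns columns from the left-hand side, so t2.tb2_c1 and t3.tb2_c1 cannot be counted in the select list. Chaining an inner join and an anti join on the same condition also leaves no rows, because the anti join discards exactly the rows the inner join kept. One possible direction, offered as a sketch rather than a verified fix, is a single FULL OUTER JOIN with conditional aggregation. It assumes that c2 maps to tb2_c1 and c3 maps to tb2_c2 (taken from the join conditions above), that the join keys are never NULL in the data, and that rows found only in tableB may be reported with a NULL c1, since tableB has no c1 column. The aliases a and b are placeholders.

```sql
-- Sketch only: classify each joined row by which side is missing, then sum per group.
SELECT
    COALESCE(a.c3, b.tb2_c2) AS c3,
    a.c1                     AS c1,  -- NULL for rows that exist only in tableB
    SUM(CASE WHEN a.c2 IS NOT NULL AND b.tb2_c1 IS NOT NULL THEN 1 ELSE 0 END) AS common_c1s,
    SUM(CASE WHEN a.c2 IS NOT NULL AND b.tb2_c1 IS NULL     THEN 1 ELSE 0 END) AS tableA_only_c1s,
    SUM(CASE WHEN a.c2 IS NULL     AND b.tb2_c1 IS NOT NULL THEN 1 ELSE 0 END) AS tableB_only_c1s
FROM (
    -- UNION (without ALL) removes duplicate (c1, c2, c3) rows across the shards
    SELECT c1, c2, c3 FROM tableA_0
    UNION
    SELECT c1, c2, c3 FROM tableA_1
    UNION
    SELECT c1, c2, c3 FROM tableA_2
    UNION
    SELECT c1, c2, c3 FROM tableA_3
    UNION
    SELECT c1, c2, c3 FROM tableA_4
    UNION
    SELECT c1, c2, c3 FROM tableA_5
) a
FULL OUTER JOIN (
    -- DISTINCT so that each key pair in tableB is counted once
    SELECT DISTINCT tb2_c1, tb2_c2 FROM tableB
) b
    ON  b.tb2_c1 = a.c2
    AND b.tb2_c2 = a.c3
GROUP BY COALESCE(a.c3, b.tb2_c2), a.c1;
```

With this shape, every (c3, c1) group from the tableA side gets its common and tableA-only counts, while tableB-only keys land in groups where c1 is NULL. If the tableB-only count has to be attributed to a specific c1, the mapping from tableB rows to c1 would need to be defined first.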