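I am using a full join to look at the overlapping and non-overlapping (unique) user_id values from two different select statements. The main difference is that one select returns rows with deal_id = 0 and the other returns rows with any deal_id greater than or equal to 1. I join the two select statements on exchange_id, pub_id, and user_id, but not on deal_id. Here is my query: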
SET
hive.auto.convert.join = TRUE
;
SELECT
First.deal_id
,COALESCE( First.exchange_id, Second.exchange_id ) as exchange_id
,COALESCE( First.pub_id, Second.pub_id ) as pub_id
,COUNT (DISTINCT(case when Second.user_id is null then First.user_id else null END)) AS Incremental
,SUM (First.imps) AS First_imps
,SUM (Second.imps) AS Second_imps
FROM
(
SELECT
a.deal_id
,a.exchange_id
,a.pub_id
,a.user_id
,1 AS imps
FROM
logs a
WHERE
a.deal_id >= 1
AND a.event_type = 'TRUE'
) First
FULL JOIN (
SELECT
a.exchange_id
,a.pub_id
,a.user_id
,1 AS imps
FROM
logs a
WHERE
a.deal_id = 0
AND a.event_type = 'TRUE'
) Second
ON (
First.exchange_id = Second.exchange_id
AND First.pub_id = Second.pub_id
AND First.user_id = Second.user_id
)
GROUP BY
First.deal_id
,COALESCE( First.exchange_id, Second.exchange_id )
,COALESCE( First.pub_id, Second.pub_id )
;
Here are the results I am seeing:
DEAL_ID EXCHANGE_ID PUB_ID INCREMENTAL FIRST_IMPS SECOND_IMPS
/N 4 1780 0 0 15
/N 4 1560 0 0 32
3389 4 1780 2 7 6
1534 4 1560 4 9 8
Here is what I would like to see:
DEAL_ID EXCHANGE_ID PUB_ID INCREMENTAL FIRST_IMPS SECOND_IMPS
3389 4 1780 2 7 21
1534 4 1560 4 9 40
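That is, the rows with a null deal_id should be merged into the rows with a non-null deal_id that have the same exchange_id and pub_id.
How can I do this?
Edit: To clarify, the query I posted is a simplified version of the original, which needs two separate select statements because I union with another events table. I left that out here because it is not relevant to the problem of aggregating over the full join. Also, the Incremental value is meant to count users who appear with deal_id >= 1 and do not appear with deal_id = 0 (another reason for the full join).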
Answer 0 (score: 0)
Your query seems overly complicated. You can use conditional aggregation instead:
select min(case when l.deal_id >= 1 then l.deal_id end) as deal_id,
l.exchange_id, l.pub_id,
count(distinct case when l.deal_id >= 1 then l.user_id end) as incremental,
sum(case when l.deal_id >= 1 then 1 else 0 end) as imps_1,
sum(case when l.deal_id = 0 then 1 else 0 end) as imps_0
from logs l
where l.event_type = 'TRUE'
group by l.exchange_id, l.pub_id;
The only thing I am unsure about is deal_id, but this seems to be the logic you want.
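One caveat: the incremental column above counts every distinct user seen with deal_id >= 1, whereas the edit to the question asks for users seen with deal_id >= 1 and never with deal_id = 0. A possible way to get that is to aggregate per user first and then roll up per exchange_id and pub_id. This is only a sketch under the assumption that logs has exactly the columns used in the question; the alias u and the column names imps_deal and imps_no_deal are made up for illustration, and taking MIN(deal_id) per group is still a guess at what deal_id should show.

```sql
-- Sketch only: assumes the logs table and columns from the question.
SELECT
    MIN(u.deal_id) AS deal_id,            -- guess: smallest deal_id >= 1 in the group
    u.exchange_id,
    u.pub_id,
    -- users that have deal rows but no deal_id = 0 rows
    COUNT(CASE WHEN u.user_id IS NOT NULL
                AND u.imps_deal > 0
                AND u.imps_no_deal = 0 THEN 1 END) AS incremental,
    SUM(u.imps_deal)    AS first_imps,
    SUM(u.imps_no_deal) AS second_imps
FROM (
    -- one row per (exchange_id, pub_id, user_id)
    SELECT
        l.exchange_id,
        l.pub_id,
        l.user_id,
        MIN(CASE WHEN l.deal_id >= 1 THEN l.deal_id END) AS deal_id,
        SUM(CASE WHEN l.deal_id >= 1 THEN 1 ELSE 0 END)  AS imps_deal,
        SUM(CASE WHEN l.deal_id = 0  THEN 1 ELSE 0 END)  AS imps_no_deal
    FROM logs l
    WHERE l.event_type = 'TRUE'
    GROUP BY l.exchange_id, l.pub_id, l.user_id
) u
GROUP BY u.exchange_id, u.pub_id;
```

Because each log row is counted exactly once here, the impression totals may differ from those of the row-level full join, which repeats rows whenever a user has several rows on both sides of the join.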