I'm trying to compute a daily acceptance rate from a `connecting` table, which has four fields (shown here with sample values):
date action sender_id recipient_id
'2017-01-05', 'request_link', 'frank', 'joe'
'2017-01-06', 'request_link', 'sally', 'ann'
'2017-01-07', 'request_link', 'bill', 'ted'
'2017-01-07', 'accept_link', 'joe', 'frank'
'2017-01-06', 'accept_link', 'ann', 'sally'
'2017-01-06', 'accept_link', 'ted', 'bill'
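Because 01-05 has 0 accepts and 1 request, its daily acceptance rate should be 0/1 = 0. Likewise, the rate for 01-06 should be 2/1, and the rate for 01-07 should be 1/1.
However, it is important that every accept_link has a corresponding request_link, where the request_link's sender_id equals the accept_link's recipient_id (and vice versa). So I believe a self-join is needed here to make sure that, for example, Joe accepted Frank's request, regardless of the date.
How can the following query be corrected so that it aggregates properly while keeping the required join conditions? Would the query compute correctly if the two WHERE conditions were removed, or are they necessary?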
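The pairing rule above can be checked independently of any SQL. The following is a minimal sketch in Python, assuming the six sample rows from the question; the names `ROWS` and `unmatched_accepts` are made up for illustration and are not part of the original query.

```python
# Sample rows from the question: (date, action, sender_id, recipient_id)
ROWS = [
    ("2017-01-05", "request_link", "frank", "joe"),
    ("2017-01-06", "request_link", "sally", "ann"),
    ("2017-01-07", "request_link", "bill", "ted"),
    ("2017-01-07", "accept_link", "joe", "frank"),
    ("2017-01-06", "accept_link", "ann", "sally"),
    ("2017-01-06", "accept_link", "ted", "bill"),
]


def unmatched_accepts(rows):
    """Return accept_link rows that have no request_link in the opposite direction."""
    # Collect (sender, recipient) for every request, ignoring the date.
    requests = {(s, r) for _, action, s, r in rows if action == "request_link"}
    # An accept from X to Y is valid only if Y previously (or later) requested X.
    return [
        row for row in rows
        if row[1] == "accept_link" and (row[3], row[2]) not in requests
    ]


orphans = unmatched_accepts(ROWS)
print(orphans)
```

With the sample data every accept has a mirrored request, so the list comes back empty; an accept with no mirrored request would show up in it.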
SELECT f1.date,
SUM(CASE WHEN f2.action = 'accept_link' THEN 1 ELSE 0 END) /
SUM(CASE WHEN f2.action = 'request_link' THEN 1 ELSE 0 END) AS acceptance_ratio
FROM connecting f1
LEFT JOIN connecting f2
ON f1.sender_id = f2.recipient_id
LEFT JOIN connecting f2
ON f1.recipient_id = f2.sender_id
WHERE f1.action = 'request_link'
AND f2.action = 'accept_link'
GROUP BY f1.date
ORDER BY f1.date ASC
The expected output should look something like this:
date acceptance_ratio
'2017-01-05' 0.0000
'2017-01-06' 2.0000
'2017-01-07' 1.0000
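Thanks in advance.
Answer 0 (score: 1)
Again, I don't think you need a self-join to do the aggregation here. Instead, use conditional aggregation over the whole table to count the requests and the accepts that occur on each day. The LEFT JOIN in the query below serves only to confirm that each accept has a matching request: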
SELECT t.date,
CASE WHEN t.num_requests = 0
THEN 'No requests available'
ELSE CAST(t.num_accepts / t.num_requests AS CHAR(50))
END AS acceptance_ratio
FROM
(
SELECT c1.date,
SUM(CASE WHEN c1.action = 'accept_link' AND c2.action IS NOT NULL
THEN 1 ELSE 0 END) AS num_accepts,
SUM(CASE WHEN c1.action = 'request_link' THEN 1 ELSE 0 END) AS num_requests
FROM connecting c1
LEFT JOIN connecting c2
ON c1.action = 'accept_link' AND
c2.action = 'request_link' AND
c1.sender_id = c2.recipient_id AND
c1.recipient_id = c2.sender_id
GROUP BY c1.date
) t
ORDER BY t.date
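Note that I use a CASE expression to handle division by zero, which can occur on a day with no requests. I also assume that the same invitation is never sent more than once.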
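To check the approach against the sample data, here is a minimal sketch that runs the inner aggregation in an in-memory SQLite database through Python's `sqlite3` module. Several things here are assumptions rather than part of the answer: SQLite stands in for the original database engine, the division is done in Python (SQLite would otherwise perform integer division), a day with no requests yields `None` instead of a message string, and the names `ROWS`, `QUERY`, and `daily_acceptance_ratios` are hypothetical. The join pairs each accept with its request on both IDs (`c1.sender_id = c2.recipient_id` and `c1.recipient_id = c2.sender_id`).

```python
import sqlite3

# Sample rows from the question: (date, action, sender_id, recipient_id)
ROWS = [
    ("2017-01-05", "request_link", "frank", "joe"),
    ("2017-01-06", "request_link", "sally", "ann"),
    ("2017-01-07", "request_link", "bill", "ted"),
    ("2017-01-07", "accept_link", "joe", "frank"),
    ("2017-01-06", "accept_link", "ann", "sally"),
    ("2017-01-06", "accept_link", "ted", "bill"),
]

# Conditional aggregation per day; the LEFT JOIN only verifies that an
# accept has a request going in the opposite direction (any date).
QUERY = """
SELECT c1.date,
       SUM(CASE WHEN c1.action = 'accept_link' AND c2.action IS NOT NULL
                THEN 1 ELSE 0 END) AS num_accepts,
       SUM(CASE WHEN c1.action = 'request_link' THEN 1 ELSE 0 END) AS num_requests
FROM connecting c1
LEFT JOIN connecting c2
       ON c1.action = 'accept_link'
      AND c2.action = 'request_link'
      AND c1.sender_id = c2.recipient_id
      AND c1.recipient_id = c2.sender_id
GROUP BY c1.date
ORDER BY c1.date
"""


def daily_acceptance_ratios(rows):
    """Load rows into an in-memory table and return {date: ratio or None}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE connecting "
        "(date TEXT, action TEXT, sender_id TEXT, recipient_id TEXT)"
    )
    conn.executemany("INSERT INTO connecting VALUES (?, ?, ?, ?)", rows)
    ratios = {}
    for date, accepts, requests in conn.execute(QUERY):
        # Guard against division by zero on days without requests.
        ratios[date] = accepts / requests if requests else None
    conn.close()
    return ratios


ratios = daily_acceptance_ratios(ROWS)
for day, ratio in ratios.items():
    print(day, ratio)
```

On the sample rows this gives 0.0 for 2017-01-05, 2.0 for 2017-01-06, and 1.0 for 2017-01-07, which matches the expected output in the question. If the same invitation could be sent more than once, an accept would join to several request rows and be counted multiple times, so the single-invitation assumption matters.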