Calculating acceptance_ratio with LEFT JOIN, SELF JOIN and aggregate functions

Time: 2017-02-09 07:17:50

Tags: mysql left-join aggregate-functions self-join

I am trying to calculate a daily acceptance ratio from the "connecting" table, which has 4 fields, shown here with sample values:

date          action          sender_id  recipient_id
'2017-01-05'  'request_link'  'frank'    'joe'
'2017-01-06'  'request_link'  'sally'    'ann'
'2017-01-07'  'request_link'  'bill'     'ted'
'2017-01-07'  'accept_link'   'joe'      'frank'
'2017-01-06'  'accept_link'   'ann'      'sally'
'2017-01-06'  'accept_link'   'ted'      'bill'

Because 01-05 has 0 accepts and 1 request, its daily acceptance ratio should be 0/1 = 0. Likewise, the ratio for 01-06 should be 2/1, and for 01-07 it should be 1/1.
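
The arithmetic above can be checked with a minimal sketch in plain Python. It only counts rows per day and deliberately ignores the request/accept pairing requirement; the function name `daily_ratio` and the choice to return `None` for a day without requests are illustrative assumptions, not part of the original post.

```python
from collections import Counter

# Sample rows from the question: (date, action, sender_id, recipient_id)
rows = [
    ("2017-01-05", "request_link", "frank", "joe"),
    ("2017-01-06", "request_link", "sally", "ann"),
    ("2017-01-07", "request_link", "bill", "ted"),
    ("2017-01-07", "accept_link", "joe", "frank"),
    ("2017-01-06", "accept_link", "ann", "sally"),
    ("2017-01-06", "accept_link", "ted", "bill"),
]


def daily_ratio(rows):
    """Return {date: accepts / requests}; None when a day has no requests."""
    requests = Counter(d for d, action, _, _ in rows if action == "request_link")
    accepts = Counter(d for d, action, _, _ in rows if action == "accept_link")
    days = sorted(set(requests) | set(accepts))
    # Counter returns 0 for a missing day, so the lookups below are safe.
    return {d: (accepts[d] / requests[d]) if requests[d] else None for d in days}


ratios = daily_ratio(rows)
for day, ratio in ratios.items():
    print(day, ratio)
```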

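However, it is important that every accept_link has a corresponding request_link, where the request_link's sender_id equals the accept_link's recipient_id (and vice versa). So I believe a self join is needed here to make sure that Joe really accepted Frank's request, regardless of date.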
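
The pairing rule described above can be sketched as a self join that lists each accept next to the request it answers. This is an illustration only: it runs on SQLite through Python's `sqlite3` module rather than on MySQL, and the aliases `a` and `r` and the output column names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE connecting "
    "(date TEXT, action TEXT, sender_id TEXT, recipient_id TEXT)"
)
conn.executemany(
    "INSERT INTO connecting VALUES (?, ?, ?, ?)",
    [
        ("2017-01-05", "request_link", "frank", "joe"),
        ("2017-01-06", "request_link", "sally", "ann"),
        ("2017-01-07", "request_link", "bill", "ted"),
        ("2017-01-07", "accept_link", "joe", "frank"),
        ("2017-01-06", "accept_link", "ann", "sally"),
        ("2017-01-06", "accept_link", "ted", "bill"),
    ],
)

# Pair each accept (a) with the request (r) it answers. Sender and recipient
# swap roles between the two rows, and the dates are intentionally not compared.
pairs = conn.execute(
    """
    SELECT a.sender_id, a.recipient_id,
           a.date AS accepted_on, r.date AS requested_on
    FROM connecting a
    LEFT JOIN connecting r
        ON  r.action       = 'request_link'
        AND r.sender_id    = a.recipient_id
        AND r.recipient_id = a.sender_id
    WHERE a.action = 'accept_link'
    ORDER BY a.sender_id
    """
).fetchall()

for row in pairs:
    print(row)
```

An accept without a matching request would come back with `requested_on` set to NULL (`None` in Python), which is what makes the LEFT JOIN useful for this check.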

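How can I correct the following query so that it aggregates correctly while keeping the required join conditions? Would the query calculate correctly if the two WHERE conditions were removed, or are they necessary?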

SELECT f1.date, 
    SUM(CASE WHEN f2.action = 'accept_link' THEN 1 ELSE 0 END) /
    SUM(CASE WHEN f2.action = 'request_link' THEN 1 ELSE 0 END) AS acceptance_ratio
FROM connecting f1
LEFT JOIN connecting f2
ON f1.sender_id = f2.recipient_id
LEFT JOIN connecting f2
ON f1.recipient_id = f2.sender_id
WHERE f1.action = 'request_link'
AND f2.action = 'accept_link'
GROUP BY f1.date
ORDER BY f1.date ASC

The expected output should look something like this:

date          acceptance_ratio
'2017-01-05'  0.0000
'2017-01-06'  2.0000
'2017-01-07'  1.0000

Thanks in advance.

1 Answer:

Answer 0: (score: 1)

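Once again, I don't think you need a self join to do the aggregation itself. Instead, use conditional aggregation over the whole table to count the requests and the accepts that happened on each day. The LEFT JOIN in the query below is there only to confirm that each accept has a matching request: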

SELECT t.date,
       CASE WHEN t.num_requests = 0
            THEN 'No requests available'
            ELSE CAST(t.num_accepts / t.num_requests AS CHAR(50))
       END AS acceptance_ratio
FROM
(
    SELECT c1.date,
           SUM(CASE WHEN c1.action = 'accept_link' AND c2.action IS NOT NULL
                    THEN 1 ELSE 0 END) AS num_accepts,
           SUM(CASE WHEN c1.action = 'request_link' THEN 1 ELSE 0 END) AS num_requests
    FROM connecting c1
    LEFT JOIN connecting c2
        -- match each accept to the request it answers (sender and recipient swapped)
        ON c1.action       = 'accept_link'   AND
           c2.action       = 'request_link'  AND
           c1.sender_id    = c2.recipient_id AND
           c1.recipient_id = c2.sender_id
    GROUP BY c1.date
) t
ORDER BY t.date

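Note that I use a CASE expression to handle division by zero, which can happen when a day has no requests. I also assume that the same invitation is never sent more than once; a repeated request would match the same accept several times in the join and inflate num_accepts.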
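
As a rough check of the join-based counting and of the no-duplicates assumption, here is a sketch that runs the inner aggregation on SQLite through Python's `sqlite3` module. SQLite stands in for MySQL here, so the ratio is computed in Python instead of with `CAST(... AS CHAR)`; the helper names `daily_counts` and `ratios` are hypothetical.

```python
import sqlite3

SAMPLE = [
    ("2017-01-05", "request_link", "frank", "joe"),
    ("2017-01-06", "request_link", "sally", "ann"),
    ("2017-01-07", "request_link", "bill", "ted"),
    ("2017-01-07", "accept_link", "joe", "frank"),
    ("2017-01-06", "accept_link", "ann", "sally"),
    ("2017-01-06", "accept_link", "ted", "bill"),
]

# Inner aggregation: an accept counts only if a matching request exists.
QUERY = """
SELECT c1.date,
       SUM(CASE WHEN c1.action = 'accept_link' AND c2.action IS NOT NULL
                THEN 1 ELSE 0 END) AS num_accepts,
       SUM(CASE WHEN c1.action = 'request_link' THEN 1 ELSE 0 END) AS num_requests
FROM connecting c1
LEFT JOIN connecting c2
    ON  c1.action       = 'accept_link'
    AND c2.action       = 'request_link'
    AND c1.sender_id    = c2.recipient_id
    AND c1.recipient_id = c2.sender_id
GROUP BY c1.date
ORDER BY c1.date
"""


def daily_counts(rows):
    """Load rows into a throwaway in-memory table and run QUERY on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE connecting "
        "(date TEXT, action TEXT, sender_id TEXT, recipient_id TEXT)"
    )
    conn.executemany("INSERT INTO connecting VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


def ratios(counts):
    """Divide in Python; None marks a day with no requests."""
    return {d: (a / r if r else None) for d, a, r in counts}


counts = daily_counts(SAMPLE)
print(ratios(counts))

# Violating the assumption: frank's request to joe is sent a second time,
# so joe's single accept on 01-07 joins to two request rows.
with_duplicate = daily_counts(
    SAMPLE + [("2017-01-05", "request_link", "frank", "joe")]
)
print(with_duplicate)
```

On the sample data the ratios match the expected output in the question. With the duplicated request, the accept count for 01-07 is doubled, which is why the no-duplicates assumption matters for this query shape.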