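I'm finding this problem challenging to solve.
Table 1 has usernames (about 2 million).
Table 2 lists all user IDs together with their email addresses (about 150 million).
Table 3 contains email addresses (about 100 users subscribed to a particular email program).
I need to count the users in Table 1 who are subscribed to the email program and those who are not.
I tried: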
select b.email_address
from
table_1 a
left outer join table_2 b
on a.user_id = b.user_id
intersect
select email from table_3
But this is not the right approach. I need an exact count of the email program's subscribers. Any help is appreciated.
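Answer 0: (score: 1)
I'm guessing the challenge here is that the second table (the big one) has multiple users per email address. Because there are so few email addresses, I would be inclined to approach it like this: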
with emailusers as (
select distinct userid
from table2 t2 join
table3 t3
on t2.email = t3.email
)
select sum(case when eu.userid is not null then 1 else 0 end) as subscribed,
sum(case when eu.userid is null then 1 else 0 end) as notsubscribed
from table1 t1 left join
emailusers eu
on t1.userid = eu.userid;
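To check the behaviour on a small scale, here is a minimal sketch that runs the same query against an in-memory SQLite database. The column names (userid, email), the sample rows, and the use of SQLite are all assumptions made for illustration; the real schema and database may differ.

```python
import sqlite3

# Hypothetical toy data; table and column names follow the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (userid INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE table2 (userid INTEGER, email TEXT);
    CREATE TABLE table3 (email TEXT);

    INSERT INTO table1 VALUES (1, 'ann'), (2, 'bob'), (3, 'cy'), (4, 'dee');
    -- User 1 has two subscribed addresses; user 9 does not exist in table1.
    INSERT INTO table2 VALUES
        (1, 'a@x.com'), (1, 'a2@x.com'), (2, 'b@x.com'),
        (3, 'c@x.com'), (9, 'a@x.com');
    INSERT INTO table3 VALUES ('a@x.com'), ('a2@x.com'), ('c@x.com');
""")

query = """
    WITH emailusers AS (
        -- DISTINCT collapses users that match more than one subscribed email.
        SELECT DISTINCT t2.userid
        FROM table2 t2
        JOIN table3 t3 ON t2.email = t3.email
    )
    SELECT SUM(CASE WHEN eu.userid IS NOT NULL THEN 1 ELSE 0 END) AS subscribed,
           SUM(CASE WHEN eu.userid IS NULL THEN 1 ELSE 0 END) AS notsubscribed
    FROM table1 t1
    LEFT JOIN emailusers eu ON t1.userid = eu.userid
"""
subscribed, notsubscribed = conn.execute(query).fetchone()
print(subscribed, notsubscribed)
```

Because the join to the small table happens first and the result is reduced to distinct user IDs, each row of table1 matches at most one row of emailusers, so the two sums should always add up to the row count of table1.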
Answer 1: (score: 0)
You want a join plus a CASE WHEN that records whether a matching row was found.
SELECT
SUM(CASE WHEN c.email is NULL THEN 1 ELSE 0 END) AS not_subscribed,
SUM(CASE WHEN c.email IS NOT NULL THEN 1 ELSE 0 END) AS subscribed
FROM table_1 a
LEFT JOIN table_2 AS b
ON a.user_id = b.user_id
LEFT JOIN table_3 AS c
ON b.email = c.email
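Now, this makes some assumptions about your data, in particular that there are no duplicate email addresses in table_3. You should be able to verify that not_subscribed + subscribed equals SELECT COUNT(DISTINCT user_id) FROM table_1. If it does not, run the joins one at a time and work out where rows are being lost or added.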
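As a rough illustration of that check, the sketch below runs the two-join query on a few made-up rows in an in-memory SQLite database, where one user has two addresses in table_2. The sample data, the column names, and the choice of SQLite are assumptions for the example only.

```python
import sqlite3

# Hypothetical toy data: user 1 has two email rows in table_2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (user_id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE table_2 (user_id INTEGER, email TEXT);
    CREATE TABLE table_3 (email TEXT);
    INSERT INTO table_1 VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO table_2 VALUES (1, 'a@x.com'), (1, 'a2@x.com'), (2, 'b@x.com');
    INSERT INTO table_3 VALUES ('a@x.com'), ('a2@x.com');
""")

# not_subscribed + subscribed, as produced by the two-join query.
joined_total = conn.execute("""
    SELECT SUM(CASE WHEN c.email IS NULL THEN 1 ELSE 0 END)
         + SUM(CASE WHEN c.email IS NOT NULL THEN 1 ELSE 0 END)
    FROM table_1 a
    LEFT JOIN table_2 AS b ON a.user_id = b.user_id
    LEFT JOIN table_3 AS c ON b.email = c.email
""").fetchone()[0]

# The number the total is supposed to match.
distinct_users = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM table_1"
).fetchone()[0]

# Row count after only the first join, to see where rows are added.
after_first_join = conn.execute("""
    SELECT COUNT(*)
    FROM table_1 a
    LEFT JOIN table_2 AS b ON a.user_id = b.user_id
""").fetchone()[0]

print(joined_total, distinct_users, after_first_join)
```

With this data the joined total exceeds the distinct user count, and the extra row already appears after the first join, which points at multiple table_2 rows per user rather than at duplicates in table_3.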