Suppose we have a table named actions(date, uid, pid, action, description).
A sample of the table is shown below:
Table: actions
date          uid   pid  action      description
'2018-10-19'  1234  12   'view'
'2018-10-19'  1234  12   'report'    'SPAM'
'2018-10-19'  5678  23   'reaction'  'LOVE'
There is also a table named reviewers(date, rid, pid).
Reviewers are the people who remove posts; they are not users. A sample of the table is shown below:
Table: reviewers
date          rid  pid
'2018-10-19'  567  12
'2018-10-19'  890  45
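For anyone who wants to try the queries in this thread, the two samples can be loaded into a throwaway database. This is only a sketch: it assumes SQLite (through Python's sqlite3 module) as a stand-in for whatever engine is actually in use, the column types are guesses, and the missing description on the 'view' row is assumed to be NULL.

```python
import sqlite3

# Assumption: SQLite stands in for the unspecified database engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types are guesses; the post only gives column names.
    CREATE TABLE actions (date TEXT, uid INTEGER, pid INTEGER,
                          action TEXT, description TEXT);
    CREATE TABLE reviewers (date TEXT, rid INTEGER, pid INTEGER);

    -- The 'view' row has no description in the sample; NULL is assumed.
    INSERT INTO actions VALUES
        ('2018-10-19', 1234, 12, 'view', NULL),
        ('2018-10-19', 1234, 12, 'report', 'SPAM'),
        ('2018-10-19', 5678, 23, 'reaction', 'LOVE');
    INSERT INTO reviewers VALUES
        ('2018-10-19', 567, 12),
        ('2018-10-19', 890, 45);
""")

action_rows = conn.execute("SELECT COUNT(*) FROM actions").fetchone()[0]
reviewer_rows = conn.execute("SELECT COUNT(*) FROM reviewers").fetchone()[0]
print(action_rows, reviewer_rows)
```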
What share of the daily content that users view (take any action on) is actually spam?
Would the following work?
Case 1: "view" means any action
select u.date, (count(distinct r.pid) / count(distinct uu.pid)) * 100
from actions u join actions uu
on u.pid = uu.pid
inner join reviewers r
on u.pid = r.pid
where u.description = 'SPAM'
group by 1
Case 2: "view" means action = 'view'
select u.date, (count(distinct r.pid) / count(distinct uu.pid)) * 100
from actions u join actions uu
on u.pid = uu.pid
inner join reviewers r
on u.pid = r.pid
where u.description = 'SPAM'
and uu.action = 'view'
group by 1
Answer 0 (score: 0)
You don't need the join twice. If I understand correctly:
select u.date,
avg(case when u.description = 'SPAM' then 1.0 else 0 end)
from actions u left join
reviewers r
on u.pid = r.pid
group by u.date;
Hmmm . . . You need to aggregate before joining. So this may be better:
select u.date,
avg(u.is_spam * 1.0)  -- the subquery exposes is_spam, not description
from (select date, uid, pid,
max(case when u.description = 'SPAM' then 1 else 0 end) as is_spam
from actions u
group by date, uid, pid
) u left join
reviewers r
on u.pid = r.pid
group by u.date;
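To see why aggregating first changes the number, here is a small check against the sample rows from the question. It is a sketch only: SQLite through Python's sqlite3 is an assumption (the question does not name a database), the column types are guesses, and the missing description on the 'view' row is assumed to be NULL. The second query reads is_spam from the subquery in the outer select.

```python
import sqlite3

# Assumption: SQLite stands in for the unspecified database engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (date TEXT, uid INTEGER, pid INTEGER,
                          action TEXT, description TEXT);
    CREATE TABLE reviewers (date TEXT, rid INTEGER, pid INTEGER);
    INSERT INTO actions VALUES
        ('2018-10-19', 1234, 12, 'view', NULL),
        ('2018-10-19', 1234, 12, 'report', 'SPAM'),
        ('2018-10-19', 5678, 23, 'reaction', 'LOVE');
    INSERT INTO reviewers VALUES
        ('2018-10-19', 567, 12),
        ('2018-10-19', 890, 45);
""")

# Row-level average: every action row counts, so post 12 is counted twice.
row_level = conn.execute("""
    SELECT u.date,
           AVG(CASE WHEN u.description = 'SPAM' THEN 1.0 ELSE 0 END)
    FROM actions u
    LEFT JOIN reviewers r ON u.pid = r.pid
    GROUP BY u.date
""").fetchall()

# Aggregate first: one row per (date, uid, pid) with a spam flag, then average.
post_level = conn.execute("""
    SELECT u.date, AVG(u.is_spam * 1.0)
    FROM (SELECT date, uid, pid,
                 MAX(CASE WHEN description = 'SPAM' THEN 1 ELSE 0 END) AS is_spam
          FROM actions
          GROUP BY date, uid, pid) u
    LEFT JOIN reviewers r ON u.pid = r.pid
    GROUP BY u.date
""").fetchall()

print(row_level)   # one row for 2018-10-19, about 0.33 (1 spam row out of 3)
print(post_level)  # one row for 2018-10-19, 0.5 (1 spam post out of 2)
```

On this sample the reviewers join does not change either result, since each pid matches at most one reviewer row; with several reviewer rows per pid it would duplicate rows.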
Answer 1 (score: 0)
I'm not sure why reviewers needs to be considered, or whether a pid can be repeated in that table, but I think this does what you need (50.0% on the sample):
select
count(distinct (case when a.description = 'SPAM' and r.pid IS NOT NULL then a.pid end)) * 100.0
/
count(distinct a.pid)
from actions a
left join (
select distinct pid from reviewers
) r on r.pid = a.pid
;
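The 50.0% figure can be reproduced on the sample rows. Again a sketch under assumptions: SQLite via Python's sqlite3 stands in for the real database, the column types are guesses, and the missing description is taken to be NULL. The pid references are written as a.pid here because both joined tables expose a pid column and some engines reject the unqualified name as ambiguous.

```python
import sqlite3

# Assumption: SQLite stands in for the unspecified database engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (date TEXT, uid INTEGER, pid INTEGER,
                          action TEXT, description TEXT);
    CREATE TABLE reviewers (date TEXT, rid INTEGER, pid INTEGER);
    INSERT INTO actions VALUES
        ('2018-10-19', 1234, 12, 'view', NULL),
        ('2018-10-19', 1234, 12, 'report', 'SPAM'),
        ('2018-10-19', 5678, 23, 'reaction', 'LOVE');
    INSERT INTO reviewers VALUES
        ('2018-10-19', 567, 12),
        ('2018-10-19', 890, 45);
""")

# Spam posts that a reviewer handled, as a percentage of all posts acted on.
spam_pct = conn.execute("""
    SELECT COUNT(DISTINCT CASE WHEN a.description = 'SPAM'
                                AND r.pid IS NOT NULL THEN a.pid END) * 100.0
           / COUNT(DISTINCT a.pid)
    FROM actions a
    LEFT JOIN (SELECT DISTINCT pid FROM reviewers) r ON r.pid = a.pid
""").fetchone()[0]

print(spam_pct)  # 50.0: post 12 is reviewed spam, out of posts 12 and 23
```

Note this is a single overall figure; a per-day breakdown would need a.date in the select list and a group by.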