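Background:
I run a platform that lets users follow creators and view their content.
The query below successfully returns 50 posts ordered by popularity. It also contains some logic to hide posts the user has already saved or removed, but that is not relevant to this question.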
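Problem:
If one creator is particularly popular (high popularity), nearly all of the 50 returned posts will be by that creator.
This skews the results, because ideally the 50 returned posts should not be biased toward one particular author.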
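Question:
How can I limit the query so that any one author (identified by the posted_by field) is returned no more than 5 times? Fewer is fine, but never more than 5.
The final result should still be ordered by popularity DESC.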
SELECT *
FROM `source_posts`
WHERE `posted_by` IN (SELECT `username`
FROM `source_accounts`
WHERE `id` IN (SELECT `sourceid`
FROM `user_source_accounts`
WHERE `profileid` = '100'))
AND `id` NOT IN (SELECT `postid`
FROM `user_posts_removed`
WHERE `profileid` = '100')
AND `live` = '1'
AND `added` >= Date_sub(Now(), INTERVAL 1 month)
AND `popularity` > 1
ORDER BY `popularity` DESC
LIMIT 50
Thanks.
Edit:
I am using MySQL version 5.7.24, so unfortunately the row_number() function is not available on this instance.
Answer 0 (score: 1)
In MySQL 8+, you can simply use row_number():
select sp.*
from (select sp.*,
row_number() over (partition by posted_by order by popularity desc) as seqnum
from source_posts sp
) sp
where seqnum <= 5
order by popularity desc
limit 50;
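I'm not sure what the rest of your query is doing, since it isn't described in your question. You can, of course, add other filter conditions or joins.
Edit:
In earlier versions, you can use variables: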
select sp.*
from (select sp.*,
(@rn := if(@p = posted_by, @rn + 1,
if(@p := posted_by, 1, 1)
)
) as rn
from (select sp.*
from source_posts sp
order by posted_by, popularity desc
) sp cross join
(select @p := '', @rn := 0) params
) sp
where rn <= 5
order by popularity desc
limit 50;
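For MySQL 5.7, a variable-free alternative (a different technique from the query above, not something either answer proposed) is a correlated subquery that counts how many posts by the same author rank higher. This is only a sketch: it assumes `id` is a unique key on source_posts, it leaves out the question's other filters, and it has not been run against the real schema.

```sql
-- Keep a post only if fewer than 5 posts by the same author rank above it.
-- Ties on popularity are broken by id (assumed unique), so an author can
-- never contribute more than 5 rows.
SELECT p.*
FROM source_posts p
WHERE (SELECT COUNT(*)
       FROM source_posts q
       WHERE q.posted_by = p.posted_by
         AND (q.popularity > p.popularity
              OR (q.popularity = p.popularity AND q.id < p.id))
      ) < 5
ORDER BY p.popularity DESC
LIMIT 50;
```

If the question's filters (live, added, the removed-posts check and so on) are added, they probably belong in both the outer query and the inner count. Filtering only the outer query still never returns more than 5 posts per author, but it can return fewer than 5 because the count would include posts that are not eligible. The inner count runs once per candidate row, so on a large table this may be slow without a suitable index, for example on (posted_by, popularity).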
Answer 1 (score: 1)
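You can try the row_number() function (MySQL 8+ only). It numbers each author's posts starting from 1, most popular first. So if an author has 50 posts, only the ones whose row number (aliased "rn" below) is less than or equal to 5 are returned.
SELECT *
FROM (
-- Number each author's eligible posts; rn is used as the alias because rank is a reserved word in MySQL 8
SELECT sp.*, row_number() OVER (PARTITION BY `posted_by` ORDER BY `popularity` DESC) AS rn
FROM `source_posts` sp
WHERE `posted_by` IN (SELECT `username`
FROM `source_accounts`
WHERE `id` IN (SELECT `sourceid`
FROM `user_source_accounts`
WHERE `profileid` = '100'))
AND `id` NOT IN (SELECT `postid`
FROM `user_posts_removed`
WHERE `profileid` = '100')
AND `live` = '1'
AND `added` >= Date_sub(Now(), INTERVAL 1 month)
AND `popularity` > 1
) ranked
-- Apply the per-author cap first, then sort and take the overall top 50
WHERE rn <= 5
ORDER BY `popularity` DESC
LIMIT 50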