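There is a list of documents, and multiple users can tag each document. Now, for a given set of tags (more than one), I need the list of documents where more than 30% of the users selected each of those tags.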
mapping:
---------------------------
user_id | document_id | tag
1       | 34          | 26
2       | 34          | 26
3       | 36          | 25
4       | 34          | 27
An auxiliary table also holds the total tag_count for each document.
counters:
---------------------------
document_id | tag_count
34          | 12
36          | 26
I can write the query for a single tag, for example:
    select *
    from mapping m
    join (select document_id, count(*) as req_tag_count
          from mapping
          group by document_id) as s
      on s.document_id = m.document_id
    join counters c
      on c.document_id = m.document_id
     and req_tag_count / c.tag_count > .3
    where m.tag = 26
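But I cannot work out how to write the query for multiple tags, e.g. to return the documents where tags A and B both meet the 30% condition above.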
Answer 0 (score: 1)
Maybe this is what you need:
    SELECT t.document_id
    FROM (SELECT m.document_id
          FROM mapping m
          WHERE m.tag = 26 # Specify the first tag
          GROUP BY m.document_id
          # Share of this document's mapping rows that carry the tag
          HAVING COUNT(m.document_id) /
                 (SELECT count(document_id)
                  FROM mapping i
                  WHERE i.document_id = m.document_id
                  GROUP BY i.document_id) > 0.3
          # UNION ALL keeps one row per tag; a plain UNION would merge
          # the two identical document_id rows into one
          UNION ALL
          SELECT n.document_id
          FROM mapping n
          WHERE n.tag = 27 # Specify the second tag
          GROUP BY n.document_id
          HAVING COUNT(n.document_id) /
                 (SELECT count(document_id)
                  FROM mapping i
                  WHERE i.document_id = n.document_id
                  GROUP BY i.document_id) > 0.3)
         AS t
    GROUP BY t.document_id
    HAVING COUNT(t.document_id) = 2 # One per tag
It worked when I tested it. You can adapt it for three tags as well.
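Adding one branch per tag gets repetitive as the tag list grows. Here is a rough sketch of one way to handle any number of tags with a single query: count rows per (document, tag), keep the pairs above the threshold, and require as many surviving tags as were requested. It is wrapped in Python's sqlite3 module only so it can be run as-is against the sample rows from the question. The helper names `build_demo_db` and `documents_with_all_tags` are made up for illustration, SQLite is an assumption (hence the `1.0 *` to avoid integer division), and the denominator is the document's row count in mapping, as in the query above. Joining counters and dividing by tag_count instead should work the same way.

```python
import sqlite3


def build_demo_db():
    """In-memory database holding the sample `mapping` rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mapping (user_id INTEGER, document_id INTEGER, tag INTEGER)"
    )
    conn.executemany(
        "INSERT INTO mapping VALUES (?, ?, ?)",
        [(1, 34, 26), (2, 34, 26), (3, 36, 25), (4, 34, 27)],
    )
    return conn


def documents_with_all_tags(conn, tags, threshold=0.3):
    """Return document_ids for which every tag in `tags` accounts for more
    than `threshold` of that document's rows in `mapping`."""
    tags = list(dict.fromkeys(tags))  # drop duplicate tags, keep order
    if not tags:
        return []
    placeholders = ", ".join("?" for _ in tags)
    sql = f"""
        SELECT per_tag.document_id
        FROM (
            -- one row per (document, tag) with that tag's row count
            SELECT document_id, tag, COUNT(*) AS tag_rows
            FROM mapping
            WHERE tag IN ({placeholders})
            GROUP BY document_id, tag
        ) AS per_tag
        JOIN (
            -- total rows per document, used as the denominator
            SELECT document_id, COUNT(*) AS total_rows
            FROM mapping
            GROUP BY document_id
        ) AS totals ON totals.document_id = per_tag.document_id
        -- 1.0 * forces floating-point division in SQLite
        WHERE 1.0 * per_tag.tag_rows / totals.total_rows > ?
        GROUP BY per_tag.document_id
        -- every requested tag must have passed the threshold
        HAVING COUNT(*) = ?
        ORDER BY per_tag.document_id
    """
    rows = conn.execute(sql, [*tags, threshold, len(tags)]).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(documents_with_all_tags(conn, [26, 27]))  # [34]
```

With the sample rows, document 34 has three mapping rows: tag 26 accounts for 2/3 of them and tag 27 for 1/3, so both clear 30%. Asking for tags 26 and 25 together returns nothing, because no document carries both.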