MySQL: query top documents based on associated tags

Time: 2015-06-11 17:02:26

Tags: mysql sql

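There is a list of documents, and multiple users can tag a document. For a given tag (or several tags), I need the list of documents for which more than 30% of the users chose that tag.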

mapping:
---------------------------  
user_id | document_id | tag
    1   |     34      | 26
    2   |     34      | 26
    3   |     36      | 25 
    4   |     34      | 27  

An auxiliary table also holds the total tag_count for each document.

counters:
---------------------------  
document_id | tag_count
    34      | 12
    36      | 26

I can write the query for a single tag, for example:

select * from mapping m
join (select document_id,count(*) as req_tag_count
from mapping
where tag = 26    # count only the rows for the requested tag
group by document_id) as s on s.document_id = m.document_id
join counters c on c.document_id = m.document_id and req_tag_count / c.tag_count > .3
where m.tag = 26

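But I can't write the query for multiple tags, e.g. one that returns the documents for which both tag A and tag B meet the 30% condition above.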

1 Answer:

Answer 0: (score: 1)

Maybe this is what you need:

SELECT t.document_id
FROM (SELECT m.document_id
    FROM mapping m
    WHERE m.tag = 26         # Specify the first tag
    GROUP BY m.document_id
    HAVING COUNT(m.document_id) /
    (SELECT count(document_id)
        FROM mapping i
        WHERE i.document_id = m.document_id
        GROUP BY i.document_id) > 0.3
    UNION ALL                # Keep one row per tag; plain UNION would merge the duplicate ids
    SELECT n.document_id
    FROM mapping n
    WHERE n.tag = 27         # Specify the second tag
    GROUP BY n.document_id
    HAVING COUNT(n.document_id) /
    (SELECT count(document_id)
        FROM mapping i
        WHERE i.document_id = n.document_id
        GROUP BY i.document_id) > 0.3)
AS t
GROUP BY t.document_id
HAVING COUNT(t.document_id) = 2    # One per tag

It worked when I tested it. You can also adapt it for 3 tags.
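
If the number of tags varies, adding one more branch per tag gets tedious. Below is a rough sketch of a single query that handles any list of tags: count the rows per (document_id, tag), keep the pairs above the threshold, then keep the documents for which every requested tag survived. It is not from the answer above. It assumes counters.tag_count is the denominator you want, and it runs against an in-memory SQLite database only so that the example is self-contained; the helper names (build_sample_db, documents_matching_all_tags) and the sample rows are made up for illustration. The SQL itself should carry over to MySQL, where the `* 1.0` is unnecessary because `/` already returns a decimal.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mapping (user_id INTEGER, document_id INTEGER, tag INTEGER)"
    )
    conn.execute("CREATE TABLE counters (document_id INTEGER, tag_count INTEGER)")
    # Hypothetical data: document_id -> {tag: number of users who applied it}
    sample = {
        1: {26: 4, 27: 4, 25: 2},
        2: {26: 5, 27: 1, 25: 4},
        3: {27: 3, 25: 1},
    }
    for document_id, tag_counts in sample.items():
        user_id = 0
        for tag, n in tag_counts.items():
            for _ in range(n):
                user_id += 1
                conn.execute(
                    "INSERT INTO mapping VALUES (?, ?, ?)",
                    (user_id, document_id, tag),
                )
        # tag_count is the total number of tags applied to the document
        conn.execute(
            "INSERT INTO counters VALUES (?, ?)",
            (document_id, sum(tag_counts.values())),
        )
    return conn


def documents_matching_all_tags(conn, tags, threshold=0.3):
    """Return ids of documents where every tag in `tags` exceeds `threshold`."""
    tags = sorted(set(tags))  # duplicates would break the final count check
    if not tags:
        return []
    placeholders = ",".join("?" for _ in tags)
    query = f"""
        SELECT per_tag.document_id
        FROM (
            -- one row per (document, requested tag) with its usage count
            SELECT m.document_id, m.tag, COUNT(*) AS req_tag_count
            FROM mapping m
            WHERE m.tag IN ({placeholders})
            GROUP BY m.document_id, m.tag
        ) AS per_tag
        JOIN counters c ON c.document_id = per_tag.document_id
        -- * 1.0 forces non-integer division in SQLite
        WHERE per_tag.req_tag_count * 1.0 / c.tag_count > ?
        GROUP BY per_tag.document_id
        -- every requested tag must have passed the threshold
        HAVING COUNT(*) = ?
        ORDER BY per_tag.document_id
    """
    rows = conn.execute(query, [*tags, threshold, len(tags)]).fetchall()
    return [row[0] for row in rows]


conn = build_sample_db()
print(documents_matching_all_tags(conn, [26, 27]))  # prints [1]
```

In the sample data only document 1 has both tag 26 and tag 27 above 30% (4 of 10 each); document 2 passes for tag 26 but not for tag 27, so it is left out.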