I was reading Nitin Borwankar's great tagging article, and it got me thinking about ways to implement the different levels of search using two tables.
tags {
id,
tag
}
post_tags {
id,
user_id,
post_id,
tag_id
}
I started with the simple example of T(U(i)), which means all tags of all users who have tagged an item. I was able to do it with the following SQL:
/* get all tags from the users found */
SELECT t.*, vt.* FROM verse_tags as vt
LEFT JOIN tags as t ON t.id = vt.tag_id
WHERE user_id in
(
/* Get all user_ids that have tagged this item */
SELECT user_id FROM verse_tags WHERE verse_id = 26046 GROUP BY user_id
)
GROUP BY t.id
Then I moved on to a slightly harder query that goes one level deeper. T(U(T(u))) is the tags of all users who use the same tags as a given user.
/* Then get the tags of the users who share tags with user 3 */
SELECT t.id FROM post_tags as pt
LEFT JOIN tags as t ON t.id = pt.tag_id
WHERE user_id in
(
/* Then get users with these tags */
SELECT pt.user_id FROM post_tags as pt
LEFT JOIN tags as t on t.id = pt.tag_id
WHERE tag_id in
(
/* get tags of user */
SELECT t.id FROM post_tags as pt
LEFT JOIN tags as t ON t.id = pt.tag_id
WHERE pt.user_id = 3
GROUP BY t.id
)
GROUP BY user_id
)
GROUP BY t.id
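However, since I normally use JOINs in my queries, I'm not sure how to optimize something like this, or which design pitfalls to avoid when using subqueries. I've even read that JOINs should be used instead, but I don't know how to do that for the queries above.
How can I optimize these queries?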
1) Replaced GROUP BY with SELECT DISTINCT. (0.74 sec)
2) Replaced WHERE in with WHERE exists. (0.40 sec)
3) Added indexes (oops!). (0.09 sec)
4) Went back to WHERE in. (0.08 sec)
EXPLAIN SELECT DISTINCT tag_id FROM post_tags WHERE user_id in
(
SELECT DISTINCT user_id FROM post_tags WHERE tag_id in
(
SELECT DISTINCT tag_id FROM post_tags WHERE user_id = 3
)
)
Running EXPLAIN gives me these results:
id  select_type         table      type            possible_keys   key      key_len  ref   rows  Extra
1   PRIMARY             post_tags  index           NULL            tag_id   4        NULL  14    Using where
2   DEPENDENT SUBQUERY  post_tags  index_subquery  user_id         user_id  4        func  1     Using where
3   DEPENDENT SUBQUERY  post_tags  index_subquery  user_id,tag_id  tag_id   4        func  1     Using where
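The key and possible_keys columns in that output refer to indexes named user_id and tag_id on post_tags. The post does not show how they were defined, so the following is only a sketch, assuming MySQL syntax, INT NOT NULL columns (consistent with a key_len of 4), and one single-column index per lookup column:

```sql
-- Assumed table definition; the real column types are not given in the post.
CREATE TABLE post_tags (
  id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id INT NOT NULL,
  post_id INT NOT NULL,
  tag_id  INT NOT NULL
);

-- Assumed index definitions, named after their columns as the EXPLAIN
-- output suggests. Each subquery filters on exactly one of these columns.
ALTER TABLE post_tags
  ADD INDEX user_id (user_id),
  ADD INDEX tag_id (tag_id);
```

With indexes like these in place, each nested IN lookup can be resolved through an index probe instead of a scan, which would fit the drop from 0.40 sec to 0.09 sec reported in step 3.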
Answer 0 (score: 1)
As far as I can tell, this is the solution:
SELECT DISTINCT(`t`.`id`) FROM `post_tags` as `pt`
left join `tags` as t on `t`.`id` = `pt`.`tag_id`
where `pt`.`user_id` in(
SELECT distinct(`pt`.`user_id`) FROM `post_tags` as `pt`
LEFT JOIN `tags` as `t` on `t`.`id` = `pt`.`tag_id`
WHERE `pt`.`tag_id` in(
SELECT distinct(`tag_id`) FROM `post_tags`
WHERE `user_id` = 3
)
)
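Since the question also asks how this could be written with JOINs, here is one possible formulation using self-joins on post_tags. It is a sketch rather than a benchmarked replacement: the aliases pt1, pt2, and pt3, the simplified table definition, and the sample rows are all made up for illustration, and whether it is faster than the IN version depends on the MySQL version and the indexes available.

```sql
-- Illustrative table and data (assumed, not from the original post).
CREATE TABLE post_tags (
  id      INT NOT NULL PRIMARY KEY,
  user_id INT NOT NULL,
  post_id INT NOT NULL,
  tag_id  INT NOT NULL
);

INSERT INTO post_tags (id, user_id, post_id, tag_id) VALUES
  (1, 3, 1, 10),  -- user 3 uses tags 10 and 11
  (2, 3, 2, 11),
  (3, 4, 1, 10),  -- user 4 shares tag 10 with user 3 and also uses tag 12
  (4, 4, 3, 12),
  (5, 5, 4, 13);  -- user 5 shares nothing with user 3

-- T(U(T(u))) for u = 3:
--   pt1: rows of user 3                (the tags of user 3)
--   pt2: rows carrying one of those tags (the users who share them)
--   pt3: all rows of those users       (every tag they use)
SELECT DISTINCT pt3.tag_id
FROM post_tags AS pt1
JOIN post_tags AS pt2 ON pt2.tag_id  = pt1.tag_id
JOIN post_tags AS pt3 ON pt3.user_id = pt2.user_id
WHERE pt1.user_id = 3
ORDER BY pt3.tag_id;
-- With the sample rows above this returns 10, 11, 12.
```

The tags table is not joined here because only tag ids are selected. If the tag text is needed, an inner join from pt3.tag_id to tags.id can be added to the outer level.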