我正在努力找到解决这个问题的最有效方法,但我必须告诉你,我已经把它搞得一团糟。环顾四周,没有发现任何相关性,所以就这样了。
如何选择与所需项目具有相似标签的所有项目?
以此表为例:
(用于重新创建表格的sql代码)
project 1 -> tagA | tagB | tagC
project 2 -> tagA | tagB
project 3 -> tagA
project 4 -> tagC
选择项目1应该返回所有项目 选择项目4应仅返回项目项目1
到目前为止,我的查询非常依赖于左连接,并且肯定有更好的方法可以执行此操作:
SELECT all_tags.project_id, all_tags.tag_id, final.title, tag.tag
FROM projects AS p
LEFT JOIN projects_to_tags AS pt ON p.num = pt.project_id
LEFT JOIN projects_to_tags AS all_tags ON pt.tag_id = all_tags.tag_id
LEFT JOIN projects AS final ON all_tags.project_id = final.num
LEFT JOIN tags AS tag ON all_tags.tag_id = tag.tag_id
WHERE p.num = 4
GROUP BY final.num
谢谢大家的意见。我虽然与大家分享一下100k项目数据库的所有查询的平均结果,100k标签数据库与100k projects_to_tags关系。所有查询都已更改为要求project_1。
甜蜜与短暂:
0.0160 sec - OMG Ponies - Using JOINS
0.0208 sec - jdelard
0.2581 sec - OMG Ponies - Using EXISTS
0.2777 sec - OMG Ponies - Using IN
0.5295 sec - Emtucifor - updated query
0.5088 sec - Emtucifor - first query
非常感谢你们。要相应地更新我的所有查询。
此处查看所有查询及相应的MySQL EXPLAIN以及时间
===============================================================================================================================================
Emtucifor - updated query
===============================================================================================================================================
Showing rows 0 - 1 (2 total, Query took 0.5295 sec)
SELECT *
FROM projects AS L
WHERE L.num !=1-- instead of <> PT2.project_id inside
AND EXISTS (
SELECT 1
FROM projects_to_tags PT
INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
WHERE L.num = PT.project_id
AND PT2.project_id =1
)
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY L ALL PRIMARY NULL NULL NULL 100000 Using where
2 DEPENDENT SUBQUERY PT2 ref project_id project_id 4 const 1 Using index
2 DEPENDENT SUBQUERY PT ref project_id project_id 8 test.L.num,test.PT2.tag_id 12000 Using index
===============================================================================================================================================
Emtucifor - first query
===============================================================================================================================================
Showing rows 0 - 1 (2 total, Query took 0.5088 sec)
SELECT *
FROM projects AS L
WHERE
EXISTS (
SELECT 1
FROM projects_to_tags PT
INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
WHERE L.num = PT.project_id
AND PT2.project_id =1
AND PT2.project_id <> L.num
)
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY L ALL NULL NULL NULL NULL 100000 Using where
2 DEPENDENT SUBQUERY PT2 ref project_id project_id 4 const 1 Using index
2 DEPENDENT SUBQUERY PT ref project_id project_id 8 test.L.num,test.PT2.tag_id 12000 Using where; Using index
===============================================================================================================================================
jdelard
===============================================================================================================================================
Showing rows 0 - 1 (2 total, Query took 0.0208 sec)
SELECT p.num, p.title
FROM projects_to_tags pt1, projects_to_tags pt2, projects p
WHERE pt1.project_id =1
AND pt2.project_id !=1
AND pt1.tag_id = pt2.tag_id
AND p.num = pt2.project_id
GROUP BY pt2.project_id
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE pt1 ref project_id project_id 4 const 1 Using index; Using temporary; Using filesort
1 SIMPLE pt2 index project_id project_id 8 NULL 75001 Using where; Using index
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 test.pt2.project_id 1
===============================================================================================================================================
OMG Ponies - Using IN
===============================================================================================================================================
Showing rows 0 - 2 (3 total, Query took 0.2777 sec)
SELECT p . *
FROM projects p
JOIN projects_to_tags pt ON pt.project_id = p.num
WHERE pt.tag_id
IN (
SELECT x.tag_id
FROM projects_to_tags x
WHERE x.project_id =1
)
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY pt index project_id project_id 8 NULL 100001 Using where; Using index
1 PRIMARY p eq_ref PRIMARY PRIMARY 4 test.pt.project_id 1
2 DEPENDENT SUBQUERY x ref project_id project_id 8 const,func 12000 Using where; Using index
===============================================================================================================================================
OMG Ponies - Using EXISTS
===============================================================================================================================================
Showing rows 0 - 2 (3 total, Query took 0.2581 sec)
SELECT p . *
FROM projects p
JOIN projects_to_tags pt ON pt.project_id = p.num
WHERE EXISTS (
SELECT NULL
FROM projects_to_tags x
WHERE x.project_id = 1
AND x.tag_id = pt.tag_id
)
LIMIT 0 , 30
===============================================================================================================================================
OMG Ponies - Using JOINS
===============================================================================================================================================
Showing rows 0 - 2 (3 total, Query took 0.0160 sec)
SELECT DISTINCT p . *
FROM projects p
JOIN projects_to_tags pt ON pt.project_id = p.num
JOIN projects_to_tags x ON x.tag_id = pt.tag_id
AND x.project_id = 1
LIMIT 0 , 30
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE x ref project_id project_id 4 const 1 Using index; Using temporary
1 SIMPLE pt index project_id project_id 8 NULL 75001 Using where; Using index
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 test.pt.project_id 1
复制/粘贴和混乱的SQL代码。
CREATE TABLE IF NOT EXISTS `projects` (
`num` int(2) NOT NULL auto_increment,
`title` varchar(30) NOT NULL,
PRIMARY KEY (`num`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ;
INSERT INTO `projects` (`num`, `title`) VALUES(1, 'project 1'),(2, 'project 2'),(3, 'project 3'),(4, 'project 4');
CREATE TABLE IF NOT EXISTS `projects_to_tags` (
`project_id` int(2) NOT NULL,
`tag_id` int(2) NOT NULL,
KEY `project_id` (`project_id`,`tag_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `projects_to_tags` (`project_id`, `tag_id`) VALUES(1, 1),(1, 2),(1, 3),(2, 1),(2, 2),(3, 1),(4, 3);
CREATE TABLE IF NOT EXISTS `tags` (
`tag_id` int(2) NOT NULL auto_increment,
`tag` varchar(30) NOT NULL,
PRIMARY KEY (`tag_id`),
UNIQUE KEY `tag` (`tag`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;
INSERT INTO `tags` (`tag_id`, `tag`) VALUES(1, 'tag a'),(2, 'tag b'),(3, 'tag c');
答案 0 :(得分:6)
在以下任何一种情况下,如果您不知道PROJECT.num
/ PROJECT_TO_TAGS.project_id
,则必须加入PROJECTS
表以获取用于查找的ID值输出与之相关的标签。
SELECT p.*
FROM PROJECTS p
JOIN PROJECTS_TO_TAGS pt ON pt.project_id = p.num
WHERE pt.tag_id IN (SELECT x.tag_id
FROM PROJECTS_TO_TAGS x
WHERE x.project_id = 4)
SELECT p.*
FROM PROJECTS p
JOIN PROJECTS_TO_TAGS pt ON pt.project_id = p.num
WHERE EXISTS (SELECT NULL
FROM PROJECTS_TO_TAGS x
WHERE x.project_id = 4
AND x.tag_id = pt.tag_id)
DISTINCT
是必要的,因为JOIN会冒重复数据出现在结果集中......
SELECT DISTINCT p.*
FROM PROJECTS p
JOIN PROJECTS_TO_TAGS pt ON pt.project_id = p.num
JOIN PROJECTS_TO_TAGS x ON x.tag_id = pt.tag_id
AND x.project_id = 4
答案 1 :(得分:4)
如何......(项目1的例子)
SELECT p.num, p.title
FROM projects_to_tags pt1, projects_to_tags pt2, projects p
where pt1.project_id = 1 and
pt2.project_id != 1 and
pt1.tag_id = pt2.tag_id and
p.num = pt2.project_id
group by pt2.project_id
并且可能在projects_to_tags中为tag_id添加单独的索引,因此您可以单独使用它,而不是复合。没有更多类型的ALL。 (表扫描) 将 1 替换为 4 也可以获得所需的结果。
答案 2 :(得分:2)
这样的东西......?
SELECT *
FROM projects AS L
WHERE
EXISTS (
SELECT 1
FROM
projects_to_tags PT
INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
WHERE
L.num = PT.project_id
AND PT2.project_id = 4
AND PT2.project_id <> L.num
)
那是2寻找和扫描。
<强>更新强>
从jdelard的书中读取一个页面,一个微小的修改将我的查询切换为优于他(当然我在SQL Server上执行此操作意味着我取出他的GROUP BY并放入DISTINCT,因此在MySQL上使用YMMV):< / p>
SELECT *
FROM projects AS L
WHERE
L.num != 4 -- instead of <> PT2.project_id inside
AND EXISTS (
SELECT 1
FROM
projects_to_tags PT
INNER JOIN projects_to_tags PT2 ON PT.tag_id = PT2.tag_id
WHERE
L.num = PT.project_id
AND PT2.project_id = 4
)
对查询的改进来自于不进行DISTINCT或聚合,并使用半连接而不是完整连接,因此不是每行都必须连接。否则,在语义上它们大致相同。
我必须记住jdelard的诀窍,因为它是一个非常有用的工具。由于某种原因,查询引擎不够智能,无法计算{a = 4,a!= b}然后{b!= 4}。