I'm trying to speed up the search feature of some outdated forum software written in PHP. I've boiled my work down to a query like the following:
SELECT thread.threadid
FROM thread AS thread
INNER JOIN word AS word ON (word.title LIKE 'word1' OR word.title LIKE 'word2')
INNER JOIN postindex AS postindex ON (postindex.wordid = word.wordid)
INNER JOIN post AS postquery ON (postquery.postid = postindex.postid)
WHERE thread.threadid = postquery.threadid
GROUP BY thread.threadid
HAVING COUNT(DISTINCT word.wordid) = 2
LIMIT 25;
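word1 and word2 are just examples; there can be any number of words. The number at the end of the query is the total number of words. The idea is that a thread must contain all the words in the search query, spread across any number of its posts.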
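With just two words, this query often runs for more than 60 seconds and then times out. I'm stumped; I can't figure out how to optimize this awful search engine any further.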
As far as I can tell, everything is indexed properly, and I ran ANALYZE recently. Most of the database runs on InnoDB. Here is the EXPLAIN output:
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
| 1 | SIMPLE | word | range | PRIMARY,title | title | 150 | NULL | 2 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | postindex | ref | wordid,temp_ix | temp_ix | 4 | database1.word.wordid | 3 | Using index condition |
| 1 | SIMPLE | postquery | eq_ref | PRIMARY,threadid,showthread | PRIMARY | 4 | database1.postindex.postid | 1 | NULL |
| 1 | SIMPLE | thread | eq_ref | PRIMARY,forumid,postuserid,pollid,title,lastpost,dateline,prefixid,tweeted,firstpostid | PRIMARY | 4 | database1.postquery.threadid | 1 | Using index |
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
The LIMIT 25 doesn't seem to help much. At best it trims a little off a query that usually returns hundreds of results.
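The part that slows MySQL down is the GROUP BY ... HAVING ... bit. With the GROUP BY in place, LIMIT does almost nothing for performance. Without the GROUP BY, and as long as the LIMIT remains, the query is quite fast.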
Output of SHOW CREATE TABLE postindex:
CREATE TABLE `postindex` (
`wordid` int(10) unsigned NOT NULL DEFAULT '0',
`postid` int(10) unsigned NOT NULL DEFAULT '0',
`intitle` smallint(5) unsigned NOT NULL DEFAULT '0',
`score` smallint(5) unsigned NOT NULL DEFAULT '0',
UNIQUE KEY `wordid` (`wordid`,`postid`),
KEY `temp_ix` (`wordid`),
KEY `postid` (`postid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
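I didn't create the table, so I don't know why there is a duplicate index on wordid; however, I'm reluctant to drop it, since this is ancient and temperamental software.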
Answer 0 (score: 1)
You can try several rewrites and compare their execution plans and timings.
Using 2 EXISTS subqueries (one for each word to check):
SELECT t.threadid
FROM thread AS t
WHERE EXISTS
( SELECT 1
FROM post AS p
JOIN postindex AS pi
ON pi.postid = p.postid
JOIN word AS w
ON pi.wordid = w.wordid
WHERE w.title = 'word1'
AND t.threadid = p.threadid
)
AND EXISTS
( SELECT 1
FROM post AS p
JOIN postindex AS pi
ON pi.postid = p.postid
JOIN word AS w
ON pi.wordid = w.wordid
WHERE w.title = 'word2'
AND t.threadid = p.threadid
) ;
Using a single EXISTS subquery:
SELECT t.threadid
FROM thread AS t
WHERE EXISTS
( SELECT 1
FROM post AS p1
JOIN postindex AS pi1
ON pi1.postid = p1.postid
JOIN word AS w1
ON w1.wordid = pi1.wordid
AND w1.title = 'word1'
JOIN post AS p2
ON p2.threadid = p1.threadid
JOIN postindex AS pi2
ON pi2.postid = p2.postid
JOIN word AS w2
ON w2.wordid = pi2.wordid
AND w2.title = 'word2'
WHERE t.threadid = p1.threadid
AND t.threadid = p2.threadid
) ;
A single query with multiple joins, where the GROUP BY is only there to remove duplicate threadid values:
SELECT t.threadid
FROM thread AS t
JOIN post AS p1
ON p1.threadid = t.threadid
JOIN postindex AS pi1
ON pi1.postid = p1.postid
JOIN word AS w1
ON w1.wordid = pi1.wordid
AND w1.title = 'word1'
JOIN post AS p2
ON p2.threadid = t.threadid
JOIN postindex AS pi2
ON pi2.postid = p2.postid
JOIN word AS w2
ON w2.wordid = pi2.wordid
AND w2.title = 'word2'
WHERE p1.threadid = p2.threadid -- this line is redundant
GROUP BY t.threadid ;
Answer 1 (score: 0)
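I would first create a temporary table and store the DISTINCT (thread.threadid, word.wordid) pairs that match your search. Then select the thread.threadid values where COUNT(*) equals the number of words searched for.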
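A rough sketch of that two-step approach, in case it helps. The table name tmp_thread_word is made up for illustration; the other table and column names are taken from the question. It assumes exact word matches (the LIKE patterns in the question contain no wildcards, so IN should behave the same way) and I have not tested it against your data:

```sql
-- Step 1: collect every distinct (thread, word) pair that matches the search.
-- tmp_thread_word is a hypothetical name; pick whatever suits your schema.
CREATE TEMPORARY TABLE tmp_thread_word AS
SELECT DISTINCT p.threadid, w.wordid
FROM word AS w
JOIN postindex AS pi
  ON pi.wordid = w.wordid
JOIN post AS p
  ON p.postid = pi.postid
WHERE w.title IN ('word1', 'word2');

-- Step 2: keep only threads that matched every searched word.
-- The pairs are already distinct, so COUNT(*) counts words per thread.
-- Replace 2 with the number of words in the search.
SELECT threadid
FROM tmp_thread_word
GROUP BY threadid
HAVING COUNT(*) = 2
LIMIT 25;

-- Clean up; the table would also disappear when the connection closes.
DROP TEMPORARY TABLE tmp_thread_word;
```

The second query then only groups the rows in the temporary table instead of the full four-table join result. Whether that ends up faster is something you would need to measure on your own data.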