Question

I'm trying to optimize the search feature of some outdated forum software written in PHP. I've boiled my work down to a query that looks like this:

SELECT thread.threadid
FROM thread AS thread
INNER JOIN word AS word ON (word.title LIKE 'word1' OR word.title LIKE 'word2')
INNER JOIN postindex AS postindex ON (postindex.wordid = word.wordid)
INNER JOIN post AS postquery ON (postquery.postid = postindex.postid)
WHERE thread.threadid = postquery.threadid
GROUP BY thread.threadid
HAVING COUNT(DISTINCT word.wordid) = 2
LIMIT 25;

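word1 and word2 are just examples; there can be any number of words. The number in the HAVING clause is the total number of words. The idea is that a thread must contain all of the words in the search query, spread across any number of its posts.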
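As an illustration of how the query grows with the number of words, a three-word search would presumably look like the sketch below. This is an assumption based on the pattern described above, not output captured from the forum software, and 'word3' is a made-up placeholder.

```sql
-- Sketch of the same query for three search words (illustrative only).
SELECT thread.threadid
FROM thread AS thread
INNER JOIN word AS word ON (word.title LIKE 'word1' OR word.title LIKE 'word2' OR word.title LIKE 'word3')
INNER JOIN postindex AS postindex ON (postindex.wordid = word.wordid)
INNER JOIN post AS postquery ON (postquery.postid = postindex.postid)
WHERE thread.threadid = postquery.threadid
GROUP BY thread.threadid
-- The count must equal the number of search words, here 3.
HAVING COUNT(DISTINCT word.wordid) = 3
LIMIT 25;
```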

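This query frequently takes more than 60 seconds with only two words, and then times out. I'm stumped; I can't figure out how to optimize this dreadful search engine any further.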

As far as I can tell, everything is indexed properly, and I recently ran ANALYZE. Most of the database's tables use InnoDB. Here is the output of EXPLAIN:

+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
| id | select_type | table     | type   | possible_keys                                                                          | key     | key_len | ref                          | rows | Extra                                                     |
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | word      | range  | PRIMARY,title                                                                          | title   | 150     | NULL                         |    2 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | postindex | ref    | wordid,temp_ix                                                                         | temp_ix | 4       | database1.word.wordid        |    3 | Using index condition                                     |
|  1 | SIMPLE      | postquery | eq_ref | PRIMARY,threadid,showthread                                                            | PRIMARY | 4       | database1.postindex.postid   |    1 | NULL                                                      |
|  1 | SIMPLE      | thread    | eq_ref | PRIMARY,forumid,postuserid,pollid,title,lastpost,dateline,prefixid,tweeted,firstpostid | PRIMARY | 4       | database1.postquery.threadid |    1 | Using index                                               |
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+

Update

LIMIT 25 doesn't seem to help much. At best it shaves a little time off a query that normally returns hundreds of results.

Clarification

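The part that slows MySQL down is the GROUP BY ... HAVING ... bit. With GROUP BY, LIMIT is nearly useless for improving performance. Without GROUP BY, the query is very fast as long as the LIMIT is still there.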
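For reference, the fast variant mentioned here would presumably be the original query with the GROUP BY and HAVING clauses stripped out, as sketched below. This is a reconstruction rather than a query copied from the software, and it is not equivalent to the real search: it can return the same threadid more than once and no longer requires every word to be present.

```sql
-- Sketch: the original query without GROUP BY ... HAVING.
-- Fast, but returns threads matching ANY of the words, possibly with duplicates.
SELECT thread.threadid
FROM thread AS thread
INNER JOIN word AS word ON (word.title LIKE 'word1' OR word.title LIKE 'word2')
INNER JOIN postindex AS postindex ON (postindex.wordid = word.wordid)
INNER JOIN post AS postquery ON (postquery.postid = postindex.postid)
WHERE thread.threadid = postquery.threadid
LIMIT 25;
```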

SQL Info

Output of SHOW CREATE TABLE postindex:

CREATE TABLE `postindex` (
  `wordid` int(10) unsigned NOT NULL DEFAULT '0',
  `postid` int(10) unsigned NOT NULL DEFAULT '0',
  `intitle` smallint(5) unsigned NOT NULL DEFAULT '0',
  `score` smallint(5) unsigned NOT NULL DEFAULT '0',
  UNIQUE KEY `wordid` (`wordid`,`postid`),
  KEY `temp_ix` (`wordid`),
  KEY `postid` (`postid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

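I didn't create this table, so I have no idea why there is a duplicate index on wordid; however, I'm reluctant to drop it, since this is an ancient and fickle piece of software.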

Answer 1

You can try several rewrites and compare their execution plans and timings.

Using two EXISTS subqueries (one for each word to check):

SELECT t.threadid
FROM thread AS t
WHERE EXISTS
      ( SELECT 1
        FROM post AS p
          JOIN postindex AS pi
            ON pi.postid = p.postid
          JOIN word AS w
            ON pi.wordid = w.wordid
        WHERE w.title = 'word1'
          AND t.threadid = p.threadid
      )
  AND EXISTS
      ( SELECT 1
        FROM post AS p
          JOIN postindex AS pi
            ON pi.postid = p.postid
          JOIN word AS w
            ON pi.wordid = w.wordid
        WHERE w.title = 'word2'
          AND t.threadid = p.threadid
      ) ;

Using one EXISTS subquery:

SELECT t.threadid
FROM thread AS t
WHERE EXISTS
      ( SELECT 1
        FROM post AS p1
          JOIN postindex AS pi1
            ON  pi1.postid = p1.postid
          JOIN word AS w1
            ON  w1.wordid = pi1.wordid
            AND w1.title = 'word1'

          JOIN post AS p2
            ON  p2.threadid = p1.threadid
          JOIN postindex AS pi2
            ON  pi2.postid = p2.postid
          JOIN word AS w2
            ON  w2.wordid = pi2.wordid
            AND w2.title = 'word2'

        WHERE t.threadid = p1.threadid
          AND t.threadid = p2.threadid
      ) ;

A single query with multiple joins, where GROUP BY is used only to remove duplicate threadid values:

SELECT t.threadid
FROM thread AS t

  JOIN post AS p1
    ON  p1.threadid = t.threadid
  JOIN postindex AS pi1
    ON  pi1.postid = p1.postid
  JOIN word AS w1
    ON  w1.wordid = pi1.wordid
    AND w1.title = 'word1'

  JOIN post AS p2
    ON  p2.threadid = t.threadid
  JOIN postindex AS pi2
    ON  pi2.postid = p2.postid
  JOIN word AS w2
    ON  w2.wordid = pi2.wordid
    AND w2.title = 'word2'

WHERE p1.threadid = p2.threadid        -- this line is redundant
GROUP BY t.threadid ;

Answer 2

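I would first create a temporary table and store in it the distinct (thread.threadid, word.wordid) pairs that match your search. Then select the thread.threadid values whose COUNT(*) equals the number of words searched for.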
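A minimal sketch of that idea follows. The table name tmp_thread_word is hypothetical, the threadid column type is assumed to be INT UNSIGNED (the question does not show the post or thread definitions), and IN is used in place of the original LIKE comparisons on the assumption that the words contain no wildcards.

```sql
-- Step 1: collect the distinct (threadid, wordid) pairs that match the search.
-- tmp_thread_word is a hypothetical name; column types are assumptions.
CREATE TEMPORARY TABLE tmp_thread_word (
  threadid INT UNSIGNED NOT NULL,
  wordid   INT UNSIGNED NOT NULL,
  PRIMARY KEY (threadid, wordid)
);

INSERT INTO tmp_thread_word (threadid, wordid)
SELECT DISTINCT p.threadid, w.wordid
FROM word AS w
  JOIN postindex AS pi ON pi.wordid = w.wordid
  JOIN post AS p       ON p.postid = pi.postid
WHERE w.title IN ('word1', 'word2');

-- Step 2: keep only the threads that matched every word.
-- Pairs are distinct, so COUNT(*) is the number of different words per thread.
SELECT threadid
FROM tmp_thread_word
GROUP BY threadid
HAVING COUNT(*) = 2      -- number of search words
LIMIT 25;

-- Clean up; the table would also disappear when the connection closes.
DROP TEMPORARY TABLE tmp_thread_word;
```

Whether this beats the single-query forms depends on the data, so it is worth timing against the rewrites in the other answer.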

Trouble optimizing a MySQL query with GROUP BY ... HAVING
