Question

I have a table with 2 million records.

Here is the table:

comments
---------
    +-------------+---------------+------+-----+---------+----------------+
    | Field       | Type          | Null | Key | Default | Extra          |
    +-------------+---------------+------+-----+---------+----------------+
    | commentid   | int(11)       | NO   | PRI | NULL    | auto_increment |
    | parentid    | int(11)       | YES  |     | 0       |                |
    | refno       | int(11)       | YES  |     | 0       |                |
    | createdate  | int(11)       | YES  | MUL | 0       |                |
    | remoteip    | varchar(80)   | YES  |     |         |                |
    | fingerprint | varchar(50)   | YES  |     |         |                |
    | locid       | int(11)       | YES  | MUL | 0       |                |
    | clubid      | int(11)       | YES  |     | 0       |                |
    | profileid   | int(11)       | YES  | MUL | 0       |                |
    | userid      | int(11)       | YES  | MUL | 0       |                |
    | global      | int(11)       | YES  |     | 0       |                |
    | official    | int(11)       | YES  |     | 0       |                |
    | legacyuser  | int(11)       | YES  | MUL | 0       |                |
    | mediaid     | int(11)       | YES  |     | 0       |                |
    | status      | int(11)       | YES  |     | 1       |                |
    | comment     | varchar(4000) | YES  |     |         |                |
    | likes       | int(11)       | YES  |     | 0       |                |
    | dislikes    | int(11)       | YES  |     | 0       |                |
    | import      | int(11)       | YES  |     | 0       |                |
    | author      | varchar(50)   | YES  |     |         |                |
    +-------------+---------------+------+-----+---------+----------------+

Right now this query takes 6 to 7 seconds against the 2 million records:

select * from comments where (locid=2085 or global=1) and status>0 order by createdate desc limit 20;

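I decided to add an index on locid, but the query still takes 6 to 7 seconds to return results.

I would have set up a sqlfiddle, but it would be pointless here: the question is fundamentally about performance, and I am not going to add 2 million records to a sqlfiddle.

Is there any strategy or implementation that could bring this query into the 3-second range?

Thanks!

Update

Here is my SHOW CREATE TABLE output.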

        | comments | CREATE TABLE `comments` (
      `commentid` int(11) NOT NULL AUTO_INCREMENT,
      `parentid` int(11) DEFAULT '0',
      `refno` int(11) DEFAULT '0',
      `createdate` int(11) DEFAULT '0',
      `remoteip` varchar(80) DEFAULT '',
      `fingerprint` varchar(50) DEFAULT '',
      `locid` int(11) DEFAULT '0',
      `clubid` int(11) DEFAULT '0',
      `profileid` int(11) DEFAULT '0',
      `userid` int(11) DEFAULT '0',
      `global` int(11) DEFAULT '0',
      `official` int(11) DEFAULT '0',
      `legacyuser` int(11) DEFAULT '0',
      `mediaid` int(11) DEFAULT '0',
      `status` int(11) DEFAULT '1',
      `comment` varchar(4000) DEFAULT '',
      `likes` int(11) DEFAULT '0',
      `dislikes` int(11) DEFAULT '0',
      `import` int(11) DEFAULT '0',
      `author` varchar(50) DEFAULT '',
      PRIMARY KEY (`commentid`),
      KEY `comments_locid` (`locid`),
      KEY `comments_userid` (`userid`),
      KEY `idx_legacyusers` (`legacyuser`),
      KEY `profile_index` (`profileid`),
      KEY `comments_createdate` (`createdate`),
      KEY `compound_for_comments` (`locid`,`global`,`status`),
      KEY `global` (`global`),
      KEY `status` (`status`)
    ) ENGINE=InnoDB AUTO_INCREMENT=3848451 DEFAULT CHARSET=latin1

Answer 1

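Most databases, and MySQL in particular, handle OR very badly.

You can eliminate the OR by splitting the query into a UNION, where each half handles one side of the OR, like this: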

select * from (
    select * from comments
    where locid = 2085
    and status > 0
    union
    select * from comments
    where global = 1
    and status > 0) x
order by createdate desc
limit 20

Answer 2

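I believe the 'order by' is what makes this take so long. Remove the order by and see whether the timing changes. You could sort by the primary key instead, since later records are assigned larger primary key values, and sorting on a key is faster. Another option is to use a storage engine that keeps data in memory rather than on disk.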
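
As a rough sketch of the primary-key idea, the query from the question could be sorted on commentid instead of createdate. This assumes that a larger commentid always means a later createdate, which holds only if rows are inserted in chronological order.

```sql
-- Sketch: same filter as the question, but sorted on the primary key.
-- Assumption: commentid order matches createdate order.
SELECT *
FROM comments
WHERE (locid = 2085 OR global = 1)
  AND status > 0
ORDER BY commentid DESC
LIMIT 20;
```

MySQL can then walk the primary key backwards and stop once 20 matching rows are found. How fast that is depends on how many non-matching rows it has to skip along the way.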

Answer 3

Try this:

select distinct * from (

    select * from (
        select * from comments where locid=2085 and status>0 order by commentid desc limit 20
    ) t1

     union all

      select * from (
         select * from comments where global=1 and status>0 order by commentid desc limit 20
     ) t2

 ) t
order by commentid desc 
limit 20

Use indexes on (locid, status) and (global, status). (status, global) may be better than (global, status); it depends on which column is more selective.
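
The two indexes could be created along these lines. The index names are made up for illustration; only the column lists come from the answer. The second statement is one rough way to compare how many distinct values each flag column holds.

```sql
-- Hypothetical index names; column lists as suggested in the answer.
ALTER TABLE comments
    ADD INDEX idx_locid_status (locid, status),
    ADD INDEX idx_global_status (global, status);

-- Rough selectivity check: how many distinct values does each column have?
SELECT COUNT(DISTINCT status) AS status_values,
       COUNT(DISTINCT global) AS global_values
FROM comments;
```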

This only works if ordering by createdate gives the same order as ordering by commentid. Otherwise, you need indexes like (locid, status, createdate) and must order by createdate.
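
One rough way to test that assumption is to look for neighbouring ids whose createdate goes backwards. This is only a heuristic: it compares rows whose ids differ by exactly 1, so rows on either side of a gap in commentid are never compared.

```sql
-- Rough check, not a proof: counts pairs of consecutive ids
-- where the later id has an earlier createdate.
SELECT COUNT(*) AS out_of_order_pairs
FROM comments a
JOIN comments b ON b.commentid = a.commentid + 1
WHERE b.createdate < a.createdate;
```

A result of 0 suggests, but does not guarantee, that the two orderings agree.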

Answer 4

This is likely to be faster than either of the queries in the two answers so far:

SELECT  c.*
    FROM  ( 
              ( SELECT  commentid, createdate
                    FROM  comments
                    WHERE  locid=2085
                      AND  status > 0
                    ORDER BY  createdate DESC
                    LIMIT  20 
              )
            UNION  DISTINCT 
              ( SELECT  commentid, createdate
                    FROM  comments
                    WHERE  global=1
                      AND  status > 0
                    ORDER BY  createdate DESC
                    LIMIT  20 
              )
            ORDER BY  createdate DESC
            LIMIT  20 
          ) x
    JOIN  comments c USING (commentid);

Plus both of these "covering" indexes:

INDEX(locid, status, createdate, commentid)
INDEX(global, status, createdate, commentid)

(Based on later information) Since (global = 1) is usually true and (status > 0) is usually false, the following might be better. (There is a question of whether DESC throws a monkey wrench into this.)

INDEX(locid, createdate, status, commentid)
INDEX(global, createdate, status, commentid)

There is still a risk. If it is usually 'global', then the above indexes might not be optimal.
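
For reference, the first pair of covering indexes, (locid, status, createdate, commentid) and (global, status, createdate, commentid), could be added with something like the following. The index names are hypothetical. For the alternative pair, swap the positions of status and createdate.

```sql
-- Hypothetical index names; column order follows the first pair in the answer.
ALTER TABLE comments
    ADD INDEX idx_loc_status_date (locid, status, createdate, commentid),
    ADD INDEX idx_global_status_date (global, status, createdate, commentid);
```

InnoDB secondary indexes already carry the primary key, so listing commentid explicitly mostly documents the intent.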

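This formulation will be faster because the subqueries run entirely in the index ("covering") instead of hauling around all the columns (*). It does require an extra JOIN, but that is only 20 rows, each looked up efficiently by PRIMARY KEY. If your table becomes too big to be cached, this will be a huge performance win.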
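
Whether a subquery is really served from the index alone can be checked with EXPLAIN. The sketch below runs one branch on its own and assumes a covering index on these columns already exists.

```sql
-- If a covering index is used, the Extra column of the plan
-- should include "Using index".
EXPLAIN
SELECT commentid, createdate
FROM comments
WHERE locid = 2085
  AND status > 0
ORDER BY createdate DESC
LIMIT 20;
```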

I explicitly wrote UNION DISTINCT, assuming you would get duplicates. If not, UNION ALL would be faster.
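
A quick way to see whether the two branches can overlap for a given locid is to count the rows that match both conditions. This is only a sketch against the current data; rows added later can change the answer.

```sql
-- Rows counted here satisfy both branches and could appear twice
-- if UNION ALL replaced UNION DISTINCT.
SELECT COUNT(*) AS rows_in_both_branches
FROM comments
WHERE locid = 2085
  AND global = 1
  AND status > 0;
```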

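Schema critique:

Use appropriately sized INTs: INT is 4 bytes; TINYINT is only 1; etc.
Use NOT NULL where appropriate (especially ids and counts).
Use UNSIGNED where appropriate.
Don't index flags (status? global?) by themselves; such an index is unlikely to be used.
How many distinct values does status have? If status=1 could replace status>0, my suggested indexes would work better.

Shrinking the data may speed up this query (and others).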
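
A few of these points might look like this in practice. The column types chosen here are assumptions about the value ranges, not something stated in the question, and mapping NULL to 0 is likewise an assumption. The dropped indexes are the single-column `global` and `status` keys from the table definition.

```sql
-- Illustrative only: check the real value ranges before shrinking a column.
-- Replace NULLs first, otherwise the NOT NULL change can fail.
UPDATE comments SET status = 0 WHERE status IS NULL;
UPDATE comments SET global = 0 WHERE global IS NULL;
UPDATE comments SET likes = 0 WHERE likes IS NULL;

ALTER TABLE comments
    MODIFY status TINYINT NOT NULL DEFAULT 1,
    MODIFY global TINYINT UNSIGNED NOT NULL DEFAULT 0,
    MODIFY likes INT UNSIGNED NOT NULL DEFAULT 0,
    DROP INDEX `global`,
    DROP INDEX `status`;
```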

Query still running slowly
