Question

For my online shop, I have a table that I use for searching:

CREATE TABLE `store_search` (
  `term` varchar(50) NOT NULL DEFAULT '',
  `content_id` int(10) unsigned NOT NULL,
  `type` enum('keyword','tag') NOT NULL DEFAULT 'keyword',
  `random` int(10) unsigned NOT NULL,
  `saving` int(10) unsigned NOT NULL,
  PRIMARY KEY (`content_id`,`term`,`type`),
  UNIQUE KEY `saving` (`term`,`saving`,`random`,`content_id`,`type`),
  UNIQUE KEY `random` (`term`,`random`,`content_id`,`type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED

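Products can be listed in two ways: in random order (based on the column random) or by discount (based on the column saving). Earlier tests showed that using the UNIQUE constraints for ordering is more efficient than using a standard index together with ORDER BY. A query can look like this: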

mysql> EXPLAIN SELECT content_id FROM store_search USE INDEX (random) WHERE term LIKE 'shirt%' AND type='keyword' LIMIT 2000,100;
+----+-------------+--------------+-------+---------------+--------+---------+------+---------+--------------------------+
| id | select_type | table        | type  | possible_keys | key    | key_len | ref  | rows    | Extra                    |
+----+-------------+--------------+-------+---------------+--------+---------+------+---------+--------------------------+
|  1 | SIMPLE      | store_search | range | random        | random | 152     | NULL | 9870580 | Using where; Using index |
+----+-------------+--------------+-------+---------------+--------+---------+------+---------+--------------------------+
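Why reading the `random` index can stand in for ORDER BY: an index is kept sorted, so a range scan simply walks entries in key order. The Python sketch below is a toy model (a sorted list searched with `bisect`, not InnoDB's real B-tree) using made-up rows; `range_scan` is a hypothetical helper that mimics `term LIKE 'shirt%'` on the `random` index.

```python
from bisect import bisect_left

# Made-up sample rows: (term, content_id, type, random, saving)
rows = [
    ("shirt", 1, "keyword", 70, 5),
    ("shirt", 2, "keyword", 10, 30),
    ("shirt", 3, "tag", 40, 15),
    ("shoes", 4, "keyword", 20, 50),
    ("shirt", 5, "keyword", 90, 0),
]

# Toy model of the `random` index: entries (term, random, content_id, type), kept sorted.
random_index = sorted((t, r, cid, typ) for t, cid, typ, r, _ in rows)

def range_scan(index, prefix):
    """Yield entries whose first column starts with `prefix`, in index order.

    Mimics `term LIKE 'prefix%'`: seek to the first key >= prefix, then walk
    forward until the prefix stops matching. No sorting happens here.
    """
    i = bisect_left(index, (prefix,))
    while i < len(index) and index[i][0].startswith(prefix):
        yield index[i]
        i += 1

# type='keyword' is checked on each entry read (the "Using where" part).
hits = [cid for term, rnd, cid, typ in range_scan(random_index, "shirt") if typ == "keyword"]
print(hits)  # [2, 1, 5]: ordered by random within the term 'shirt'
```

Because the entries are already stored in (term, random) order, the content_ids come out ordered by random without a sort step, which is consistent with the missing "Using filesort" in the EXPLAIN output.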

So I can avoid an ORDER BY clause (no filesort is done with this approach). When searching for multiple terms, the PRIMARY KEY is used for a self-join:

mysql> EXPLAIN SELECT DISTINCT x.content_id
    -> FROM store_search x USE INDEX (saving)
    -> INNER JOIN store_search y ON x.content_id=y.content_id
    -> WHERE x.term LIKE 'shirt%' AND x.type='keyword' AND y.term LIKE 'blue%' AND y.type='keyword'
    -> LIMIT 0,100;
+----+-------------+-------+-------+-----------------------+---------+---------+--------------+----------+-------------------------------------------+
| id | select_type | table | type  | possible_keys         | key     | key_len | ref          | rows     | Extra                                     |
+----+-------------+-------+-------+-----------------------+---------+---------+--------------+----------+-------------------------------------------+
|  1 | SIMPLE      | x     | range | PRIMARY,saving,random | saving  | 152     | NULL         | 11449970 | Using where; Using index; Using temporary |
|  1 | SIMPLE      | y     | ref   | PRIMARY,saving,random | PRIMARY | 4       | x.content_id |       20 | Using where; Using index; Distinct        |
+----+-------------+-------+-------+-----------------------+---------+---------+--------------+----------+-------------------------------------------+
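The self-join can be modelled the same way: walk `x` through the `saving` index, and for each candidate content_id probe the PRIMARY KEY (content_id, term, type) for a 'blue%' keyword, which corresponds to the `ref` access on `y` with key_len 4. This is again a toy Python model with made-up rows; `search` and `pk_has_term` are hypothetical names, and DISTINCT is modelled as keeping each content_id once before LIMIT is applied.

```python
from bisect import bisect_left

# Made-up sample rows: (term, content_id, type, random, saving)
rows = [
    ("shirt", 1, "keyword", 70, 5),
    ("blue", 1, "keyword", 70, 5),
    ("shirt", 2, "keyword", 10, 30),
    ("red", 2, "keyword", 10, 30),
    ("shirts", 3, "keyword", 40, 15),
    ("blue", 3, "keyword", 40, 15),
    ("blue-green", 3, "keyword", 40, 15),
]

# Index `saving`: (term, saving, random, content_id, type)
saving_index = sorted((t, s, r, cid, typ) for t, cid, typ, r, s in rows)
# PRIMARY KEY: (content_id, term, type)
primary_key = sorted((cid, t, typ) for t, cid, typ, _, _ in rows)

def pk_has_term(content_id, prefix, typ):
    """Probe the PK for one content_id (the `ref` access on y),
    stopping at the first matching term."""
    i = bisect_left(primary_key, (content_id,))
    while i < len(primary_key) and primary_key[i][0] == content_id:
        _, term, t = primary_key[i]
        if t == typ and term.startswith(prefix):
            return True
        i += 1
    return False

def search(prefix_x, prefix_y, limit):
    """SELECT DISTINCT x.content_id ... LIMIT 0,limit, walking x in `saving` order."""
    found = []
    i = bisect_left(saving_index, (prefix_x,))
    while i < len(saving_index) and saving_index[i][0].startswith(prefix_x):
        _, _, _, cid, typ = saving_index[i]
        if typ == "keyword" and cid not in found and pk_has_term(cid, prefix_y, "keyword"):
            found.append(cid)  # DISTINCT: each content_id is kept once
        i += 1
    return found[:limit]  # LIMIT applied after all matches are collected

print(search("shirt", "blue", 100))  # [1, 3]
```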

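As I said, this solution has worked well so far. My problem now: the table has become so huge (~500 million rows) that the indexes no longer fit in memory. As a result, INSERT and UPDATE statements are very slow. The data takes 23 GB and the indexes take 32 GB, so the table occupies 55 GB in total. I can run tests, but copying this table takes a lot of time. Does anyone have a way to reduce the index size? I thought about converting the collation of the string column to latin1, but can I also consolidate some of the indexes?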
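Some arithmetic on the latin1 idea. For a VARCHAR key part, EXPLAIN's key_len is the worst case: max characters × max bytes per character + 2 length bytes (+1 if the column is nullable). For `term varchar(50)` in utf8 (utf8mb3, up to 3 bytes per character) that is 50 × 3 + 2 = 152, the key_len shown in the EXPLAIN output; in latin1 it would be 52. The sketch below (plain Python, hypothetical helper names) adds the fixed-size columns to get the worst-case key bytes per entry for each index. InnoDB secondary indexes also carry the PRIMARY KEY columns, but here all of them already appear in both secondary indexes, so nothing extra is added. A hedged caveat: these are upper bounds. InnoDB stores VARCHAR values at their actual byte length and ASCII characters take one byte in utf8, so for mostly-ASCII terms the real saving from latin1 is likely much smaller than 152 vs. 52 suggests; row headers, page fill and compression are ignored here.

```python
# Max bytes per character for the character sets mentioned in the question.
MAX_BYTES_PER_CHAR = {"utf8": 3, "latin1": 1}  # "utf8" here means utf8mb3

INT_BYTES = 4   # INT UNSIGNED
ENUM_BYTES = 1  # ENUM with fewer than 256 members

def varchar_key_len(length, charset, nullable=False):
    """Worst-case key_len for a VARCHAR(length) key part:
    length * max bytes per char + 2 length bytes (+1 if nullable)."""
    return length * MAX_BYTES_PER_CHAR[charset] + 2 + (1 if nullable else 0)

def index_key_bytes(charset):
    """Worst-case key bytes per entry for the three indexes in store_search."""
    term = varchar_key_len(50, charset)
    return {
        "PRIMARY (content_id, term, type)": INT_BYTES + term + ENUM_BYTES,
        "saving (term, saving, random, content_id, type)": term + 3 * INT_BYTES + ENUM_BYTES,
        "random (term, random, content_id, type)": term + 2 * INT_BYTES + ENUM_BYTES,
    }

print(varchar_key_len(50, "utf8"))    # 152, the key_len in the EXPLAIN output
print(varchar_key_len(50, "latin1"))  # 52
for charset in ("utf8", "latin1"):
    print(charset, index_key_bytes(charset))
```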

Answer 1

term LIKE 'shirt%' is a range lookup. With INDEX(term, ...), the filtering cannot get past the range on term to also make use of type.

I discuss this and other basic indexing principles in my Index Cookbook.

So... WHERE term LIKE 'shirt%' AND type='keyword' calls for INDEX(type, term). For the filtering itself, adding any other columns will not help.
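A toy Python model of why the column order matters. With made-up (term, type) pairs, an index ordered by (term, type) has to read every 'shirt%' entry and can only check type afterwards, while one ordered by (type, term) seeks straight to type='keyword' and reads just the matching prefix range. `scan` is a hypothetical helper that seeks with `bisect` and returns the entries it had to read.

```python
from bisect import bisect_left

# Made-up (term, type) pairs; many 'tag' rows share the 'shirt' prefix.
pairs = [("shirt", "keyword"), ("shirt", "tag"), ("shirts", "tag"),
         ("shirtdress", "tag"), ("shirt", "tag"), ("shoe", "keyword")]

def scan(index, start_key, in_range):
    """Seek to start_key, then read entries while in_range(entry) holds.
    Returns every entry read, i.e. the work the range scan does."""
    i = bisect_left(index, start_key)
    read = []
    while i < len(index) and in_range(index[i]):
        read.append(index[i])
        i += 1
    return read

term_first = sorted(pairs)                          # INDEX(term, type)
type_first = sorted((ty, te) for te, ty in pairs)   # INDEX(type, term)

# INDEX(term, type): the range is on term only; type is filtered afterwards.
read_a = scan(term_first, ("shirt",), lambda e: e[0].startswith("shirt"))
hits_a = [e for e in read_a if e[1] == "keyword"]

# INDEX(type, term): equality on type, then the prefix range on term.
read_b = scan(type_first, ("keyword", "shirt"),
              lambda e: e[0] == "keyword" and e[1].startswith("shirt"))

print(len(read_a), len(hits_a))  # 5 entries read to find 1 match
print(len(read_b))               # 1 entry read
```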

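However... what you are relying on is "covering". That is when all the columns the query needs are in a single index. In that case, the query can be performed entirely in the index's BTree without touching the data BTree. So adding extra columns can be beneficial after all.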
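A small sketch of the covering check, assuming InnoDB's rule that every secondary index entry also stores the PRIMARY KEY columns (that is how it finds the row). `is_covering` is a hypothetical helper; it only compares column sets and says nothing about whether the optimizer will actually pick the index.

```python
# Column lists taken from the CREATE TABLE in the question.
PRIMARY = ("content_id", "term", "type")
INDEXES = {
    "saving": ("term", "saving", "random", "content_id", "type"),
    "random": ("term", "random", "content_id", "type"),
}

def is_covering(index_cols, needed_cols, pk_cols=PRIMARY):
    """True if every column the query touches is available in the index entry.

    A secondary index entry holds its own columns plus the PK columns, so a
    covered query never has to visit the data (clustered PK) BTree.
    """
    available = set(index_cols) | set(pk_cols)
    return set(needed_cols) <= available

# SELECT content_id ... WHERE term LIKE ... AND type = ...
print(is_covering(INDEXES["random"], {"content_id", "term", "type"}))  # True
# Also reading `saving` through the `random` index would need row lookups.
print(is_covering(INDEXES["random"], {"content_id", "saving"}))        # False
```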

There are several things going on in this query and its index:

SELECT  content_id
    FROM  store_search USE INDEX (random)
    WHERE  term LIKE 'shirt%'
      AND  type='keyword'
    LIMIT  2000,100; 
UNIQUE KEY `random` (`term`,`random`,`content_id`,`type`)

Here are some:

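The index is "covering".
There is no ORDER BY, so the output is probably ordered first by term (assuming several words can start with 'shirt'), and only secondarily by random. That is not quite what you want, but it may happen to work.
The LIMIT requires scanning 2000 + 100 rows of the index and then quitting. If there are not enough shirts, it stops at the end of the range. This can seem "fast".
The UNIQUE is probably irrelevant and just wastes effort on inserts.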
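The LIMIT point can be modelled as reading entries in index order until offset + count rows have been produced. `limit_offset` is a hypothetical helper (a toy model, not MySQL's executor) that returns the rows plus how many index entries were read. The last lines also show the ordering point: random is only sorted within each term.

```python
def limit_offset(entries, offset, count):
    """Model of LIMIT offset,count with no ORDER BY: read entries in index
    order, skip the first `offset`, keep the next `count`, then stop.
    Returns (rows returned, index entries read)."""
    read = 0
    out = []
    for entry in entries:
        read += 1
        if read > offset:
            out.append(entry)
            if len(out) == count:
                break
    return out, read

rows, read = limit_offset(range(10000), 2000, 100)
print(len(rows), read)   # 100 2100: 2000 skipped plus 100 returned

rows, read = limit_offset(range(2050), 2000, 100)
print(len(rows), read)   # 50 2050: not enough "shirts", stops at the end

# (term, random) entries in index order: 'shirt' rows come before 'shirts' rows.
page, _ = limit_offset(sorted([("shirts", 1), ("shirt", 99), ("shirt", 5)]), 0, 3)
print(page)  # [('shirt', 5), ('shirt', 99), ('shirts', 1)]
```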

Next, let's dissect the query SELECT DISTINCT x.content_id ...:

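You have replaced the "filesort" with DISTINCT, whose code is similar (possibly faster). There may be no net gain; time it.
If there are 999 blue shirts, it will find all 999, then DISTINCTify them, then deliver 100.
Without an ORDER BY, you cannot predict which 100 will be delivered.
Since you are already collecting all 999, adding ORDER BY RAND() won't cost much extra.
Do you really want 'blue-green' shirts to be returned, but not 'light blue' ones? And 'dress%' picking up 'dress pants'? Yuck.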
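Two of these points in a tiny Python model: LIKE 'blue%' is a prefix match (modelled with `str.startswith`, which is case-sensitive, whereas MySQL's default _ci collations are not), and one product with several matching terms yields several joined rows, which DISTINCT then has to collapse. The product id and its terms are made up.

```python
# LIKE 'blue%' is a prefix test; str.startswith models it (case-sensitively).
terms = ["blue", "blue-green", "light blue", "navy"]
print([t for t in terms if t.startswith("blue")])  # ['blue', 'blue-green']
print("dress pants".startswith("dress"))          # True

# One hypothetical product with two 'shirt%' terms and two 'blue%' terms.
product_terms = {7: ["shirt", "shirts", "blue", "blue-green"]}

# Self-join: every x row matching 'shirt%' pairs with every y row matching 'blue%'.
join_rows = [
    (cid, x, y)
    for cid, ts in product_terms.items()
    for x in ts if x.startswith("shirt")
    for y in ts if y.startswith("blue")
]
print(len(join_rows))                            # 4 joined rows
print(sorted({cid for cid, _, _ in join_rows}))  # [7] after DISTINCT
```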

Bottom line

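Replace the 3 indexes with just PRIMARY KEY(type, term, content_id). Since access goes through the PK, you effectively get "covering".
Use ORDER BY random or ORDER BY RAND(); see which works better for you. (The latter is more random!)
Rethink LIKE 'shirt%'.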
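A toy model of the suggested layout, assuming InnoDB semantics where the PRIMARY KEY is the clustered index, so a PK range scan reads whole rows (including random). The matching rows then still have to be sorted for ORDER BY random (modelled with `sorted`, i.e. a filesort over only the matches) or shuffled for ORDER BY RAND() (modelled with `random.sample`). The rows and the helper name `pk_range` are hypothetical.

```python
import random as rng
from bisect import bisect_left

# Proposed PRIMARY KEY(type, term, content_id); the PK holds the whole row.
# Made-up rows: (type, term, content_id, random, saving)
rows = sorted([
    ("keyword", "shirt", 1, 70, 5),
    ("keyword", "shirt", 2, 10, 30),
    ("keyword", "shirts", 5, 40, 0),
    ("keyword", "shoes", 4, 20, 50),
    ("tag", "shirt", 3, 15, 15),
])

def pk_range(prefix, typ="keyword"):
    """WHERE type = typ AND term LIKE 'prefix%' as one contiguous PK range."""
    i = bisect_left(rows, (typ, prefix))
    while i < len(rows) and rows[i][0] == typ and rows[i][1].startswith(prefix):
        yield rows[i]
        i += 1

matches = list(pk_range("shirt"))
# ORDER BY random: sort only the matching rows (a filesort over the matches).
by_random = [r[2] for r in sorted(matches, key=lambda r: r[3])]
# ORDER BY RAND(): any permutation of the matches.
shuffled = [r[2] for r in rng.sample(matches, len(matches))]

print(by_random)  # [2, 5, 1]
print(shuffled)   # some permutation of 1, 2, 5
```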

The real bottom line is that EAV schema design sucks. I discuss this further elsewhere.

MySQL: reduce the index size of a huge table
