Question

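I have a simple MySQL 5.6.23 GROUP BY query that takes 32 seconds to run on an RDS db.r3.xlarge instance. The InnoDB table has about 47M rows. EXPLAIN says I'm selecting about 8K of them. The final GROUP BY output has 86 rows.

According to show processlist;, 99% of the time is spent in Creating sort index. If I greatly increase the number of IDs in the menu_id in (...) list, the query takes 10-30 minutes.

Unfortunately, I can't copy/paste text from the DB server to this terminal, so the table output below is abbreviated.

Query info: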

SELECT COUNT(DISTINCT user_id) AS count_user_id, org, category
  FROM menu_views
  WHERE menu_id in (
    ...about 1300 ids...
  ) GROUP BY org, category;

explain;
| id | select_type | table      | type  | possible_keys                                                                            | key                  | key_len | ref  | rows | Extra                                 |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1  | SIMPLE      | menu_views | range | i_menu_views_menu_id,tyler_group,tyler_user_group,tyler_user_menu_group,tyler_menu_group | i_menu_views_menu_id | 5       | NULL | 7914 | Using index condition; Using filesort |

Output:

| count_user_id | org | category |
|--------------------------------|
| 13000         | foo | pizza    |
| 1             | bar | candy    |
| 90            | baz | cheese   |
| 80            | gaz | soda     |
| 150           | urk | pizza    |
|     ... etc (86 rows) ...      |
|--------------------------------|

Background info:

describe menu_views;

| Field    | Type         | Null | Key | Default |
|------------------------------------------------|
| id       | int(11)      | NO   | PRI | NULL    |
| menu_id  | int(11)      | YES  | MUL | NULL    |
| user_id  | int(11)      | YES  | MUL | NULL    |
| category | varchar(255) | NO   |     | UNKNOWN |
| org      | varchar(255) | NO   | MUL | UNKNOWN |
|------------------------------------------------|

show index from menu_views;

| Key_name               | Seq_in_index | Column_name |
|-----------------------------------------------------|
| PRIMARY                | 1            | id          |
| i_menu_views_menu_id   | 1            | menu_id     |
| tyler_group            | 1            | org         |
| tyler_group            | 2            | category    |
| tyler_user_group       | 1            | user_id     |
| tyler_user_group       | 2            | org         |
| tyler_user_group       | 3            | category    |
| tyler_user_menu_group  | 1            | user_id     |
| tyler_user_menu_group  | 2            | menu_id     |
| tyler_user_menu_group  | 3            | org         |
| tyler_user_menu_group  | 4            | category    |
| tyler_menu_group       | 1            | menu_id     |
| tyler_menu_group       | 2            | org         |
| tyler_menu_group       | 3            | category    |
|-----------------------------------------------------|

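There are other indexes on the table, but these are the ones shown by EXPLAIN. I added the tyler_* indexes to try to force a loose index scan, but it didn't help.

The org and category fields properly belong to users, but I denormalized them in the hope that a query without a JOIN would be faster. However, I haven't seen any performance gain.

Full disclosure: I'm using several variations of this query, all of which are slow. This is the simplest variation. Others include WHERE created_at BETWEEN 'X' AND 'Y' and GROUP BY year/month/week/day(created_at), category.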

Answer 1

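After staring at a lot of other people's code and blog posts, I finally seem to be on the right track. I realized that because I'm using COUNT and GROUP BY, I'm never going to get a loose index scan.

It turns out the really slow part is COUNT(DISTINCT user_id). I can run the exact same query with COUNT(user_id) and get results within two seconds. Faster, but the wrong data for my purposes.

My current optimized version, using a subquery, is: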

SELECT COUNT(user_id) AS count_user_id, org, category FROM (
  SELECT user_id, org, category
  FROM menu_views
  WHERE menu_id IN (
     ... lots of ids ... 
  ) GROUP BY user_id, org, category
) AS groupings
GROUP BY org, category;

I think I still need to work on indexes and so on, but this runs in 20% of the time of the original query.
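As a sanity check that the rewrite returns the same counts as the original, here is a minimal sketch that runs both queries against a toy table. It uses Python's built-in sqlite3 module rather than MySQL, and the rows and the three-id menu_id list are made up for illustration, so it only checks the results, not the timing.

```python
import sqlite3

# Toy stand-in for menu_views; all rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE menu_views ("
    " id INTEGER PRIMARY KEY, menu_id INT, user_id INT,"
    " category TEXT NOT NULL, org TEXT NOT NULL)"
)
rows = [
    (1, 10, "pizza", "foo"),
    (1, 10, "pizza", "foo"),    # repeat view by the same user
    (2, 10, "pizza", "foo"),
    (2, 11, "pizza", "foo"),
    (3, 11, "candy", "bar"),
    (3, None, "candy", "bar"),  # NULL user_id is not counted by COUNT
    (4, 12, "pizza", "foo"),    # excluded by the menu_id list below
]
conn.executemany(
    "INSERT INTO menu_views (menu_id, user_id, category, org)"
    " VALUES (?, ?, ?, ?)",
    rows,
)

ORIGINAL = """
SELECT COUNT(DISTINCT user_id) AS count_user_id, org, category
FROM menu_views
WHERE menu_id IN (1, 2, 3)
GROUP BY org, category
ORDER BY org, category
"""

# Deduplicate (user_id, org, category) first, then count the surviving rows.
REWRITTEN = """
SELECT COUNT(user_id) AS count_user_id, org, category FROM (
  SELECT user_id, org, category
  FROM menu_views
  WHERE menu_id IN (1, 2, 3)
  GROUP BY user_id, org, category
) AS groupings
GROUP BY org, category
ORDER BY org, category
"""

# The fast-but-wrong variant: counts every view, not every distinct user.
NON_DISTINCT = ORIGINAL.replace("COUNT(DISTINCT user_id)", "COUNT(user_id)")

original = conn.execute(ORIGINAL).fetchall()
rewritten = conn.execute(REWRITTEN).fetchall()
non_distinct = conn.execute(NON_DISTINCT).fetchall()

assert original == rewritten      # same distinct-user counts
assert non_distinct != original   # plain COUNT over-counts repeat views
```

The ORDER BY is added only so the two result lists can be compared directly.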

Answer 2

Try:

INDEX(user_id, org, category) -- covering index for either of your queries.
INDEX(created_at, category)  -- for the additional example

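The output implies that it had to touch 13K rows. With the indexes above, it can do all the work in the index, without having to reach into the data.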
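To illustrate what "doing all the work in the index" looks like, here is a rough sketch using Python's built-in sqlite3 module, not MySQL. The index name idx_user_org_category is hypothetical, and the query is simplified to read only the three indexed columns, with no menu_id filter. In MySQL, the equivalent signal is "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE menu_views ("
    " id INTEGER PRIMARY KEY, menu_id INT, user_id INT,"
    " category TEXT NOT NULL, org TEXT NOT NULL)"
)
# Hypothetical name for the suggested INDEX(user_id, org, category).
conn.execute(
    "CREATE INDEX idx_user_org_category"
    " ON menu_views (user_id, org, category)"
)

# Every column the query reads is in the index, so the planner can answer it
# from the index alone and never visit the table rows.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT user_id, org, category FROM menu_views "
    "GROUP BY user_id, org, category"
).fetchall()

# Each plan row is (id, parent, notused, detail); detail is the readable text.
details = [row[3] for row in plan]
uses_covering_index = any(
    "COVERING INDEX idx_user_org_category" in d for d in details
)
print(details)
```

If the query also reads a column that is not in the index, the plan stops being index-only, so check EXPLAIN on the real query.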

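(Please provide SHOW CREATE TABLE; it is more descriptive than DESCRIBE.)

With 47M rows, you should consider normalizing org and category; I assume there is a lot of repetition in those fields? I can't tell whether the query is I/O-bound, but normalizing would decrease the likelihood of that.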

Optimizing a simple MySQL GROUP BY query that is stuck on "Creating sort index"
