Question

I have a table with 900k+ records.

Running this query takes a minute or more:

SELECT
  t.user_id,
  SUM(t.direction = "i") AS 'num_in',
  SUM(t.direction = "o") AS 'num_out'
FROM tbl_user_reports t
WHERE t.bound_time BETWEEN '2011-02-01' AND '2011-02-28'
GROUP BY t.user_id
HAVING t.user_id IS NOT NULL
ORDER BY num_in DESC
LIMIT 10;

Can you tell me how to make this query return results faster?

-- More info -- Structure:

id int(11) unsigned NOT NULL
subscriber varchar(255) NULL
user_id int(11) unsigned NULL
carrier_id int(11) unsigned NOT NULL
pool_id int(11) unsigned NOT NULL
service_id int(11) unsigned NOT NULL
persona_id int(11) unsigned NULL
inbound_id int(11) unsigned NULL
outbound_id int(11) unsigned NULL
bound_time datetime NOT NULL
direction varchar(1) NOT NULL

Indexes (index name, then indexed column):

bound_time (bound_time)
FK_tbl_user_reports (persona_id)
FK_tbl_user_reports_message (inbound_id)
FK_tbl_user_reports_service (service_id)
FK_tbl_user_reports_pool (pool_id)
FK_tbl_user_reports_user (user_id)
FK_tbl_user_reports_carrier (carrier_id)
FK_tbl_user_reports_subscriber (subscriber)
FK_tbl_user_reports_outbound (outbound_id)
direction (direction)

Answer 1

You might want to try a composite index on

(bound_time, user_id, direction)

It contains all the fields your query needs and can narrow down the date range very efficiently.
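A minimal sketch of creating that index and checking whether MySQL uses it. The index name idx_bound_user_dir is hypothetical. If the optimizer picks this index, it can answer the query from the index alone (a covering index), and the Extra column of EXPLAIN should then mention "Using index"; depending on the data distribution, the optimizer may still prefer another plan, so treat the EXPLAIN output as the thing to verify.

```sql
-- Hypothetical index name. bound_time comes first so the date range can be
-- range-scanned; user_id and direction follow so no table rows need reading.
ALTER TABLE tbl_user_reports
  ADD INDEX idx_bound_user_dir (bound_time, user_id, direction);

-- Inspect the plan: look for this index in the "key" column and
-- "Using index" in the Extra column.
EXPLAIN
SELECT
  t.user_id,
  SUM(t.direction = 'i') AS num_in,
  SUM(t.direction = 'o') AS num_out
FROM tbl_user_reports AS t
WHERE t.bound_time BETWEEN '2011-02-01' AND '2011-02-28'
GROUP BY t.user_id
HAVING t.user_id IS NOT NULL
ORDER BY num_in DESC
LIMIT 10;
```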

Answer 2

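If possible, redesign your reports table to make better use of the InnoDB clustered primary key index.

Here is a simplified example of what I mean:

5 million rows, 32K users, 126K records in the date range

Cold run time (after a mysqld restart) = 0.13 sec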

create table user_reports
(
bound_time datetime not null,
user_id int unsigned not null,
id int unsigned not null,
direction tinyint unsigned not null default 0,
primary key (bound_time, user_id, id) -- clustered composite PK
)
engine=innodb;


select count(*) as counter from user_reports;

+---------+
| counter |
+---------+
| 5000000 |
+---------+

select count(distinct(user_id)) as counter from user_reports;

+---------+
| counter |
+---------+
|   32000 |
+---------+

select count(*) as counter from user_reports
 where bound_time between '2011-02-01 00:00:00' and '2011-04-30 00:00:00';

+---------+
| counter |
+---------+
|  126721 |
+---------+

select
 t.user_id,
 sum(t.direction = 1) AS num_in,
 sum(t.direction = 0) AS num_out
from
 user_reports t
where
 t.bound_time between '2011-02-01 00:00:00' and '2011-04-30 00:00:00' and 
 t.user_id is not null
group by
 t.user_id
order by
 direction desc
limit 10;

+---------+--------+---------+
| user_id | num_in | num_out |
+---------+--------+---------+
|   17397 |      1 |       1 |
|   14729 |      2 |       1 |
|   20094 |      4 |       1 |
|   19343 |      7 |       1 |
|   24804 |      1 |       2 |
|   14714 |      3 |       2 |
|    2662 |      4 |       3 |
|   16360 |      2 |       3 |
|   21288 |      2 |       3 |
|   12800 |      6 |       2 |
+---------+--------+---------+
10 rows in set (0.13 sec)
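Note that this query orders by direction desc, not by num_in DESC as in the question. direction is neither grouped nor aggregated, so the sort order is effectively arbitrary (and MySQL rejects the query when ONLY_FULL_GROUP_BY is enabled), which is why the rows above are not ranked by num_in. A sketch of the same query against the example table, ordered by the aggregate as the question intends (no output shown, since it was not part of the original benchmark):

```sql
-- Same example table; direction = 1 is treated as inbound, 0 as outbound.
-- user_id is NOT NULL in this schema, so the IS NOT NULL filter is dropped.
SELECT
  t.user_id,
  SUM(t.direction = 1) AS num_in,
  SUM(t.direction = 0) AS num_out
FROM user_reports t
WHERE t.bound_time BETWEEN '2011-02-01 00:00:00' AND '2011-04-30 00:00:00'
GROUP BY t.user_id
ORDER BY num_in DESC
LIMIT 10;
```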

explain
select
 t.user_id,
 sum(t.direction = 1) AS num_in,
 sum(t.direction = 0) AS num_out
from
 user_reports t
where
 t.bound_time between '2011-02-01 00:00:00' and '2011-04-30 00:00:00' and 
 t.user_id is not null
group by
 t.user_id
order by
 direction desc
limit 10;

+----+-------------+-------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra                                        |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | t     | range | PRIMARY       | PRIMARY | 8       | NULL | 255270 | Using where; Using temporary; Using filesort |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
1 row in set (0.00 sec)
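Applied to the question's own table, the redesign could look roughly like this. It is a sketch under assumptions the answer does not state: that id is currently the primary key, that no rows have a NULL user_id (primary key columns must be NOT NULL), and that a separate unique key on id is kept so lookups by id (and any AUTO_INCREMENT on it) keep working. The key name uq_id is hypothetical. If the foreign key on user_id uses ON DELETE SET NULL, or MySQL refuses to modify the column while the constraint exists, the constraint has to be dropped and re-created around this change. Rebuilding the clustered index rewrites the whole table, so try it on a copy first.

```sql
-- Hypothetical migration; run against a copy of tbl_user_reports first.
-- Assumes id is the current PRIMARY KEY and no user_id values are NULL.
ALTER TABLE tbl_user_reports
  MODIFY user_id INT(11) UNSIGNED NOT NULL,   -- PK columns must be NOT NULL
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (bound_time, user_id, id),  -- rows stored in time order
  ADD UNIQUE KEY uq_id (id);                  -- keeps id unique and indexed
```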

Hope you find this useful :)

Answer 3

As Thilo said, add the index. Instead of tbl_user_reports t, use tbl_user_reports AS t, and I would move the HAVING condition into the WHERE clause to reduce the amount of work.

WHERE t.user_id IS NOT NULL AND t.bound_time BETWEEN '2011-02-01' AND '2011-02-28'
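For reference, a sketch of the question's full query with that change applied. Filtering in WHERE discards NULL user_id rows before they are grouped, instead of building a group for them and dropping it afterwards; the single quotes on the string literals and the unquoted aliases are only a style choice.

```sql
SELECT
  t.user_id,
  SUM(t.direction = 'i') AS num_in,
  SUM(t.direction = 'o') AS num_out
FROM tbl_user_reports AS t
-- NULL filter moved here from HAVING so it applies before grouping
WHERE t.user_id IS NOT NULL
  AND t.bound_time BETWEEN '2011-02-01' AND '2011-02-28'
GROUP BY t.user_id
ORDER BY num_in DESC
LIMIT 10;
```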

Update: For experimental purposes, you could try LIKE instead of BETWEEN:

t.bound_time LIKE '2011-02%'
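One thing worth checking when experimenting: bound_time is a DATETIME, so LIKE '2011-02%' compares it as a string, which typically prevents MySQL from range-scanning an index on bound_time. Also, BETWEEN '2011-02-01' AND '2011-02-28' stops at 2011-02-28 00:00:00 and misses the rest of that day. A half-open range covers all of February and stays index-friendly; comparing the two plans side by side shows the difference:

```sql
-- LIKE on a DATETIME forces a string comparison on every row.
EXPLAIN SELECT COUNT(*) FROM tbl_user_reports
WHERE bound_time LIKE '2011-02%';

-- Half-open interval: every moment in February 2011, usable as an index range.
EXPLAIN SELECT COUNT(*) FROM tbl_user_reports
WHERE bound_time >= '2011-02-01' AND bound_time < '2011-03-01';
```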

Query takes too long to complete
