Question

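I need to optimize the way I analyze a rather large data set, and I'm unsure of what the next steps are. I have already done a fair amount of MySQL configuration tuning.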

I have this InnoDB table:

+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| id             | int(250)     | NO   | PRI | NULL    | auto_increment |
| memory         | int(15)      | YES  | MUL | NULL    |                |
| q              | varchar(250) | YES  | MUL | NULL    |                |
| created        | datetime     | YES  |     | NULL    |                |
| modified       | datetime     | YES  |     | NULL    |                |
| dt             | datetime     | YES  | MUL | NULL    |                |
| site_id        | int(250)     | NO   | MUL | NULL    |                |
| execution_time | int(11)      | YES  | MUL | NULL    |                |
+----------------+--------------+------+-----+---------+----------------+

Here is a sample of 10 rows:

+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+
| id        | memory   | q               | created             | modified            | dt                  | site_id | execution_time |
+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+
| 266864867 | 38011080 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:04:44 |     890 |           1534 |
| 266864868 | 46090184 | node/16432      | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:04:46 |     890 |            840 |
| 266864869 | 50329248 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:05:16 |     890 |           2500 |
| 266864870 | 38011272 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:01 |     890 |           1494 |
| 266864871 | 46087732 | node/16432      | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:03 |     890 |            850 |
| 266864872 | 30304428 | node/303        | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:12 |     890 |            113 |
| 266864873 | 50329412 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:25 |     890 |           2465 |
| 266864874 | 28253112 | front_page      | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:25 |     890 |             86 |
| 266864875 | 28256044 | front_page      | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:08:32 |     890 |             81 |
| 266864876 | 38021072 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:08:55 |     890 |           1458 |
+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+

Here are the table's indexes:

+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table    | Non_unique | Key_name             | Seq_in_index | Column_name    | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
| memories |          0 | PRIMARY              |            1 | id             | A         |     8473766 |     NULL | NULL   |      | BTREE      |         |
| memories |          1 | index_dt             |            1 | dt             | A         |     1210538 |     NULL | NULL   | YES  | BTREE      |         |
| memories |          1 | index_execution_time |            1 | execution_time | A         |        2344 |     NULL | NULL   | YES  | BTREE      |         |
| memories |          1 | index_memory         |            1 | memory         | A         |     8473766 |     NULL | NULL   | YES  | BTREE      |         |
| memories |          1 | index_site_id        |            1 | site_id        | A         |          16 |     NULL | NULL   |      | BTREE      |         |
| memories |          1 | index_q              |            1 | q              | A         |      338950 |     NULL | NULL   | YES  | BTREE      |         |
+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+

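It stores more than a million records for many different sites (site_id). A given site might have 20,000 rows. The information stored is performance metrics for individual page requests. In case it matters, the non-obvious fields are: memory is how much memory the script used, q is the path, and site_id is a reference to a sites table.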

I run two slow queries on this data. The first gets the 25 pages that use the most memory:

Select 
  Memory.q, count(*) as count, 
  AVG(Memory.memory) as average_memory, 
  MAX(Memory.memory) as peak_memory,
  AVG(Memory.execution_time) as average_execution_time,
  MAX(Memory.execution_time) as peak_execution_time 
FROM Memory 
WHERE site_id = $some_site_id 
GROUP BY Memory.q
ORDER BY average_memory DESC
LIMIT 25

The second query gets the 25 pages with the slowest average execution time for a given site:

Select 
  Memory.q, count(*) as count, 
  AVG(Memory.memory) as average_memory, 
  MAX(Memory.memory) as peak_memory,
  AVG(Memory.execution_time) as average_execution_time,
  MAX(Memory.execution_time) as peak_execution_time 
FROM Memory 
WHERE site_id = $some_site_id 
GROUP BY Memory.q
ORDER BY average_execution_time DESC
LIMIT 25

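I recently converted the table from MyISAM to InnoDB so that these reads would not lock the table. The locking was causing operations that update this table to queue up and lag.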

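Beyond throwing more memory at the problem (increasing the InnoDB buffer pool size), I want to see whether there are other options. I have never worked with a NoSQL database, but from what I understand they would not be of much help here because I rely on aggregate functions and queries.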

The application is written in PHP, if that matters.

Is there a better way to approach storing and analyzing this data?

Update

Profiling the query shows that copying to the temporary table is the slow step. I will research how to speed this step up.

+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000030 |
| checking query cache for query | 0.000065 |
| Opening tables                 | 0.000013 |
| System lock                    | 0.000004 |
| Table lock                     | 0.000014 |
| init                           | 0.000032 |
| optimizing                     | 0.000010 |
| statistics                     | 0.008119 |
| preparing                      | 0.000042 |
| Creating tmp table             | 0.000317 |
| executing                      | 0.000005 |
| Copying to tmp table           | 5.349280 |
| Sorting result                 | 0.006511 |
| Sending data                   | 0.000092 |
| end                            | 0.000005 |
| removing tmp table             | 0.001510 |
| end                            | 0.000007 |
| query end                      | 0.000004 |
| freeing items                  | 0.001163 |
| logging slow query             | 0.000006 |
| cleaning up                    | 0.000006 |
+--------------------------------+----------+
21 rows in set (0.01 sec)

mysql> show profile cpu for query 4;
+--------------------------------+----------+----------+------------+
| Status                         | Duration | CPU_user | CPU_system |
+--------------------------------+----------+----------+------------+
| starting                       | 0.000030 | 0.000000 |   0.000000 |
| checking query cache for query | 0.000065 | 0.000000 |   0.000000 |
| Opening tables                 | 0.000013 | 0.000000 |   0.000000 |
| System lock                    | 0.000004 | 0.000000 |   0.000000 |
| Table lock                     | 0.000014 | 0.000000 |   0.000000 |
| init                           | 0.000032 | 0.000000 |   0.000000 |
| optimizing                     | 0.000010 | 0.000000 |   0.000000 |
| statistics                     | 0.008119 | 0.001000 |   0.000000 |
| preparing                      | 0.000042 | 0.000000 |   0.000000 |
| Creating tmp table             | 0.000317 | 0.000000 |   0.000000 |
| executing                      | 0.000005 | 0.000000 |   0.000000 |
| Copying to tmp table           | 5.349280 | 0.687896 |   0.412937 |
| Sorting result                 | 0.006511 | 0.004999 |   0.001999 |
| Sending data                   | 0.000092 | 0.000000 |   0.000000 |
| end                            | 0.000005 | 0.000000 |   0.000000 |
| removing tmp table             | 0.001510 | 0.000000 |   0.001000 |
| end                            | 0.000007 | 0.000000 |   0.000000 |
| query end                      | 0.000004 | 0.000000 |   0.000000 |
| freeing items                  | 0.001163 | 0.000000 |   0.001000 |
| logging slow query             | 0.000006 | 0.000000 |   0.000000 |
| cleaning up                    | 0.000006 | 0.000000 |   0.000000 |
+--------------------------------+----------+----------+------------+

Answer 1

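You don't show your key structure, but the table description does show site_id as being part of a multi-part key (MUL). Note that if it is not the FIRST field in that multi-part key, the key cannot be used for that WHERE clause. For example, if you have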

KEY somekey (field1, site_id, field3, ...)

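then your WHERE clause must include both field1 and site_id for the key to be usable in the query. You don't have to use the fields in the same order they are listed in the key (where site_id=.. and field1=... works the same as where field1=... and site_id=...), but since field1 comes before site_id in the key definition, you must use it as well for the whole key to be usable.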

The same holds for your q field. It must also be the first field in the keys that cover it, or those keys are not usable.
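The leftmost-prefix rule described above can be checked with EXPLAIN. The sketch below is only an illustration: the table demo and its columns are hypothetical, made up to mirror the somekey example, and the exact EXPLAIN output depends on the MySQL version and on the data in the table.

```sql
-- Hypothetical table mirroring the somekey example; not part of the question's schema.
CREATE TABLE demo (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  field1  INT NOT NULL,
  site_id INT NOT NULL,
  field3  INT NOT NULL,
  payload VARCHAR(50),                  -- a column that is not in the index
  KEY somekey (field1, site_id, field3)
) ENGINE=InnoDB;

-- Only site_id is constrained, so the leading column field1 is skipped
-- and somekey cannot be used for an index lookup.
EXPLAIN SELECT * FROM demo WHERE site_id = 890;

-- Both leading columns are constrained (their order in the WHERE clause
-- does not matter), so somekey becomes a candidate.
EXPLAIN SELECT * FROM demo WHERE site_id = 890 AND field1 = 1;
```

On a populated table, the first plan would typically show a full scan with no key chosen, while the second would typically list somekey in the key column.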

Answer 2

To design InnoDB tables effectively, you need to understand how InnoDB uses indexes, in particular what clustered indexes are and how they work.

Background reading

Please spend some time reading the following article and my previous answers:

You may also find this presentation of interest:

http://vimeo.com/20990641

Now that you have a better understanding of the InnoDB architecture, we will look at how to optimize your model for the InnoDB engine.

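As you have only provided two sample queries, I have had to make certain assumptions, so the following design is optimized for queries that cover site_id and path. I will leave it to you to modify the design further (if required), as you know your data better than I do.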

Revised schema (simplified)

I have revised your design and created three tables: site, site_request and site_request_metric.

Site table (1024 rows)

drop table if exists site;
create table site
(
site_id smallint unsigned not null auto_increment primary key,
url varchar(255) unique not null,
next_request_id int unsigned not null default 0
)
engine=innodb;

select count(*) from site;
+----------+
| count(*) |
+----------+
|     1024 |
+----------+

Site table - sample data

+---------+------------------+-----------------+
| site_id | url              | next_request_id |
+---------+------------------+-----------------+
|       1 | www.site1.com    |             167 |
|       2 | www.site2.com    |             177 |
|       3 | www.site3.com    |              68 |
...
|    1022 | www.site1022.com |             203 |
|    1023 | www.site1023.com |              80 |
|    1024 | www.site1024.com |             239 |
+---------+------------------+-----------------+

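Most of the fields above are self-explanatory, but next_request_id is a counter field that records how many requests (paths, or q in your example) a given site has. For example, site 1024 has 239 individual page requests/paths for which we want to record memory and execution metrics.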

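Also note the numeric data types I am using. Most of yours are poorly defined, as you seem to have confused the optional display width specifier (which only matters with zerofill) with the size of the integer. It is important to choose the smallest data types possible so that more data fits into the InnoDB buffer pool.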

http://dev.mysql.com/doc/refman/5.0/en/integer-types.html
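The difference between display width and storage size can be seen with a small experiment. This is a minimal sketch; the table width_demo is hypothetical and exists only for illustration. Note that integer display widths and ZEROFILL are deprecated as of MySQL 8.0.17, although they are still accepted.

```sql
-- Hypothetical scratch table to illustrate display width versus storage size.
CREATE TABLE width_demo (
  a INT(3) ZEROFILL,      -- still 4 bytes of storage; (3) only controls zero-padding
  b TINYINT UNSIGNED,     -- 1 byte, range 0 to 255
  c SMALLINT UNSIGNED     -- 2 bytes, range 0 to 65535
) ENGINE=InnoDB;

INSERT INTO width_demo (a, b, c) VALUES (5, 255, 65535), (123456, 0, 0);

-- Column a comes back as 005 and 123456: the width pads short values
-- but never truncates or limits the stored number.
SELECT a, b, c FROM width_demo;
```

By the same reasoning, int(250) in the original table is an ordinary 4-byte INT, so a site_id with only a handful of distinct values would fit comfortably in a SMALLINT UNSIGNED.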

Site request table (192K rows)

drop table if exists site_request;
create table site_request
(
site_id smallint unsigned not null,
request_id int unsigned not null,
created_date datetime not null,
path varchar(255) not null,
next_metric_id int unsigned not null default 0,
primary key (site_id, request_id)
)
engine=innodb;

select count(*) from site_request;
+----------+
| count(*) |
+----------+
|   192336 |
+----------+

Site request table - sample data

+---------+------------+---------------------+----------------------+----------------+
| site_id | request_id | created_date        | path                 | next_metric_id |
+---------+------------+---------------------+----------------------+----------------+
|       1 |          1 | 2011-12-14 17:17:41 | www.site1.com/1      |            250 |
|       1 |          2 | 2011-12-14 17:17:41 | www.site1.com/2      |            132 |
|       1 |          3 | 2011-12-14 17:17:41 | www.site1.com/3      |            345 |
...
|       1 |        166 | 2011-12-14 17:17:41 | www.site1.com/166    |            342 |
|       1 |        167 | 2011-12-14 17:17:41 | www.site1.com/167    |            231 |
...
|    1024 |          1 | 2011-12-14 17:17:58 | www.site1024.com/1   |            241 |
|    1024 |          2 | 2011-12-14 17:17:58 | www.site1024.com/2   |            266 |
...
|    1024 |        236 | 2011-12-14 17:17:58 | www.site1024.com/236 |            466 |
|    1024 |        237 | 2011-12-14 17:17:58 | www.site1024.com/237 |            459 |
|    1024 |        238 | 2011-12-14 17:17:58 | www.site1024.com/238 |            389 |
|    1024 |        239 | 2011-12-14 17:17:58 | www.site1024.com/239 |            592 |
+---------+------------+---------------------+----------------------+----------------+

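Most of the fields are self-explanatory. The primary key of this table is a composite of site_id and request_id, so site 1 has 167 individual requests/paths and site 1024 has 239.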

To select a single request, you have to specify both site_id and request_id:

select * from site_request where site_id = 1 and request_id = 167
+---------+------------+---------------------+-------------------+----------------+
| site_id | request_id | created_date        | path              | next_metric_id |
+---------+------------+---------------------+-------------------+----------------+
|       1 |        167 | 2011-12-14 17:17:41 | www.site1.com/167 |            231 |
+---------+------------+---------------------+-------------------+----------------+
1 row in set (0.00 sec)

select * from site_request where site_id = 1024 and request_id = 167
+---------+------------+---------------------+----------------------+----------------+
| site_id | request_id | created_date        | path                 | next_metric_id |
+---------+------------+---------------------+----------------------+----------------+
|    1024 |        167 | 2011-12-14 17:17:58 | www.site1024.com/167 |            175 |
+---------+------------+---------------------+----------------------+----------------+
1 row in set (0.00 sec)

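If we want to add a new request to a site, we use site.next_request_id + 1 to generate the next composite primary key value for the given site_id. This is typically done with a trigger, as follows: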

delimiter #

create trigger site_request_before_ins_trig before insert on site_request
for each row
begin
declare v_id int unsigned default 0;

  select next_request_id + 1 into v_id from site where site_id = new.site_id;
  set new.request_id = v_id, new.created_date = now();
  update site set next_request_id = v_id where site_id = new.site_id;
end#

delimiter ;
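With the trigger in place, an insert might look like the sketch below. The path value is made up for illustration, and passing placeholder values for request_id and created_date is an assumption made here so that the statement does not depend on how a particular MySQL version treats omitted NOT NULL columns; the trigger overwrites both.

```sql
-- Placeholder values for request_id and created_date are overwritten by the trigger.
insert into site_request (site_id, request_id, created_date, path)
values (1, 0, now(), 'www.site1.com/168');

-- The newest row should carry the next per-site id...
select request_id, path from site_request
where site_id = 1 order by request_id desc limit 1;

-- ...and the counter on the site row should match it.
select next_request_id from site where site_id = 1;
```

One caveat to keep in mind: under concurrent inserts for the same site, two sessions could read the same counter value before either updates it. Appending for update to the select inside the trigger is one possible guard, but that is a suggestion rather than something the script above does.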

Why didn't I just create an auto_increment primary key and a secondary index on site_id, like this?

create table site_request
(
request_id int unsigned not null auto_increment primary key,
site_id smallint unsigned not null,
...
key (site_id)
)
engine=innodb;

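I have assumed that most of your queries include site_id and path, so clustering the request table on site_id is a worthwhile optimization, even though it slightly increases the insert overhead. I am much more concerned with read performance, especially since this table will later be joined to the HUGE metric table.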

Site request metric table (74 million rows)

drop table if exists site_request_metric;
create table site_request_metric
(
site_id smallint unsigned not null,
request_id int unsigned not null,
metric_id int unsigned not null,
created_date datetime not null,
memory_usage int unsigned not null default 0,
execution_time mediumint unsigned not null default 0,
primary key (site_id, request_id, metric_id)
)
engine=innodb;

select count(*) from site_request_metric;
+----------+
| count(*) |
+----------+
| 73858764 |
+----------+

Site request metric table - sample data

+---------+------------+-----------+---------------------+--------------+----------------+
| site_id | request_id | metric_id | created_date        | memory_usage | execution_time |
+---------+------------+-----------+---------------------+--------------+----------------+
|       1 |          1 |         1 | 2011-12-14 17:17:58 |     18052380 |           7731 |
|       1 |          1 |         2 | 2011-12-14 17:17:58 |     32013204 |           7881 |
|       1 |          1 |         3 | 2011-12-14 17:17:58 |     55779470 |           7274 |
...
|       1 |          1 |       249 | 2011-12-14 17:17:58 |     11527748 |           5126 |
|       1 |          1 |       248 | 2011-12-14 17:17:58 |     19457506 |           4097 |
|       1 |          1 |       247 | 2011-12-14 17:17:58 |     23129432 |           6202 |
...
|     997 |          1 |         1 | 2011-12-14 19:08:48 |     38584043 |           7156 |
|     997 |          1 |         2 | 2011-12-14 19:08:48 |     68884314 |           2185 |
|     997 |          1 |         3 | 2011-12-14 19:08:48 |     31545597 |            207 |
...
|     997 |          1 |       380 | 2011-12-14 19:08:49 |     39123978 |            166 |
|     997 |          1 |       381 | 2011-12-14 19:08:49 |     45114404 |           7310 |
|     997 |          1 |       382 | 2011-12-14 19:08:49 |     55057884 |            506 |
+---------+------------+-----------+---------------------+--------------+----------------+

The site_request.next_metric_id field works in the same way as the site.next_request_id counter field and is maintained with a trigger on site_request_metric.

delimiter #

create trigger site_request_metric_before_ins_trig before insert on site_request_metric
for each row
begin
declare v_id int unsigned default 0;

  select next_metric_id + 1 into v_id from site_request where site_id = new.site_id and request_id = new.request_id;
  set new.metric_id = v_id, new.created_date = now();
  update site_request set next_metric_id = v_id where site_id = new.site_id and request_id = new.request_id;
end#

delimiter ;

Schema performance

Take site 997 as an example:

select * from site where site_id = 997;
+---------+-----------------+-----------------+
| site_id | url             | next_request_id |
+---------+-----------------+-----------------+
|     997 | www.site997.com |             319 |
+---------+-----------------+-----------------+
1 row in set (0.00 sec)

Site 997 has 319 individual page requests/paths.

select * from site_request where site_id = 997;
+---------+------------+---------------------+---------------------+----------------+
| site_id | request_id | created_date        | path                | next_metric_id |
+---------+------------+---------------------+---------------------+----------------+
|     997 |          1 | 2011-12-14 17:17:58 | www.site997.com/1   |            383 |
|     997 |          2 | 2011-12-14 17:17:58 | www.site997.com/2   |            262 |
|     997 |          3 | 2011-12-14 17:17:58 | www.site997.com/3   |            470 |
|     997 |          4 | 2011-12-14 17:17:58 | www.site997.com/4   |            247 |
...
|     997 |        316 | 2011-12-14 17:17:58 | www.site997.com/316 |            176 |
|     997 |        317 | 2011-12-14 17:17:58 | www.site997.com/317 |            441 |
|     997 |        318 | 2011-12-14 17:17:58 | www.site997.com/318 |            419 |
|     997 |        319 | 2011-12-14 17:17:58 | www.site997.com/319 |            601 |
+---------+------------+---------------------+---------------------+----------------+
319 rows in set (0.00 sec)

How many metrics do we have for all of site 997's requests?

select sum(next_metric_id) from site_request where site_id = 997;
+---------------------+
| sum(next_metric_id) |
+---------------------+
|              130163 |
+---------------------+
1 row in set (0.00 sec)

Summing next_metric_id for this site (as above) is faster than the usual approach of counting the metric rows:

select count(*) from site_request_metric where site_id = 997;
+----------+
| count(*) |
+----------+
|   130163 |
+----------+
1 row in set (0.03 sec)

So site 997 has approximately 130K memory and execution time metrics to analyze, in a table of approximately 74 million rows.
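Because next_metric_id is a denormalized counter, it only equals the row count while nothing bypasses the trigger or deletes metric rows. As a hedged sketch, assuming the three tables defined above, the following query lists any requests of site 997 whose counter no longer matches the actual number of metric rows:

```sql
-- Requests whose stored counter differs from the real number of metric rows.
select
 sr.site_id,
 sr.request_id,
 sr.next_metric_id,
 count(srm.metric_id) as actual_metrics
from
 site_request sr
left join site_request_metric srm
 on sr.site_id = srm.site_id and sr.request_id = srm.request_id
where
 sr.site_id = 997
group by
 sr.site_id,
 sr.request_id,
 sr.next_metric_id
having
 sr.next_metric_id <> count(srm.metric_id);
```

An empty result means the counters can be trusted for that site; unlike the sum over site_request, this check has to read the metric rows, so it is meant as an occasional audit rather than a routine query.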

Let's try our main queries next (all runtimes are cold, i.e. MySQL restarted, empty buffers and no query cache):

Memory

select
 hog.*,
 sr.path
from
(
select 
 srm.site_id,
 srm.request_id,
 count(*) as counter, 
 avg(srm.memory_usage) as average_memory, 
 max(srm.memory_usage) as peak_memory,
 avg(srm.execution_time) as average_execution_time,
 max(srm.execution_time) as peak_execution_time 
from
 site_request_metric srm
where
 srm.site_id = 997
group by 
 srm.site_id,
 srm.request_id
order by
 average_memory desc
limit 25
) hog
inner join site_request sr on hog.site_id = sr.site_id and hog.request_id = sr.request_id;

The results are as follows:

+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
| site_id | request_id | counter | average_memory | peak_memory | average_execution_time | peak_execution_time | path                |
+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
|     997 |        103 |     184 |  43381803.4293 |    69682361 |              4378.1630 |                8069 | www.site997.com/103 |
|     997 |        151 |     158 |  42594703.1392 |    69329761 |              4422.8481 |                8080 | www.site997.com/151 |
|     997 |        192 |     509 |  42470135.3360 |    69927112 |              4083.1198 |                8098 | www.site997.com/192 |
|     997 |        248 |     161 |  42169276.5590 |    69995565 |              4118.1180 |                7949 | www.site997.com/248 |
|     997 |        221 |     162 |  42156708.4877 |    69233026 |              4151.1667 |                8022 | www.site997.com/221 |
|     997 |        136 |     154 |  42026979.3831 |    69897045 |              4060.5649 |                8098 | www.site997.com/136 |
|     997 |        239 |     424 |  41979697.9788 |    69381215 |              4463.0189 |                8087 | www.site997.com/239 |
|     997 |         77 |     338 |  41864013.0266 |    69991164 |              3942.4142 |                8067 | www.site997.com/77  |
|     997 |        283 |     249 |  41853642.9157 |    69945794 |              3915.7028 |                8034 | www.site997.com/283 |
|     997 |          5 |     228 |  41815274.7851 |    69825743 |              3898.4123 |                8078 | www.site997.com/5   |
|     997 |        216 |     319 |  41766464.5078 |    69777901 |              3899.0752 |                8091 | www.site997.com/216 |
|     997 |        131 |     170 |  41720890.5118 |    69892577 |              4074.2588 |                8097 | www.site997.com/131 |
|     997 |        160 |     385 |  41702556.6545 |    69868379 |              4060.2727 |                8093 | www.site997.com/160 |
|     997 |        245 |     200 |  41683505.3900 |    69668739 |              4052.7950 |                8095 | www.site997.com/245 |
|     997 |         70 |     429 |  41640396.0466 |    69988619 |              3995.3310 |                8099 | www.site997.com/70  |
|     997 |         98 |     485 |  41553544.7649 |    69957698 |              4048.1443 |                8096 | www.site997.com/98  |
|     997 |        153 |     301 |  41542909.4651 |    69754024 |              3884.7409 |                8028 | www.site997.com/153 |
|     997 |        226 |     429 |  41523530.3939 |    69691453 |              4097.7226 |                8096 | www.site997.com/226 |
|     997 |         31 |     478 |  41442100.4435 |    69802248 |              3999.3096 |                8098 | www.site997.com/31  |
|     997 |        171 |     222 |  41405805.8153 |    69433643 |              4364.4414 |                8087 | www.site997.com/171 |
|     997 |        150 |     336 |  41393538.5744 |    69746950 |              4264.5655 |                8077 | www.site997.com/150 |
|     997 |        167 |     526 |  41391595.5741 |    69633242 |              4206.1597 |                8096 | www.site997.com/167 |
|     997 |        182 |     593 |  41288151.5379 |    69992913 |              4351.6476 |                8099 | www.site997.com/182 |
|     997 |         14 |     555 |  41239680.5387 |    69976632 |              4054.6126 |                8084 | www.site997.com/14  |
|     997 |        297 |     410 |  41163572.3805 |    69874576 |              4001.0829 |                8039 | www.site997.com/297 |
+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
25 rows in set (0.41 sec)

Execution time

select
 hog.*,
 sr.path
from
(
select 
 srm.site_id,
 srm.request_id,
 count(*) as counter, 
 avg(srm.memory_usage) as average_memory, 
 max(srm.memory_usage) as peak_memory,
 avg(srm.execution_time) as average_execution_time,
 max(srm.execution_time) as peak_execution_time 
from
 site_request_metric srm
where
 srm.site_id = 997
group by 
 srm.site_id,
 srm.request_id
order by
 average_execution_time desc
limit 25
) hog
inner join site_request sr on hog.site_id = sr.site_id and hog.request_id = sr.request_id;

The results are as follows:

+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
| site_id | request_id | counter | average_memory | peak_memory | average_execution_time | peak_execution_time | path                |
+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
|     997 |        213 |     159 |  37962517.1321 |    67120491 |              4497.9119 |                8055 | www.site997.com/213 |
|     997 |        239 |     424 |  41979697.9788 |    69381215 |              4463.0189 |                8087 | www.site997.com/239 |
|     997 |        151 |     158 |  42594703.1392 |    69329761 |              4422.8481 |                8080 | www.site997.com/151 |
|     997 |        289 |     382 |  39227749.9869 |    69715783 |              4402.8927 |                8093 | www.site997.com/289 |
|     997 |         69 |     473 |  40099817.4715 |    69798587 |              4380.6850 |                8092 | www.site997.com/69  |
|     997 |        103 |     184 |  43381803.4293 |    69682361 |              4378.1630 |                8069 | www.site997.com/103 |
|     997 |        183 |     236 |  40111564.1356 |    69853507 |              4376.4280 |                8032 | www.site997.com/183 |
|     997 |        171 |     222 |  41405805.8153 |    69433643 |              4364.4414 |                8087 | www.site997.com/171 |
|     997 |         58 |     212 |  39289163.9057 |    69861740 |              4355.8396 |                8087 | www.site997.com/58  |
|     997 |         71 |     388 |  39895200.6108 |    69801188 |              4353.9639 |                8086 | www.site997.com/71  |
|     997 |        182 |     593 |  41288151.5379 |    69992913 |              4351.6476 |                8099 | www.site997.com/182 |
|     997 |        195 |     305 |  39780792.6066 |    69824981 |              4343.0295 |                8081 | www.site997.com/195 |
|     997 |        318 |     419 |  39860696.4415 |    69958266 |              4323.6420 |                8071 | www.site997.com/318 |
|     997 |        303 |     318 |  39357663.3899 |    69850523 |              4322.4686 |                8097 | www.site997.com/303 |
|     997 |        198 |     306 |  38990104.1699 |    69851817 |              4320.0621 |                8088 | www.site997.com/198 |
|     997 |        286 |     227 |  39654671.5859 |    69871305 |              4307.8811 |                8055 | www.site997.com/286 |
|     997 |        105 |     611 |  39055749.5008 |    69813117 |              4296.0802 |                8090 | www.site997.com/105 |
|     997 |        298 |     388 |  40150371.2474 |    69985665 |              4286.9716 |                8095 | www.site997.com/298 |
|     997 |         84 |     517 |  39520438.9497 |    69990404 |              4283.3578 |                8098 | www.site997.com/84  |
|     997 |        106 |     448 |  41099495.4018 |    69902616 |              4282.6094 |                8082 | www.site997.com/106 |
|     997 |        237 |     431 |  39017341.3387 |    69623443 |              4277.4872 |                8071 | www.site997.com/237 |
|     997 |         55 |     381 |  39603109.8294 |    69750984 |              4269.1969 |                8095 | www.site997.com/55  |
|     997 |         34 |     438 |  40697744.4087 |    69843517 |              4266.3288 |                8047 | www.site997.com/34  |
|     997 |         38 |     433 |  40169799.8291 |    69898182 |              4266.1663 |                8088 | www.site997.com/38  |
|     997 |        150 |     336 |  41393538.5744 |    69746950 |              4264.5655 |                8077 | www.site997.com/150 |
+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
25 rows in set (0.30 sec)

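So that is a cold runtime of under 0.5 seconds for both queries on a table with approximately 74 million rows (subsequent runtimes are approximately 0.06 seconds).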

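This answer is not meant to be definitive, as there are many other factors that influence table and index design which I have not considered. However, it should give you some insight into how a simple table/index design can significantly improve InnoDB query performance.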

Hope this helps :)

Full script here: http://pastie.org/3022142

Answer 3

I would start by using the built-in profiler.

Profiling queries

mysql> SET profiling = 1;
mysql> <your query>;
mysql> SHOW PROFILES;
mysql> SHOW PROFILE FOR QUERY <id of your query>;
mysql> SHOW PROFILE CPU FOR QUERY <id of your query>;

Note that profiling is not free, so do it when the site can handle the extra load, or perhaps on a copy of the live system.

Answer 4

I would add another field holding the MD5 hash of 'q' and group by the value of that field.

Indexing a varchar(250) column and grouping by its value is not a good idea.

You will need a composite index on (site_id, q_hash).
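A minimal sketch of this idea follows. It assumes the question's table is reachable under the name Memory, as in the question's queries; the column type, the index name and the site_id value 890 (taken from the sample rows) are illustrative choices, not something the answer specifies.

```sql
-- q_hash and index_site_id_q_hash are hypothetical names.
ALTER TABLE Memory ADD COLUMN q_hash BINARY(16) NULL;

-- Backfill: MD5() returns 32 hex characters, UNHEX() packs them into 16 bytes.
UPDATE Memory SET q_hash = UNHEX(MD5(q));

ALTER TABLE Memory ADD INDEX index_site_id_q_hash (site_id, q_hash);

-- Group on the fixed-width hash; MIN(q) recovers a readable path per group.
SELECT
  MIN(q) AS q, COUNT(*) AS request_count,
  AVG(memory) AS average_memory,
  MAX(memory) AS peak_memory,
  AVG(execution_time) AS average_execution_time,
  MAX(execution_time) AS peak_execution_time
FROM Memory
WHERE site_id = 890
GROUP BY q_hash
ORDER BY average_memory DESC
LIMIT 25;
```

New rows would also need q_hash populated, either by the application at insert time or by a trigger; otherwise they fall into a single NULL group.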

Answer 5

If I am reading your question (and comments) correctly, the problem is that these queries are bogging down the rest of the system.

The other answers point you in good directions for optimization (fixing your indexes, using the profiler, and so on).

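Another strategy is to set up replication and run these heavy queries against the slave. The master will hum along, writing to the binlog, and the slave will catch up once the queries are done. This setup lets you hammer the slave with long-running queries without affecting write performance on the master.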
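To see whether the slave has caught up after a heavy reporting query, its replication status can be inspected from the mysql client. This is a small sketch that assumes replication is already configured and that the statement is run on the slave:

```sql
-- Run on the slave; \G prints the single status row vertically in the mysql client.
SHOW SLAVE STATUS\G

-- In the output, Seconds_Behind_Master estimates how far the slave trails the master.
-- From MySQL 8.0.22 onward the equivalent statement is SHOW REPLICA STATUS,
-- which reports the same value as Seconds_Behind_Source.
```

If the reports must reflect very recent writes, this lag is the trade-off to weigh against the relief on the master.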

Answer 6

What you really need are two good indexes that support the queries you presented.

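The indexes you currently have are inadequate, because data will still be retrieved from the table itself and because the outcome depends on which index the MySQL query optimizer decides to choose.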

@MarkB's answer is, in theory, what you want (+1 for @MarkB). You just need to build an index that fits the criteria of any given query:

WHERE clause
ORDER BY clause
GROUP BY clause
Necessary columns (those not in WHERE, ORDER BY or GROUP BY)

Let's take your first query:

Select  
  Memory.q, count(*) as count,  
  AVG(Memory.memory) as average_memory,  
  MAX(Memory.memory) as peak_memory, 
  AVG(Memory.execution_time) as average_execution_time, 
  MAX(Memory.execution_time) as peak_execution_time  
FROM Memory  
WHERE site_id = $some_site_id  
GROUP BY Memory.q
ORDER BY average_memory DESC
LIMIT 25

Look at the four criteria:

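WHERE has only one value, as an equality: [site_id]
ORDER BY is where the ordering happens within the WHERE: [average_memory]
GROUP BY is where the ordering happens within the ORDER BY: [q]
Necessary columns: [memory], [execution_time]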

Everything in brackets is what you put in the index, in the order shown. Here is the index:

ALTER TABLE Memory ADD INDEX siteid_q_mem_exectime_index
(site_id,q,memory,execution_time);

Note that average_memory is not a table column. It is derived from the memory field, which is why it does not appear in the index.

Now do the same for the second query:

Select       
  Memory.q, count(*) as count,       
  AVG(Memory.memory) as average_memory,       
  MAX(Memory.memory) as peak_memory,      
  AVG(Memory.execution_time) as average_execution_time,      
  MAX(Memory.execution_time) as peak_execution_time       
FROM Memory       
WHERE site_id = $some_site_id       
GROUP BY Memory.q
ORDER BY average_execution_time DESC
LIMIT 25

Look at the four criteria:

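WHERE has only one value, as an equality: [site_id]
ORDER BY is where the ordering happens within the WHERE: [average_execution_time]
GROUP BY is where the ordering happens within the ORDER BY: [q]
Necessary columns: [memory], [execution_time]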

The result is the same set of columns as before. Therefore, you do not need another index.

Here it is again:

ALTER TABLE Memory ADD INDEX siteid_q_mem_exectime_index
(site_id,q,memory,execution_time);

Why is this index so important?

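ORDER BY and GROUP BY usually trigger internal sort operations on temporary tables. If the table is properly indexed, the data is already ordered as needed when the index is traversed.
The necessary columns (memory, execution_time) are in the index for a good reason. If an index contains every column needed for the result set, MySQL will not touch the table at all. It will only read the needed data from the index. This reduces disk I/O.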

Indexes made in this way are known as "covering indexes".
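Whether the optimizer actually treats the index as covering can be checked with EXPLAIN. The sketch below assumes that siteid_q_mem_exectime_index from the ALTER TABLE statement above has been created and uses 890, a site_id from the question's sample rows, in place of $some_site_id.

```sql
-- Inspect the plan of the first query once the composite index exists.
EXPLAIN
SELECT
  Memory.q, COUNT(*) AS count,
  AVG(Memory.memory) AS average_memory,
  MAX(Memory.memory) AS peak_memory,
  AVG(Memory.execution_time) AS average_execution_time,
  MAX(Memory.execution_time) AS peak_execution_time
FROM Memory
WHERE site_id = 890
GROUP BY Memory.q
ORDER BY average_memory DESC
LIMIT 25;
```

If the index is used as a covering index, the key column should name siteid_q_mem_exectime_index and the Extra column would be expected to include Using index. Because the final ORDER BY is on an aggregate, a sort step over the grouped rows (Using temporary; Using filesort) may still appear, but it then works on one row per path rather than on every request row.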

Here are some nice links on this subject. Enjoy!

Answer 7

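First of all, from what I can see, you should avoid GROUP BY, since it needs a lot of memory. Split it into two queries instead. Also, create the indexes as Marc B advised.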

How can I improve the speed of this data analysis?
