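I need to optimize the way I analyze a fairly large dataset, and I'm not sure what the next steps are. I have already done a fair bit of MySQL configuration tuning.
I have this InnoDB table: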
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | int(250) | NO | PRI | NULL | auto_increment |
| memory | int(15) | YES | MUL | NULL | |
| q | varchar(250) | YES | MUL | NULL | |
| created | datetime | YES | | NULL | |
| modified | datetime | YES | | NULL | |
| dt | datetime | YES | MUL | NULL | |
| site_id | int(250) | NO | MUL | NULL | |
| execution_time | int(11) | YES | MUL | NULL | |
+----------------+--------------+------+-----+---------+----------------+
Here is a sample of 10 rows:
+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+
| id | memory | q | created | modified | dt | site_id | execution_time |
+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+
| 266864867 | 38011080 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:04:44 | 890 | 1534 |
| 266864868 | 46090184 | node/16432 | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:04:46 | 890 | 840 |
| 266864869 | 50329248 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:05:16 | 890 | 2500 |
| 266864870 | 38011272 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:01 | 890 | 1494 |
| 266864871 | 46087732 | node/16432 | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:03 | 890 | 850 |
| 266864872 | 30304428 | node/303 | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:12 | 890 | 113 |
| 266864873 | 50329412 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:25 | 890 | 2465 |
| 266864874 | 28253112 | front_page | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:07:25 | 890 | 86 |
| 266864875 | 28256044 | front_page | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:08:32 | 890 | 81 |
| 266864876 | 38021072 | node/16432/edit | 2011-12-05 23:22:23 | 2011-12-05 23:22:23 | 2011-12-06 00:08:55 | 890 | 1458 |
+-----------+----------+-----------------+---------------------+---------------------+---------------------+---------+----------------+
Here are the table indexes:
+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
| memories | 0 | PRIMARY | 1 | id | A | 8473766 | NULL | NULL | | BTREE | |
| memories | 1 | index_dt | 1 | dt | A | 1210538 | NULL | NULL | YES | BTREE | |
| memories | 1 | index_execution_time | 1 | execution_time | A | 2344 | NULL | NULL | YES | BTREE | |
| memories | 1 | index_memory | 1 | memory | A | 8473766 | NULL | NULL | YES | BTREE | |
| memories | 1 | index_site_id | 1 | site_id | A | 16 | NULL | NULL | | BTREE | |
| memories | 1 | index_q | 1 | q | A | 338950 | NULL | NULL | YES | BTREE | |
+----------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+
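It stores more than a million records for many different sites (site_id). A given site may have 20,000 rows. The information stored is performance metrics for individual page requests. In case it matters, the non-obvious fields are: memory is how much memory the script used, q is the path, and site_id is a reference to the sites table.
I run two slow queries on this data. The first gets the 25 biggest memory-hog pages: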
Select
Memory.q, count(*) as count,
AVG(Memory.memory) as average_memory,
MAX(Memory.memory) as peak_memory,
AVG(Memory.execution_time) as average_execution_time,
MAX(Memory.execution_time) as peak_execution_time
FROM Memory
WHERE site_id = $some_site_id
GROUP BY Memory.q
ORDER BY average_memory DESC
LIMIT 25
The second query gets the 25 slowest pages on average for a given site:
Select
Memory.q, count(*) as count,
AVG(Memory.memory) as average_memory,
MAX(Memory.memory) as peak_memory,
AVG(Memory.execution_time) as average_execution_time,
MAX(Memory.execution_time) as peak_execution_time
FROM Memory
WHERE site_id = $some_site_id
GROUP BY Memory.q
ORDER BY average_execution_time DESC
LIMIT 25
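I recently converted the table from MyISAM to InnoDB so that these reads no longer lock the table. The locking was causing operations that update this table to queue up and lag.
Beyond throwing more RAM at the problem (to increase the InnoDB cache size), I want to see whether there are other options. I have never worked with a NoSQL database, but from what I understand they would not be much help here because I use aggregate functions and queries.
The app is written in PHP, if it matters.
Is there a better way to approach storing and analyzing this data?
Update
Profiling the query shows that copying to the temporary table is the slow step. I will look into how to speed that step up.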
+--------------------------------+----------+
| Status | Duration |
+--------------------------------+----------+
| starting | 0.000030 |
| checking query cache for query | 0.000065 |
| Opening tables | 0.000013 |
| System lock | 0.000004 |
| Table lock | 0.000014 |
| init | 0.000032 |
| optimizing | 0.000010 |
| statistics | 0.008119 |
| preparing | 0.000042 |
| Creating tmp table | 0.000317 |
| executing | 0.000005 |
| Copying to tmp table | 5.349280 |
| Sorting result | 0.006511 |
| Sending data | 0.000092 |
| end | 0.000005 |
| removing tmp table | 0.001510 |
| end | 0.000007 |
| query end | 0.000004 |
| freeing items | 0.001163 |
| logging slow query | 0.000006 |
| cleaning up | 0.000006 |
+--------------------------------+----------+
21 rows in set (0.01 sec)
mysql> show profile cpu for query 4;
+--------------------------------+----------+----------+------------+
| Status | Duration | CPU_user | CPU_system |
+--------------------------------+----------+----------+------------+
| starting | 0.000030 | 0.000000 | 0.000000 |
| checking query cache for query | 0.000065 | 0.000000 | 0.000000 |
| Opening tables | 0.000013 | 0.000000 | 0.000000 |
| System lock | 0.000004 | 0.000000 | 0.000000 |
| Table lock | 0.000014 | 0.000000 | 0.000000 |
| init | 0.000032 | 0.000000 | 0.000000 |
| optimizing | 0.000010 | 0.000000 | 0.000000 |
| statistics | 0.008119 | 0.001000 | 0.000000 |
| preparing | 0.000042 | 0.000000 | 0.000000 |
| Creating tmp table | 0.000317 | 0.000000 | 0.000000 |
| executing | 0.000005 | 0.000000 | 0.000000 |
| Copying to tmp table | 5.349280 | 0.687896 | 0.412937 |
| Sorting result | 0.006511 | 0.004999 | 0.001999 |
| Sending data | 0.000092 | 0.000000 | 0.000000 |
| end | 0.000005 | 0.000000 | 0.000000 |
| removing tmp table | 0.001510 | 0.000000 | 0.001000 |
| end | 0.000007 | 0.000000 | 0.000000 |
| query end | 0.000004 | 0.000000 | 0.000000 |
| freeing items | 0.001163 | 0.000000 | 0.001000 |
| logging slow query | 0.000006 | 0.000000 | 0.000000 |
| cleaning up | 0.000006 | 0.000000 | 0.000000 |
+--------------------------------+----------+----------+------------+
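Answer 0 (score: 4)
You didn't show your key structure, but the table description does show that site_id is part of a key (MUL), possibly a multi-part one. Note that if it is not the FIRST field in that multi-part key, the key cannot be used for that where clause. For example, if you have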
KEY somekey (field1, site_id, field3, ...)
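then your where clause must include both field1 and site_id for the key to be usable in the query. You don't have to use the fields in the same order they are listed in the key (where site_id=.. and field1=... works the same as where field1=... and site_id=...), but since field1 comes before site_id in the key definition, you must use it too for the whole key to be usable.
The same holds for your q field. It must also be the first field in a covering key, or those keys aren't usable.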
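The leftmost-prefix rule described above can be checked with a small experiment. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (the planners differ in detail, and MySQL's EXPLAIN output looks different), and the table t, the payload column, and the sample data are made up for illustration. Only the key layout (field1, site_id) comes from the example above.

```python
import sqlite3

# Hypothetical table; SQLite is used here only as a convenient stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, field1 INT, site_id INT, payload TEXT)"
)
# Same shape as KEY somekey (field1, site_id, ...) in the answer.
conn.execute("CREATE INDEX somekey ON t (field1, site_id)")
conn.executemany(
    "INSERT INTO t (field1, site_id, payload) VALUES (?, ?, ?)",
    [(i % 50, i % 16, "x") for i in range(1000)],
)

def plan(sql):
    """Return the query plan as one string (the detail text is column 3)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Leading column present (in either order): the index can serve the lookup.
with_prefix = plan("SELECT * FROM t WHERE site_id = 3 AND field1 = 7")
# Leading column missing: the planner has to scan instead.
without_prefix = plan("SELECT * FROM t WHERE site_id = 3")

print(with_prefix)
print(without_prefix)
```

On MySQL the equivalent check would be to run EXPLAIN on both statements and compare the key column of the output.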
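Answer 1 (score: 3)
To design InnoDB tables effectively you need to understand how InnoDB uses indexes, in particular what a clustered index is and how it works.
Please take some time to read the following articles and my previous answers: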
http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/
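You may also find the following presentation of interest:
So now that you have a better understanding of the InnoDB architecture, we'll look at how to optimize your model for the InnoDB engine.
As you only provided two example queries, I have had to make certain assumptions, so the following design is optimized for queries that cover site_id and path. I'll leave it to you to modify the design further (if needed), as you know your data better than I do.
I have revised your design and created 3 tables: site, site_request, and site_request_metric.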
drop table if exists site;
create table site
(
site_id smallint unsigned not null auto_increment primary key,
url varchar(255) unique not null,
next_request_id int unsigned not null default 0
)
engine=innodb;
select count(*) from site;
+----------+
| count(*) |
+----------+
| 1024 |
+----------+
+---------+------------------+-----------------+
| site_id | url | next_request_id |
+---------+------------------+-----------------+
| 1 | www.site1.com | 167 |
| 2 | www.site2.com | 177 |
| 3 | www.site3.com | 68 |
...
| 1022 | www.site1022.com | 203 |
| 1023 | www.site1023.com | 80 |
| 1024 | www.site1024.com | 239 |
+---------+------------------+-----------------+
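Most of the fields above are self-explanatory, but next_request_id is a counter field that records how many requests (paths, or q in your example) a given site has. For example, site 1024 has 239 individual page requests/paths for which we want to record memory and execution metrics.
Also note the numeric data types I used. Most of yours are poorly defined because you seem to have confused the optional display width specifier (only meaningful with zerofill) with the size of the integer. Choosing the smallest possible data type is important so that we can pack more data into the InnoDB buffer pool.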
http://dev.mysql.com/doc/refman/5.0/en/integer-types.html
drop table if exists site_request;
create table site_request
(
site_id smallint unsigned not null,
request_id int unsigned not null,
created_date datetime not null,
path varchar(255) not null,
next_metric_id int unsigned not null default 0,
primary key (site_id, request_id)
)
engine=innodb;
select count(*) from site_request;
+----------+
| count(*) |
+----------+
| 192336 |
+----------+
+---------+------------+---------------------+----------------------+----------------+
| site_id | request_id | created_date | path | next_metric_id |
+---------+------------+---------------------+----------------------+----------------+
| 1 | 1 | 2011-12-14 17:17:41 | www.site1.com/1 | 250 |
| 1 | 2 | 2011-12-14 17:17:41 | www.site1.com/2 | 132 |
| 1 | 3 | 2011-12-14 17:17:41 | www.site1.com/3 | 345 |
...
| 1 | 166| 2011-12-14 17:17:41 | www.site1.com/166 | 342 |
| 1 | 167| 2011-12-14 17:17:41 | www.site1.com/167 | 231 |
...
| 1024 | 1 | 2011-12-14 17:17:58 | www.site1024.com/1 | 241 |
| 1024 | 2 | 2011-12-14 17:17:58 | www.site1024.com/2 | 266 |
...
| 1024 | 236 | 2011-12-14 17:17:58 | www.site1024.com/236 | 466 |
| 1024 | 237 | 2011-12-14 17:17:58 | www.site1024.com/237 | 459 |
| 1024 | 238 | 2011-12-14 17:17:58 | www.site1024.com/238 | 389 |
| 1024 | 239 | 2011-12-14 17:17:58 | www.site1024.com/239 | 592 |
+---------+------------+---------------------+----------------------+----------------+
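Most of the fields are self-explanatory. The primary key of this table is a composite of site_id and request_id, so site 1 has 167 individual requests/paths and site 1024 has 239.
To select a single request you must specify both site_id and request_id: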
select * from site_request where site_id = 1 and request_id = 167
+---------+------------+---------------------+-------------------+----------------+
| site_id | request_id | created_date | path | next_metric_id |
+---------+------------+---------------------+-------------------+----------------+
| 1 | 167 | 2011-12-14 17:17:41 | www.site1.com/167 | 231 |
+---------+------------+---------------------+-------------------+----------------+
1 row in set (0.00 sec)
select * from site_request where site_id = 1024 and request_id = 167
+---------+------------+---------------------+----------------------+----------------+
| site_id | request_id | created_date | path | next_metric_id |
+---------+------------+---------------------+----------------------+----------------+
| 1024 | 167 | 2011-12-14 17:17:58 | www.site1024.com/167 | 175 |
+---------+------------+---------------------+----------------------+----------------+
1 row in set (0.00 sec)
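If we want to add a new request to a site, we use site.next_request_id + 1 to generate the next composite primary key value for the given site_id. This is typically done with a trigger, as follows: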
delimiter #
create trigger site_request_before_ins_trig before insert on site_request
for each row
begin
declare v_id int unsigned default 0;
select next_request_id + 1 into v_id from site where site_id = new.site_id;
set new.request_id = v_id, new.created_date = now();
update site set next_request_id = v_id where site_id = new.site_id;
end#
delimiter ;
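For readers who would rather see the counter logic outside a trigger, here is a minimal sketch of the same idea done in application code. It is an assumption-laden illustration, not part of the design above: it uses Python's sqlite3 module instead of MySQL, the add_request helper is hypothetical, and the tables are cut down to the columns the counter needs. The point is only that bumping the counter and inserting the row happen in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site (
    site_id INTEGER PRIMARY KEY,
    url TEXT UNIQUE NOT NULL,
    next_request_id INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE site_request (
    site_id INTEGER NOT NULL,
    request_id INTEGER NOT NULL,
    path TEXT NOT NULL,
    PRIMARY KEY (site_id, request_id)
);
INSERT INTO site (site_id, url) VALUES (1, 'www.site1.com'), (2, 'www.site2.com');
""")

def add_request(conn, site_id, path):
    """Hypothetical helper: allocate the next per-site request_id and insert the row."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "UPDATE site SET next_request_id = next_request_id + 1 WHERE site_id = ?",
            (site_id,),
        )
        (request_id,) = conn.execute(
            "SELECT next_request_id FROM site WHERE site_id = ?", (site_id,)
        ).fetchone()
        conn.execute(
            "INSERT INTO site_request (site_id, request_id, path) VALUES (?, ?, ?)",
            (site_id, request_id, path),
        )
    return request_id

ids = [
    add_request(conn, 1, "www.site1.com/a"),
    add_request(conn, 1, "www.site1.com/b"),
    add_request(conn, 2, "www.site2.com/a"),
]
print(ids)  # each site gets its own sequence: [1, 2, 1]
```

The trigger keeps this logic inside the database so every writer gets it for free; the application-side version is easier to test but relies on every code path calling the helper.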
Why didn't I just create an auto_increment primary key and a secondary index on site_id, like this?
create table site_request
(
request_id int unsigned not null auto_increment primary key,
site_id smallint unsigned not null,
...
key (site_id)
)
engine=innodb;
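I have assumed that most of your queries will include site_id and path, so clustering the request table on site_id is a worthwhile optimization, even though the insert overhead increases slightly. I am more concerned with read performance, especially as this table will later be joined to the HUGE metrics table.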
drop table if exists site_request_metric;
create table site_request_metric
(
site_id smallint unsigned not null,
request_id int unsigned not null,
metric_id int unsigned not null,
created_date datetime not null,
memory_usage int unsigned not null default 0,
execution_time mediumint unsigned not null default 0,
primary key (site_id, request_id, metric_id)
)
engine=innodb;
select count(*) from site_request_metric;
+----------+
| count(*) |
+----------+
| 73858764 |
+----------+
+---------+------------+-----------+---------------------+--------------+----------------+
| site_id | request_id | metric_id | created_date | memory_usage | execution_time |
+---------+------------+-----------+---------------------+--------------+----------------+
| 1 | 1 | 1 | 2011-12-14 17:17:58 | 18052380 | 7731 |
| 1 | 1 | 2 | 2011-12-14 17:17:58 | 32013204 | 7881 |
| 1 | 1 | 3 | 2011-12-14 17:17:58 | 55779470 | 7274 |
...
| 1 | 1 | 249 | 2011-12-14 17:17:58 | 11527748 | 5126 |
| 1 | 1 | 248 | 2011-12-14 17:17:58 | 19457506 | 4097 |
| 1 | 1 | 247 | 2011-12-14 17:17:58 | 23129432 | 6202 |
...
| 997 | 1 | 1 | 2011-12-14 19:08:48 | 38584043 | 7156 |
| 997 | 1 | 2 | 2011-12-14 19:08:48 | 68884314 | 2185 |
| 997 | 1 | 3 | 2011-12-14 19:08:48 | 31545597 | 207 |
...
| 997 | 1 | 380 | 2011-12-14 19:08:49 | 39123978 | 166 |
| 997 | 1 | 381 | 2011-12-14 19:08:49 | 45114404 | 7310 |
| 997 | 1 | 382 | 2011-12-14 19:08:49 | 55057884 | 506 |
+---------+------------+-----------+---------------------+--------------+----------------+
The site_request.next_metric_id field works like the site.next_request_id counter field and is maintained by a trigger on site_request_metric.
delimiter #
create trigger site_request_metric_before_ins_trig before insert on site_request_metric
for each row
begin
declare v_id int unsigned default 0;
select next_metric_id + 1 into v_id from site_request where site_id = new.site_id and request_id = new.request_id;
set new.metric_id = v_id, new.created_date = now();
update site_request set next_metric_id = v_id where site_id = new.site_id and request_id = new.request_id;
end#
delimiter ;
Take site 997 as an example:
select * from site where site_id = 997;
+---------+-----------------+-----------------+
| site_id | url | next_request_id |
+---------+-----------------+-----------------+
| 997 | www.site997.com | 319 |
+---------+-----------------+-----------------+
1 row in set (0.00 sec)
Site 997 has 319 individual page requests/paths.
select * from site_request where site_id = 997;
+---------+------------+---------------------+---------------------+----------------+
| site_id | request_id | created_date | path | next_metric_id |
+---------+------------+---------------------+---------------------+----------------+
| 997 | 1 | 2011-12-14 17:17:58 | www.site997.com/1 | 383 |
| 997 | 2 | 2011-12-14 17:17:58 | www.site997.com/2 | 262 |
| 997 | 3 | 2011-12-14 17:17:58 | www.site997.com/3 | 470 |
| 997 | 4 | 2011-12-14 17:17:58 | www.site997.com/4 | 247 |
...
| 997 | 316 | 2011-12-14 17:17:58 | www.site997.com/316 | 176 |
| 997 | 317 | 2011-12-14 17:17:58 | www.site997.com/317 | 441 |
| 997 | 318 | 2011-12-14 17:17:58 | www.site997.com/318 | 419 |
| 997 | 319 | 2011-12-14 17:17:58 | www.site997.com/319 | 601 |
+---------+------------+---------------------+---------------------+----------------+
319 rows in set (0.00 sec)
How many metrics do we have for all of site 997's requests?
select sum(next_metric_id) from site_request where site_id = 997;
+---------------------+
| sum(next_metric_id) |
+---------------------+
| 130163 |
+---------------------+
1 row in set (0.00 sec)
Summing next_metric_id for this site (as above) is faster than the usual:
select count(*) from site_request_metric where site_id = 997;
+----------+
| count(*) |
+----------+
| 130163 |
+----------+
1 row in set (0.03 sec)
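So site 997 has approximately 130K memory and execution time metrics to analyse, in a table of approx. 74 million rows.
Let's try our main queries next (all runtimes are cold, i.e. MySQL restarted, empty buffers, and no query cache!):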
select
hog.*,
sr.path
from
(
select
srm.site_id,
srm.request_id,
count(*) as counter,
avg(srm.memory_usage) as average_memory,
max(srm.memory_usage) as peak_memory,
avg(srm.execution_time) as average_execution_time,
max(srm.execution_time) as peak_execution_time
from
site_request_metric srm
where
srm.site_id = 997
group by
srm.site_id,
srm.request_id
order by
average_memory desc
limit 25
) hog
inner join site_request sr on hog.site_id = sr.site_id and hog.request_id = sr.request_id;
Results are as follows:
+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
| site_id | request_id | counter | average_memory | peak_memory | average_execution_time | peak_execution_time | path |
+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
| 997 | 103 | 184 | 43381803.4293 | 69682361 | 4378.1630 | 8069 | www.site997.com/103 |
| 997 | 151 | 158 | 42594703.1392 | 69329761 | 4422.8481 | 8080 | www.site997.com/151 |
| 997 | 192 | 509 | 42470135.3360 | 69927112 | 4083.1198 | 8098 | www.site997.com/192 |
| 997 | 248 | 161 | 42169276.5590 | 69995565 | 4118.1180 | 7949 | www.site997.com/248 |
| 997 | 221 | 162 | 42156708.4877 | 69233026 | 4151.1667 | 8022 | www.site997.com/221 |
| 997 | 136 | 154 | 42026979.3831 | 69897045 | 4060.5649 | 8098 | www.site997.com/136 |
| 997 | 239 | 424 | 41979697.9788 | 69381215 | 4463.0189 | 8087 | www.site997.com/239 |
| 997 | 77 | 338 | 41864013.0266 | 69991164 | 3942.4142 | 8067 | www.site997.com/77 |
| 997 | 283 | 249 | 41853642.9157 | 69945794 | 3915.7028 | 8034 | www.site997.com/283 |
| 997 | 5 | 228 | 41815274.7851 | 69825743 | 3898.4123 | 8078 | www.site997.com/5 |
| 997 | 216 | 319 | 41766464.5078 | 69777901 | 3899.0752 | 8091 | www.site997.com/216 |
| 997 | 131 | 170 | 41720890.5118 | 69892577 | 4074.2588 | 8097 | www.site997.com/131 |
| 997 | 160 | 385 | 41702556.6545 | 69868379 | 4060.2727 | 8093 | www.site997.com/160 |
| 997 | 245 | 200 | 41683505.3900 | 69668739 | 4052.7950 | 8095 | www.site997.com/245 |
| 997 | 70 | 429 | 41640396.0466 | 69988619 | 3995.3310 | 8099 | www.site997.com/70 |
| 997 | 98 | 485 | 41553544.7649 | 69957698 | 4048.1443 | 8096 | www.site997.com/98 |
| 997 | 153 | 301 | 41542909.4651 | 69754024 | 3884.7409 | 8028 | www.site997.com/153 |
| 997 | 226 | 429 | 41523530.3939 | 69691453 | 4097.7226 | 8096 | www.site997.com/226 |
| 997 | 31 | 478 | 41442100.4435 | 69802248 | 3999.3096 | 8098 | www.site997.com/31 |
| 997 | 171 | 222 | 41405805.8153 | 69433643 | 4364.4414 | 8087 | www.site997.com/171 |
| 997 | 150 | 336 | 41393538.5744 | 69746950 | 4264.5655 | 8077 | www.site997.com/150 |
| 997 | 167 | 526 | 41391595.5741 | 69633242 | 4206.1597 | 8096 | www.site997.com/167 |
| 997 | 182 | 593 | 41288151.5379 | 69992913 | 4351.6476 | 8099 | www.site997.com/182 |
| 997 | 14 | 555 | 41239680.5387 | 69976632 | 4054.6126 | 8084 | www.site997.com/14 |
| 997 | 297 | 410 | 41163572.3805 | 69874576 | 4001.0829 | 8039 | www.site997.com/297 |
+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
25 rows in set (0.41 sec)
select
hog.*,
sr.path
from
(
select
srm.site_id,
srm.request_id,
count(*) as counter,
avg(srm.memory_usage) as average_memory,
max(srm.memory_usage) as peak_memory,
avg(srm.execution_time) as average_execution_time,
max(srm.execution_time) as peak_execution_time
from
site_request_metric srm
where
srm.site_id = 997
group by
srm.site_id,
srm.request_id
order by
average_execution_time desc
limit 25
) hog
inner join site_request sr on hog.site_id = sr.site_id and hog.request_id = sr.request_id;
Results are as follows:
+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
| site_id | request_id | counter | average_memory | peak_memory | average_execution_time | peak_execution_time | path |
+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
| 997 | 213 | 159 | 37962517.1321 | 67120491 | 4497.9119 | 8055 | www.site997.com/213 |
| 997 | 239 | 424 | 41979697.9788 | 69381215 | 4463.0189 | 8087 | www.site997.com/239 |
| 997 | 151 | 158 | 42594703.1392 | 69329761 | 4422.8481 | 8080 | www.site997.com/151 |
| 997 | 289 | 382 | 39227749.9869 | 69715783 | 4402.8927 | 8093 | www.site997.com/289 |
| 997 | 69 | 473 | 40099817.4715 | 69798587 | 4380.6850 | 8092 | www.site997.com/69 |
| 997 | 103 | 184 | 43381803.4293 | 69682361 | 4378.1630 | 8069 | www.site997.com/103 |
| 997 | 183 | 236 | 40111564.1356 | 69853507 | 4376.4280 | 8032 | www.site997.com/183 |
| 997 | 171 | 222 | 41405805.8153 | 69433643 | 4364.4414 | 8087 | www.site997.com/171 |
| 997 | 58 | 212 | 39289163.9057 | 69861740 | 4355.8396 | 8087 | www.site997.com/58 |
| 997 | 71 | 388 | 39895200.6108 | 69801188 | 4353.9639 | 8086 | www.site997.com/71 |
| 997 | 182 | 593 | 41288151.5379 | 69992913 | 4351.6476 | 8099 | www.site997.com/182 |
| 997 | 195 | 305 | 39780792.6066 | 69824981 | 4343.0295 | 8081 | www.site997.com/195 |
| 997 | 318 | 419 | 39860696.4415 | 69958266 | 4323.6420 | 8071 | www.site997.com/318 |
| 997 | 303 | 318 | 39357663.3899 | 69850523 | 4322.4686 | 8097 | www.site997.com/303 |
| 997 | 198 | 306 | 38990104.1699 | 69851817 | 4320.0621 | 8088 | www.site997.com/198 |
| 997 | 286 | 227 | 39654671.5859 | 69871305 | 4307.8811 | 8055 | www.site997.com/286 |
| 997 | 105 | 611 | 39055749.5008 | 69813117 | 4296.0802 | 8090 | www.site997.com/105 |
| 997 | 298 | 388 | 40150371.2474 | 69985665 | 4286.9716 | 8095 | www.site997.com/298 |
| 997 | 84 | 517 | 39520438.9497 | 69990404 | 4283.3578 | 8098 | www.site997.com/84 |
| 997 | 106 | 448 | 41099495.4018 | 69902616 | 4282.6094 | 8082 | www.site997.com/106 |
| 997 | 237 | 431 | 39017341.3387 | 69623443 | 4277.4872 | 8071 | www.site997.com/237 |
| 997 | 55 | 381 | 39603109.8294 | 69750984 | 4269.1969 | 8095 | www.site997.com/55 |
| 997 | 34 | 438 | 40697744.4087 | 69843517 | 4266.3288 | 8047 | www.site997.com/34 |
| 997 | 38 | 433 | 40169799.8291 | 69898182 | 4266.1663 | 8088 | www.site997.com/38 |
| 997 | 150 | 336 | 41393538.5744 | 69746950 | 4264.5655 | 8077 | www.site997.com/150 |
+---------+------------+---------+----------------+-------------+------------------------+---------------------+---------------------+
25 rows in set (0.30 sec)
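That is a sub-0.5-second cold runtime for both queries on a table with approx. 74 million rows (subsequent runtimes are approx. 0.06 seconds).
This answer is by no means a definitive one, as there are many other factors affecting table and index design that I have not considered. However, it should give you some insight into how simple table/index design can significantly improve InnoDB query performance.
Hope this helps :)
Full script here: http://pastie.org/3022142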
Answer 2 (score: 2)
I would start by profiling the query with the built-in profiler:
mysql> SET profiling = 1;
mysql> <your query>;
mysql> SHOW PROFILES;
mysql> SHOW PROFILE FOR QUERY <id of your query>;
mysql> SHOW PROFILE CPU FOR QUERY <id of your query>;
Note that profiling isn't free, so do it when the site can handle it, perhaps on a copy of the live system.
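Answer 3 (score: 0)
I would add another field holding an MD5 hash of 'q' and group by that field's value.
Putting an index on a varchar(250) column and grouping by that field's value is not a good idea.
You would need a composite index on (site_id, q_hash).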
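A rough sketch of what that could look like, under some stated assumptions: SQLite via Python's sqlite3 stands in for MySQL, the q_hash column and index_site_q_hash index name are invented for the example, and the hash is computed in application code (in MySQL it could equally be filled with the MD5() function). The four rows are taken from the sample data in the question.

```python
import hashlib
import sqlite3

def q_hash(path):
    """Fixed-width 32-character hex MD5 digest of a request path."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, site_id INT, q TEXT, "
    "q_hash TEXT, memory INT, execution_time INT)"
)
# Composite index as suggested: site first, then the short fixed-width hash.
conn.execute("CREATE INDEX index_site_q_hash ON memories (site_id, q_hash)")

samples = [  # (site_id, q, memory, execution_time) from the question's sample rows
    (890, "node/16432/edit", 38011080, 1534),
    (890, "node/16432", 46090184, 840),
    (890, "node/16432/edit", 50329248, 2500),
    (890, "front_page", 28253112, 86),
]
conn.executemany(
    "INSERT INTO memories (site_id, q, q_hash, memory, execution_time) "
    "VALUES (?, ?, ?, ?, ?)",
    [(s, q, q_hash(q), m, t) for s, q, m, t in samples],
)

# Group on the hash; MIN(q) recovers a readable path for each group.
rows = conn.execute(
    "SELECT MIN(q) AS q, COUNT(*) AS cnt, AVG(memory) AS average_memory "
    "FROM memories WHERE site_id = ? "
    "GROUP BY q_hash ORDER BY average_memory DESC LIMIT 25",
    (890,),
).fetchall()
for row in rows:
    print(row)
```

Whether this beats indexing q directly depends on how long the paths really are, so it is worth measuring on the real data before committing to the extra column.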
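Answer 4 (score: 0)
If I'm reading your question (and comments) correctly, the problem is that these queries are bogging down the system.
Other answers point you in the right direction for optimization (fixing your indexes, using the profiler, etc.).
Another strategy is to set up replication and run these heavy queries against the slave. The master will hum along, writing to the binlog, and the slave will catch up once the queries finish. This setup lets you hammer the slave with long-running queries without affecting write performance on the master.
Answer 5 (score: 0)
What you really need is two good indexes that support the queries you presented.
The indexes you currently have are not sufficient, because data still has to be retrieved from the table, and it is left to the MySQL query optimizer to decide which index to choose.
@MarkB's answer is, in theory, what you want (+1 to @MarkB). You just need to make the index fit the criteria of any given query: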
- The WHERE clause
- The ORDER BY clause
- The GROUP BY clause
- Any remaining columns (those not in WHERE, ORDER BY, or GROUP BY)
Let's take your first query:
Select
Memory.q, count(*) as count,
AVG(Memory.memory) as average_memory,
MAX(Memory.memory) as peak_memory,
AVG(Memory.execution_time) as average_execution_time,
MAX(Memory.execution_time) as peak_execution_time
FROM Memory
WHERE site_id = $some_site_id
GROUP BY Memory.q
ORDER BY average_memory DESC
LIMIT 25
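Look at the four criteria:
- WHERE has only one value, [site_id]
- ORDER BY will order within WHERE, [average_memory]
- GROUP BY will order within ORDER BY, [q]
Everything in brackets is what you put in the index, in the order shown. Here is the index: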
ALTER TABLE Memory ADD INDEX siteid_q_mem_exectime_index
(site_id,q,memory,execution_time);
Note that average_memory is not a table column. It is derived from the memory field.
Now do the same for the second query:
Select
Memory.q, count(*) as count,
AVG(Memory.memory) as average_memory,
MAX(Memory.memory) as peak_memory,
AVG(Memory.execution_time) as average_execution_time,
MAX(Memory.execution_time) as peak_execution_time
FROM Memory
WHERE site_id = $some_site_id
GROUP BY Memory.q
ORDER BY average_execution_time DESC
LIMIT 25
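Look at the four criteria:
- WHERE has only one value, [site_id]
- ORDER BY will order within WHERE, [average_execution]
- GROUP BY will order within ORDER BY, [q]
The result would be the same set of columns as before. Therefore, you do not need another index.
Here it is again: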
ALTER TABLE Memory ADD INDEX siteid_q_mem_exectime_index
(site_id,q,memory,execution_time);
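Why is this index so important?
ORDER BY and GROUP BY usually trigger internal sort operations on temporary tables. If the table is properly indexed, the data is already sorted as needed while the index is traversed. An index like this one, which contains every column the query reads so that the table rows never have to be fetched, is called a "covering index".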
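To see the covering behaviour for yourself, here is a small hedged experiment. It runs against SQLite through Python's sqlite3 module rather than MySQL, so the plan text is SQLite's, not MySQL's, and the table is an empty stand-in with the question's column names. The index definition is the one from this answer, and the query uses the real table name memories with GROUP BY placed before ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the question's table (types simplified for SQLite).
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, memory INT, q TEXT, "
    "dt TEXT, site_id INT NOT NULL, execution_time INT)"
)
conn.execute(
    "CREATE INDEX siteid_q_mem_exectime_index "
    "ON memories (site_id, q, memory, execution_time)"
)

query = """
SELECT q, COUNT(*) AS count,
       AVG(memory) AS average_memory, MAX(memory) AS peak_memory,
       AVG(execution_time) AS average_execution_time,
       MAX(execution_time) AS peak_execution_time
FROM memories
WHERE site_id = ?
GROUP BY q
ORDER BY average_memory DESC
LIMIT 25
"""

# Column 3 of each EXPLAIN QUERY PLAN row holds the human-readable step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (890,))]
for step in plan:
    print(step)
```

In SQLite the plan reports the index as a covering index for the site_id lookup, and the grouping can follow the index order. Sorting by the computed average is a separate step that no index on the base columns can remove, so expect the final sort of the (much smaller) grouped result to remain. On MySQL, the thing to look for in EXPLAIN is "Using index" in the Extra column.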
Here are some nice links on this subject. Enjoy!!!
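Answer 6 (score: -1)
First of all, from what I can see, you must avoid GROUP BY, since it requires a lot of memory. Split it into two queries. Also, create the indexes as Marc B suggested.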