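More strange MySQL behavior - query optimization help

Time: 2011-05-26 02:55:49

Tags: mysql query-optimization

We have a central login that serves several websites. To store our users' data we have an accounts table holding every user account, plus a users table per site for site-specific information. We also have a simple connections table that stores the connections between users.

We noticed that a query joining these tables on the primary key user_id runs slowly. I'm hoping some SQL expert out there can explain why it is using WHERE to search the users_site1 table (a full scan instead of the primary key) and suggest how we can optimize it. Here is the slow query and its EXPLAIN output: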

mysql> explain select a.username,a.first_name,a.last_name,a.organization_name,a.organization,a.city,a.state,a.zip,a.country,a.profile_photo,a.facebook_id,a.twitter_id,u.reviews from accounts a join users_site1 u ON a.user_id=u.user_id where a.user_id IN (select cid2 from connections where cid1=10001006 AND type="MM" AND status="A") OR a.user_id IN (select cid1 from connections where cid2=10001006 AND type="MM" AND status="A") order by RAND() LIMIT 4;
+----+--------------------+-------------+--------+-------------------+---------+---------+-----------------------+-------+----------------------------------------------+
| id | select_type        | table       | type   | possible_keys     | key     | key_len | ref                   | rows  | Extra                                        |
+----+--------------------+-------------+--------+-------------------+---------+---------+-----------------------+-------+----------------------------------------------+
|  1 | PRIMARY            | u           | ALL    | PRIMARY           | NULL    | NULL    | NULL                  | 79783 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY            | a           | eq_ref | PRIMARY           | PRIMARY | 4       | exampledb.u.user_id   |     1 |                                              |
|  3 | DEPENDENT SUBQUERY | connections | ref    | PRIMARY,cid1,cid2 | cid2    | 6       | const,const           |     2 | Using where                                  |
|  2 | DEPENDENT SUBQUERY | connections | ref    | PRIMARY,cid1,cid2 | cid1    | 6       | const,const           |     1 | Using where                                  |
+----+--------------------+-------------+--------+-------------------+---------+---------+-----------------------+-------+----------------------------------------------+
4 rows in set (0.00 sec)

Here are the definitions of each table:

CREATE TABLE `accounts` (
  `user_id` int(9) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(40) DEFAULT NULL,
  `facebook_id` bigint(15) unsigned DEFAULT NULL,
  `facebook_username` varchar(30) DEFAULT NULL,
  `password` varchar(20) DEFAULT NULL,
  `profile_photo` varchar(100) DEFAULT NULL,
  `first_name` varchar(40) DEFAULT NULL,
  `middle_name` varchar(40) DEFAULT NULL,
  `last_name` varchar(40) DEFAULT NULL,
  `suffix_name` char(3) DEFAULT NULL,
  `organization_name` varchar(100) DEFAULT NULL,
  `organization` tinyint(1) unsigned DEFAULT NULL,
  `address` varchar(200) DEFAULT NULL,
  `city` varchar(40) DEFAULT NULL,
  `state` varchar(20) DEFAULT NULL,
  `zip` varchar(10) DEFAULT NULL,
  `province` varchar(40) DEFAULT NULL,
  `country` int(3) DEFAULT NULL,
  `latitude` decimal(11,7) DEFAULT NULL,
  `longitude` decimal(12,7) DEFAULT NULL,
  `phone` varchar(20) DEFAULT NULL,
  `sex` char(1) DEFAULT NULL,
  `birthday` date DEFAULT NULL,
  `about_me` varchar(2000) DEFAULT NULL,
  `activities` varchar(300) DEFAULT NULL,
  `website` varchar(100) DEFAULT NULL,
  `email` varchar(150) DEFAULT NULL,
  `referrer` int(4) unsigned DEFAULT NULL,
  `referredid` int(9) unsigned DEFAULT NULL,
  `verify` int(6) DEFAULT NULL,
  `status` char(1) DEFAULT 'R',
  `created` datetime DEFAULT NULL,
  `verified` datetime DEFAULT NULL,
  `activated` datetime DEFAULT NULL,
  `network` datetime DEFAULT NULL,
  `deleted` datetime DEFAULT NULL,
  `logins` int(6) unsigned DEFAULT '0',
  `api_logins` int(6) unsigned DEFAULT '0',
  `last_login` datetime DEFAULT NULL,
  `last_update` datetime DEFAULT NULL,
  `private` tinyint(1) unsigned DEFAULT NULL,
  `ip` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`user_id`),
  UNIQUE KEY `username` (`username`),
  KEY `facebook_id` (`facebook_id`),
  KEY `status` (`status`),
  KEY `state` (`state`)
);

CREATE TABLE `users_site1` (
  `user_id` int(9) unsigned NOT NULL,
  `facebook_id` bigint(15) unsigned DEFAULT NULL,
  `facebook_username` varchar(30) DEFAULT NULL,
  `facebook_publish` tinyint(1) unsigned DEFAULT NULL,
  `facebook_checkin` tinyint(1) unsigned DEFAULT NULL,
  `facebook_offline` varchar(300) DEFAULT NULL,
  `twitter_id` varchar(60) DEFAULT NULL,
  `twitter_secret` varchar(50) DEFAULT NULL,
  `twitter_username` varchar(20) DEFAULT NULL,
  `type` char(1) DEFAULT 'M',
  `referrer` int(4) unsigned DEFAULT NULL,
  `referredid` int(9) unsigned DEFAULT NULL,
  `session` varchar(60) DEFAULT NULL,
  `api_session` varchar(60) DEFAULT NULL,
  `status` char(1) DEFAULT 'R',
  `created` datetime DEFAULT NULL,
  `verified` datetime DEFAULT NULL,
  `activated` datetime DEFAULT NULL,
  `deleted` datetime DEFAULT NULL,
  `logins` int(6) unsigned DEFAULT '0',
  `api_logins` int(6) unsigned DEFAULT '0',
  `last_login` datetime DEFAULT NULL,
  `last_update` datetime DEFAULT NULL,
  `ip` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`user_id`)
);

CREATE TABLE `connections` (
  `cid1` int(9) unsigned NOT NULL DEFAULT '0',
  `cid2` int(9) unsigned NOT NULL DEFAULT '0',
  `cid3` int(9) unsigned NOT NULL DEFAULT '0',
  `type` char(2) NOT NULL,
  `status` char(1) NOT NULL,
  `created` datetime DEFAULT NULL,
  `updated` datetime DEFAULT NULL,
  PRIMARY KEY (`cid1`,`cid2`,`type`,`cid3`),
  KEY `cid1` (`cid1`,`type`),
  KEY `cid2` (`cid2`,`type`)
);

3 answers:

Answer 0 (score: 2)

Instead of WHERE a.user_id IN( ... ) OR a.user_id IN( ... ) you should use another join:

select 
a.username,a.first_name,a.last_name,a.organization_name,a.organization,a.city,
a.state,a.zip,a.country,a.profile_photo,a.facebook_id,a.twitter_id,u.reviews 
from accounts a 
join users_site1 u ON a.user_id=u.user_id 
join ( select cid2 as id from connections 
       where cid1=10001006 AND type="MM" AND status="A"
       union
       select cid1 as id from connections
       where cid2=10001006 AND type="MM" AND status="A" ) c
on a.user_id = c.id
order by RAND() LIMIT 4;

Answer 1 (score: 0)

Have you tried removing order by RAND() and running it again?

My results are as follows:

+----+--------------------+-------------+----------------+-------------------+---------+---------+------------------+------+----------------------------------------------+
| id | select_type        | table       | type           | possible_keys     | key     | key_len | ref              | rows | Extra                                        |
+----+--------------------+-------------+----------------+-------------------+---------+---------+------------------+------+----------------------------------------------+
|  1 | PRIMARY            | a           | ALL            | PRIMARY           | NULL    | NULL    | NULL             | 2    | Using where; Using temporary; Using filesort |
|  1 | PRIMARY            | u           | ALL            | PRIMARY           | NULL    | NULL    | NULL             | 2    | Using where; Using join buffer               |
|  3 | DEPENDENT SUBQUERY | connections | index_subquery | PRIMARY,cid1,cid2 | PRIMARY | 14      | func,const,const | 1    | Using where                                  |
|  2 | DEPENDENT SUBQUERY | connections | ref            | PRIMARY,cid1,cid2 | PRIMARY | 14      | const,func,const | 1    | Using where                                  |
+----+--------------------+-------------+----------------+-------------------+---------+---------+------------------+------+----------------------------------------------+

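Answer 2 (score: 0)

I'm by no means a MySQL guru, but I have been involved more than once in optimizing high-performance applications, although I have worked more on the implementation end of the optimization process than on finding what needs to be optimized.

The first thing I see is that the subqueries look efficient, but the way the first query runs with this where clause: ... where a.user_id IN (select cid2 ...) OR a.user_id IN (select cid1 from ...) is, in my very humble opinion, the performance killer.

The first thing I would try is join decomposition: split your request into 2 or even 3 queries. The code is less pretty, but the database will be able to work more efficiently. It is a myth that doing everything in one query is always better.

What does this buy you? Caching will be more effective; if you use MyISAM tables, the locking strategy is more efficient when fewer tables are involved in a query; and you will reduce redundant row accesses. If you can get your main query (which would be the last one if you decompose) away from Using where; Using temporary; Using filesort, you will get a much faster response.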
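
A minimal sketch of what that decomposition could look like, assuming the application picks the four random ids itself instead of relying on ORDER BY RAND(). The ids in the second statement are placeholders, and the column list is trimmed to columns that appear in the accounts definition in the question:

```sql
-- Step 1: collect the connected user ids with two small indexed lookups.
SELECT cid2 AS id FROM connections
 WHERE cid1 = 10001006 AND type = 'MM' AND status = 'A'
UNION
SELECT cid1 AS id FROM connections
 WHERE cid2 = 10001006 AND type = 'MM' AND status = 'A';

-- Step 2: the application chooses up to 4 ids at random from that result
-- (an assumption of this sketch), then fetches the rows by primary key.
-- The literal ids below are placeholders, not real data.
SELECT a.user_id, a.username, a.first_name, a.last_name, a.profile_photo
  FROM accounts a
  JOIN users_site1 u ON u.user_id = a.user_id
 WHERE a.user_id IN (10001007, 10001011, 10001042, 10001105);
```

With a literal list of primary-key values, the second statement should be able to use the PRIMARY keys of both tables rather than scanning users_site1, but that is worth confirming with EXPLAIN on your own data.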

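Profile the different options you try using SHOW SESSION STATUS and FLUSH STATUS, and disable the query cache by adding SQL_NO_CACHE to the query so that you get a real comparison between them, i.e. SELECT SQL_NO_CACHE a.username ... etc.

Profiling and measuring the results is the only way you will be able to determine what your performance gains are. Unfortunately, this step is often overlooked.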
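
As a rough sketch of that measurement loop, assuming a MySQL 5.x server where the query cache and SQL_NO_CACHE still apply. The query shown is the UNION-join form from the first answer with a shortened column list, and which counters matter most is a judgment call:

```sql
-- Reset the session counters so the numbers reflect only the next query.
-- (FLUSH STATUS needs the RELOAD privilege.)
FLUSH STATUS;

-- Candidate query, bypassing the query cache.
SELECT SQL_NO_CACHE a.username, a.first_name, a.last_name
  FROM accounts a
  JOIN users_site1 u ON u.user_id = a.user_id
  JOIN (SELECT cid2 AS id FROM connections
         WHERE cid1 = 10001006 AND type = 'MM' AND status = 'A'
        UNION
        SELECT cid1 AS id FROM connections
         WHERE cid2 = 10001006 AND type = 'MM' AND status = 'A') c
    ON a.user_id = c.id
 ORDER BY RAND() LIMIT 4;

-- Row reads per access method; a large Handler_read_rnd_next
-- usually points at a table scan.
SHOW SESSION STATUS LIKE 'Handler_read%';
-- Sort activity caused by the query.
SHOW SESSION STATUS LIKE 'Sort%';
```

Repeat the same three steps for each variant and compare the counters side by side rather than relying on wall-clock time alone.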

Good luck!