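I recently downloaded the Sean Lahman SQL dump, imported the data into a MySQL database, and started playing around with some queries. My SQL knowledge is pretty thin: basic inner joins and simple subqueries, and I've never really gone beyond that. But it's a very cool data set, and I immediately started running into performance issues that I don't understand.
The following query joins the Batting and Managers tables to return the player ID, some offensive stats, and the manager ID for the top 5 Seattle Mariners HR hitters: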
select
b.playerID, b.yearID, b.H, b.HR, b.RBI, (b.H / b.AB) b_avg, mgr.managerID
from
Batting b
inner join
Managers mgr on b.yearID = mgr.yearID and b.teamID = mgr.teamID
where b.teamID = 'SEA'
order by b.HR desc
limit 5;
+-----------+--------+------+------+------+--------+------------+
| playerID | yearID | H | HR | RBI | b_avg | managerID |
+-----------+--------+------+------+------+--------+------------+
| griffke02 | 1997 | 185 | 56 | 147 | 0.3043 | pinielo01m |
| griffke02 | 1998 | 180 | 56 | 146 | 0.2844 | pinielo01m |
| griffke02 | 1996 | 165 | 49 | 140 | 0.3028 | pinielo01m |
| griffke02 | 1999 | 173 | 48 | 134 | 0.2855 | pinielo01m |
| griffke02 | 1993 | 180 | 45 | 109 | 0.3093 | pinielo01m |
+-----------+--------+------+------+------+--------+------------+
5 rows in set (0.11 sec)
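That came back quickly (0.11 sec). But when I tried to get the players' and managers' full names as well, the query slowed down dramatically: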
select
mp.nameLast plyr_first, mp.nameFirst plyr_last, b.yearID, b.H, b.HR, b.RBI, (b.H / b.AB) b_avg, mm.nameLast mgr_last, mm.nameFirst mgr_lfirst
from
Batting b
inner join
Managers mgr
on b.yearID = mgr.yearID and b.teamID = mgr.teamID
inner join
Master mp
on b.playerID = mp.playerID
inner join
Master mm on mgr.managerID = mm.managerID
where
b.teamID = 'SEA'
order by
b.HR desc limit 5;
+------------+-----------+--------+------+------+------+--------+----------+------------+
| plyr_first | plyr_last | yearID | H | HR | RBI | b_avg | mgr_last | mgr_lfirst |
+------------+-----------+--------+------+------+------+--------+----------+------------+
| Griffey | Ken | 1997 | 185 | 56 | 147 | 0.3043 | Piniella | Lou |
| Griffey | Ken | 1998 | 180 | 56 | 146 | 0.2844 | Piniella | Lou |
| Griffey | Ken | 1996 | 165 | 49 | 140 | 0.3028 | Piniella | Lou |
| Griffey | Ken | 1999 | 173 | 48 | 134 | 0.2855 | Piniella | Lou |
| Griffey | Ken | 1993 | 180 | 45 | 109 | 0.3093 | Piniella | Lou |
+------------+-----------+--------+------+------+------+--------+----------+------------+
5 rows in set (11.43 sec)
Here are the relevant rows from the Master table definition (I've left out a bunch, but these are the main ones):
+--------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| lahmanID | int(11) | NO | PRI | NULL | |
| playerID | varchar(10) | YES | | NULL | |
| managerID | varchar(10) | YES | | NULL | |
| nameFirst | varchar(50) | YES | | NULL | |
| nameLast | varchar(50) | YES | | NULL | |
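I basically started with the Batting table, since that's where the data is. Then I joined the Managers table, and it still performed the same. Then I joined the Master table to get the player's first and last name, which was also fine. It's the second join to the Master table that gives me the problem.
When I change the query to return only the manager ID instead of the manager's first and last name, it's much faster, about a quarter of a second. Any ideas on how to get the players' and managers' first/last names with good performance? Or can you point me in the right direction as to what is slowing the query down?
Thanks, bp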
Answer 0 (score: 0)
This may not be right, but you could try changing the join for the manager's name to:
inner join Master mm
on mgr.managerID = mm.playerID
So you would run:
select mp.nameLast plyr_first,
mp.nameFirst plyr_last,
b.yearID,
b.H,
b.HR,
b.RBI,
(b.H / b.AB) b_avg,
mm.nameLast mgr_last,
mm.nameFirst mgr_lfirst
from Batting b
inner join Managers mgr
on b.yearID = mgr.yearID
and b.teamID = mgr.teamID
inner join Master mp
on b.playerID = mp.playerID
inner join Master mm
on mgr.managerID = mm.playerID
where b.teamID = 'SEA'
order by b.HR desc limit 5;
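I just want to rule out a bad join as the cause. If that doesn't work, can you take the "limit 5" out of the query and see whether any rows are duplicated and/or "bad"?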
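As a rough sketch of those checks, assuming the table and column names shown in the question: the first two queries look for IDs that appear more than once in Master (which would multiply rows in the join), and the `explain` shows how MySQL plans the slow query. The commented-out `create index` statements are an assumption on my part. The `Key` column in the question's table listing is empty for `playerID` and `managerID`, so the joins to Master may be scanning the whole table, but that listing is truncated and the index names here are made up.

```sql
-- 1. Look for duplicate IDs in Master; any row returned here would
--    multiply rows in the joined result.
select playerID, count(*) as n
from Master
where playerID is not null
group by playerID
having count(*) > 1;

select managerID, count(*) as n
from Master
where managerID is not null
group by managerID
having count(*) > 1;

-- 2. Ask MySQL how it plans to execute the slow query from the question.
explain
select mp.nameLast, mp.nameFirst, b.yearID, b.HR, mm.nameLast, mm.nameFirst
from Batting b
inner join Managers mgr
        on b.yearID = mgr.yearID
       and b.teamID = mgr.teamID
inner join Master mp
        on b.playerID = mp.playerID
inner join Master mm
        on mgr.managerID = mm.managerID
where b.teamID = 'SEA'
order by b.HR desc
limit 5;

-- 3. ASSUMPTION: only worth trying if the plan shows full scans of Master
--    and no index already covers these columns. Index names are hypothetical.
-- create index idx_master_playerID  on Master (playerID);
-- create index idx_master_managerID on Master (managerID);
```

If the `explain` output shows `ALL` in the `type` column for `mp` or `mm`, that join is reading the whole Master table for every row it has to match, which would be consistent with the slowdown you're seeing.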