Understanding MySQL query performance with the Sean Lahman MLB database

Date: 2014-02-22 19:53:51

Tags: mysql sql

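I recently downloaded the Sean Lahman SQL database, imported the data into MySQL, and started playing around with some queries. My SQL knowledge is pretty slim: basic inner joins and simple subqueries, and I've never really gone beyond that. But it's a really cool dataset, and I immediately started running into some performance problems that I don't understand.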

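The following query joins the Batting and Managers tables to return the player ID, some offensive stats, and the manager ID for the top 5 Seattle Mariners home run (HR) hitters: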

select 
    b.playerID, b.yearID, b.H, b.HR, b.RBI, (b.H / b.AB) b_avg, mgr.managerID 
from 
    Batting b 
inner join 
    Managers mgr on b.yearID = mgr.yearID and b.teamID = mgr.teamID 
where   b.teamID = 'SEA' 
order by b.HR desc 
limit 5;
+-----------+--------+------+------+------+--------+------------+
| playerID  | yearID | H    | HR   | RBI  | b_avg  | managerID  |
+-----------+--------+------+------+------+--------+------------+
| griffke02 |   1997 |  185 |   56 |  147 | 0.3043 | pinielo01m |
| griffke02 |   1998 |  180 |   56 |  146 | 0.2844 | pinielo01m |
| griffke02 |   1996 |  165 |   49 |  140 | 0.3028 | pinielo01m |
| griffke02 |   1999 |  173 |   48 |  134 | 0.2855 | pinielo01m |
| griffke02 |   1993 |  180 |   45 |  109 | 0.3093 | pinielo01m |
+-----------+--------+------+------+------+--------+------------+
5 rows in set (0.11 sec)

That came back quickly (0.11 sec). But when I tried to get the full names of the players and managers, the query slowed down dramatically:

select 
    mp.nameLast plyr_first, mp.nameFirst plyr_last, b.yearID, b.H, b.HR, b.RBI, (b.H / b.AB) b_avg, mm.nameLast mgr_last, mm.nameFirst mgr_lfirst
from
    Batting b
inner join
    Managers mgr on b.yearID = mgr.yearID and b.teamID = mgr.teamID
inner join 
    Master mp 
    on b.playerID = mp.playerID 
inner join 
    Master mm on mgr.managerID = mm.managerID 
where 
    b.teamID = 'SEA' 
order by 
    b.HR desc limit 5;
+------------+-----------+--------+------+------+------+--------+----------+------------+
| plyr_first | plyr_last | yearID | H    | HR   | RBI  | b_avg  | mgr_last | mgr_lfirst |
+------------+-----------+--------+------+------+------+--------+----------+------------+
| Griffey    | Ken       |   1997 |  185 |   56 |  147 | 0.3043 | Piniella | Lou        |
| Griffey    | Ken       |   1998 |  180 |   56 |  146 | 0.2844 | Piniella | Lou        |
| Griffey    | Ken       |   1996 |  165 |   49 |  140 | 0.3028 | Piniella | Lou        |
| Griffey    | Ken       |   1999 |  173 |   48 |  134 | 0.2855 | Piniella | Lou        |
| Griffey    | Ken       |   1993 |  180 |   45 |  109 | 0.3093 | Piniella | Lou        |
+------------+-----------+--------+------+------+------+--------+----------+------------+
5 rows in set (11.43 sec)

Here are the relevant columns of the Master table (I've left out a bunch, but these are the main ones):

+--------------+--------------+------+-----+---------+-------+
| Field        | Type         | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| lahmanID     | int(11)      | NO   | PRI | NULL    |       |
| playerID     | varchar(10)  | YES  |     | NULL    |       |
| managerID    | varchar(10)  | YES  |     | NULL    |       |
| nameFirst    | varchar(50)  | YES  |     | NULL    |       |
| nameLast     | varchar(50)  | YES  |     | NULL    |       |
+--------------+--------------+------+-----+---------+-------+

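I basically started with the Batting table, since that's where the data is. Then I added the Managers table, with the same (fast) result. Then I joined the Master table to get the players' first and last names, which was also fine, but it's the second join to the Master table that causes the problem.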

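When I change the query to return only the manager ID instead of the manager's first and last name, it's much faster, about a quarter of a second. Any ideas on how to get the player and manager first/last names with good performance? Can you point me in the right direction on what is slowing the query down?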

Thanks, bp
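One way to look for what is slowing the query down: in the schema excerpt above, the Key column marks only lahmanID as indexed, so neither Master column the query joins on (playerID and managerID) appears to have an index, and each join into Master may have to scan the whole table. The sketch below checks that idea using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite's query planner is not MySQL's, so the plans are only indicative). It builds trimmed, empty copies of the three tables, prints the EXPLAIN QUERY PLAN for the slow query, then adds indexes on Master(playerID) and Master(managerID) and prints the plan again. The index names idx_master_playerID and idx_master_managerID are hypothetical, and PRAGMA automatic_index = OFF is SQLite-specific: it stops SQLite from building its own temporary index, so the unindexed plan shows plain table scans.

```python
import sqlite3

# Trimmed stand-in for the Lahman tables: only the columns the slow query uses.
# SQLite is used here in place of MySQL, so the plans are only indicative.
SCHEMA = """
CREATE TABLE Master (lahmanID INTEGER PRIMARY KEY, playerID TEXT,
                     managerID TEXT, nameFirst TEXT, nameLast TEXT);
CREATE TABLE Batting (playerID TEXT, yearID INTEGER, teamID TEXT, HR INTEGER);
CREATE TABLE Managers (managerID TEXT, yearID INTEGER, teamID TEXT);
"""

# The slow query from the question: same joins, filter and ordering,
# with the column list trimmed.
SLOW_QUERY = """
SELECT mp.nameLast, mp.nameFirst, b.yearID, b.HR, mm.nameLast, mm.nameFirst
FROM Batting b
INNER JOIN Managers mgr ON b.yearID = mgr.yearID AND b.teamID = mgr.teamID
INNER JOIN Master mp ON b.playerID = mp.playerID
INNER JOIN Master mm ON mgr.managerID = mm.managerID
WHERE b.teamID = 'SEA'
ORDER BY b.HR DESC
LIMIT 5
"""


def query_plan(conn):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + SLOW_QUERY)]


def compare_plans():
    """Return the plan before and after indexing the two Master join columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # SQLite-specific: stop the planner from building temporary indexes,
    # so the unindexed plan shows plain table scans.
    conn.execute("PRAGMA automatic_index = OFF")
    before = query_plan(conn)
    # Hypothetical index names; the columns are the Master join keys.
    conn.execute("CREATE INDEX idx_master_playerID ON Master (playerID)")
    conn.execute("CREATE INDEX idx_master_managerID ON Master (managerID)")
    after = query_plan(conn)
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("before indexing:")
    for line in before:
        print("  ", line)
    print("after indexing:")
    for line in after:
        print("  ", line)
```

With the indexes in place, the plan lines for mp and mm should change from SCAN to SEARCH ... USING INDEX. On the real MySQL database, the corresponding check would be to put EXPLAIN in front of the slow query (a join type of ALL on a Master row means a full table scan), add an index with something like CREATE INDEX idx_master_managerID ON Master (managerID); (and likewise for playerID), and run EXPLAIN again.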

1 Answer:

Answer 0 (score: 0)

This may not be right, but you could try changing the join for the manager names to:

inner join Master mm
  on mgr.managerID = mm.playerID

So you would run:

select mp.nameLast plyr_first,
       mp.nameFirst plyr_last,
       b.yearID,
       b.H,
       b.HR,
       b.RBI,
       (b.H / b.AB) b_avg,
       mm.nameLast mgr_last,
       mm.nameFirst mgr_lfirst
  from Batting b
 inner join Managers mgr
    on b.yearID = mgr.yearID
   and b.teamID = mgr.teamID
 inner join Master mp
    on b.playerID = mp.playerID
 inner join Master mm
    on mgr.managerID = mm.playerID
 where b.teamID = 'SEA'
 order by b.HR desc limit 5;

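I'm just trying to rule out a bad join as the cause. If that doesn't help, could you take the "limit 5" out of the query and check whether any rows are duplicated and/or "bad"?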
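The two checks this answer suggests can be sketched as follows, again with Python's built-in sqlite3 module standing in for MySQL. key_matches counts how many Managers rows find a Master row through mm.managerID versus through mm.playerID (the alternative join proposed above), and duplicated_batting_rows runs the question's join without LIMIT and reports any (playerID, yearID) pair that comes back more than once. All function names and rows are hypothetical: the data is toy data, not Lahman's; it assumes a manager's player ID is his manager ID without the trailing "m" (as pinielo01m suggests); and it plants one duplicate Master row on purpose.

```python
import sqlite3
from collections import Counter

# Hypothetical toy rows, not real Lahman data. They assume a manager's ID is
# his player ID plus a trailing "m" (as "pinielo01m" suggests), and include a
# deliberately duplicated Master row to show what join fan-out looks like.
MASTER = [
    (1, "griffke02", None, "Ken", "Griffey"),
    (2, "pinielo01", "pinielo01m", "Lou", "Piniella"),
    (3, "pinielo01", "pinielo01m", "Lou", "Piniella"),  # planted duplicate
]
BATTING = [("griffke02", 1997, "SEA", 56), ("griffke02", 1998, "SEA", 56)]
MANAGERS = [("pinielo01m", 1997, "SEA"), ("pinielo01m", 1998, "SEA")]


def build_db():
    """Load the toy rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Master (lahmanID INTEGER PRIMARY KEY, playerID TEXT,
                             managerID TEXT, nameFirst TEXT, nameLast TEXT);
        CREATE TABLE Batting (playerID TEXT, yearID INTEGER, teamID TEXT,
                              HR INTEGER);
        CREATE TABLE Managers (managerID TEXT, yearID INTEGER, teamID TEXT);
    """)
    conn.executemany("INSERT INTO Master VALUES (?, ?, ?, ?, ?)", MASTER)
    conn.executemany("INSERT INTO Batting VALUES (?, ?, ?, ?)", BATTING)
    conn.executemany("INSERT INTO Managers VALUES (?, ?, ?)", MANAGERS)
    return conn


def key_matches(conn):
    """Count Managers rows that find a Master row via managerID vs. playerID."""
    sql = ("SELECT COUNT(*) FROM Managers mgr WHERE EXISTS "
           "(SELECT 1 FROM Master mm WHERE mm.{col} = mgr.managerID)")
    via_manager = conn.execute(sql.format(col="managerID")).fetchone()[0]
    via_player = conn.execute(sql.format(col="playerID")).fetchone()[0]
    return via_manager, via_player


def duplicated_batting_rows(conn):
    """Run the join without LIMIT; report (playerID, yearID) seen more than once."""
    rows = conn.execute("""
        SELECT b.playerID, b.yearID
        FROM Batting b
        INNER JOIN Managers mgr ON b.yearID = mgr.yearID AND b.teamID = mgr.teamID
        INNER JOIN Master mp ON b.playerID = mp.playerID
        INNER JOIN Master mm ON mgr.managerID = mm.managerID
        WHERE b.teamID = 'SEA'
    """).fetchall()
    counts = Counter(rows)
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    conn = build_db()
    print(key_matches(conn))  # (2, 0)
    print(duplicated_batting_rows(conn))
    conn.close()
```

In this toy data the managerID join matches every Managers row while the playerID join matches none, and each season comes back twice only because of the planted duplicate. On real data a repeated (playerID, yearID) pair is not automatically "bad": a season in which the team changed managers would legitimately return one row per manager.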