Question

My table contains the following columns:

gamelogs_id (auto_increment primary key)
player_id (int)
player_name (varchar)
game_id (int)
season_id (int)
points (int)

The table has the following indexes:

+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table           | Non_unique | Key_name           | Seq_in_index | Column_name        | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| player_gamelogs |          0 | PRIMARY            |            1 | player_gamelogs_id | A         |      371330 |     NULL | NULL   |      | BTREE      |         |               |
| player_gamelogs |          1 | player_name        |            1 | player_name        | A         |        3375 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | points             |            1 | points             | A         |         506 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_id            |            1 | game_id            | A         |       37133 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | season             |            1 | season             | A         |          30 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | team_abbreviation  |            1 | team_abbreviation  | A         |          70 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            1 | game_id            | A         |       41258 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            2 | player_id          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            3 | dk_points          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            1 | game_id            | A         |       41258 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            2 | player_id          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            3 | season_id          | A         |      371330 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

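I am trying to calculate each player's average points for the season before a given game is played. So for the 3rd game of the season, avg_points would be the average of games 1 and 2. Game IDs are sequential, so an earlier game always has a lower ID than a later one. I could also use a date field, but I assumed a numeric comparison would be faster?

My query is as follows: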

SELECT game_id, 
       player_id, 
       player_name, 
       (SELECT avg(points) 
          FROM player_gamelogs t2
         WHERE t2.game_id < t1.game_id 
           AND t1.player_id = t2.player_id 
           AND t1.season_id = t2.season_id) AS avg_points
  FROM player_gamelogs t1
 ORDER BY player_name, game_id;

EXPLAIN produces the following output:

| id | select_type        | table | type | possible_keys                        | key  | key_len | ref  | rows   | Extra                                           |
+----+--------------------+-------+------+--------------------------------------+------+---------+------+--------+-------------------------------------------------+
|  1 | PRIMARY            | t1    | ALL  | NULL                                 | NULL | NULL    | NULL | 371330 | Using filesort                                  |
|  2 | DEPENDENT SUBQUERY | t2    | ALL  | game_id,player_id,game_player_season | NULL | NULL    | NULL | 371330 | Range checked for each record (index map: 0xC8) |

I'm not sure whether this is slow because of the nature of the task or because my query is inefficient. Thanks for any suggestions!

Answer 1

Consider this query:

SELECT t1.season_id, t1.game_id, t1.player_id, t1.player_name, AVG(COALESCE(t2.points, 0)) AS average_player_points
FROM player_gamelogs t1
        LEFT JOIN player_gamelogs t2 ON 
                t1.game_id > t2.game_id 
            AND t1.player_id = t2.player_id
            AND t1.season_id = t2.season_id 
GROUP BY
    t1.season_id, t1.game_id, t1.player_id, t1.player_name
ORDER BY t1.player_name, t1.game_id;

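Notes:

For best results you will need an additional index on (season_id, game_id, player_id, player_name).
It would be better to have a player table and retrieve the name from the id. It seems redundant to me that we have to fetch the player name from the log table, even more so if it has to be part of an index.
GROUP BY already sorts by the grouped columns. If you can, avoid the ORDER BY afterwards, since it adds useless overhead. As mentioned in the comments, this is not documented behavior to rely on: MySQL 8.0 no longer sorts GROUP BY results implicitly, so weigh the saved overhead against the risk of suddenly losing the ordering.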
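
To make the first two notes concrete, here is a minimal sketch. The index name and the players table are assumptions made for illustration; neither exists in the schema shown in the question.

```sql
-- Index from the first note; the name idx_season_game_player_name is made up.
CREATE INDEX idx_season_game_player_name
    ON player_gamelogs (season_id, game_id, player_id, player_name);

-- Hypothetical lookup table for the second note (not part of the original schema).
CREATE TABLE players (
    player_id   INT PRIMARY KEY,
    player_name VARCHAR(100)
);

-- Aggregate on ids only, then attach the name once per result row.
SELECT a.season_id, a.game_id, a.player_id, p.player_name, a.average_player_points
FROM (
    SELECT t1.season_id, t1.game_id, t1.player_id,
           AVG(COALESCE(t2.points, 0)) AS average_player_points
    FROM player_gamelogs t1
    LEFT JOIN player_gamelogs t2
           ON t1.game_id > t2.game_id
          AND t1.player_id = t2.player_id
          AND t1.season_id = t2.season_id
    GROUP BY t1.season_id, t1.game_id, t1.player_id
) a
JOIN players p ON p.player_id = a.player_id
ORDER BY p.player_name, a.game_id;
```

With the name moved out of the log table, the suggested index could probably drop player_name as well, but that depends on how the rest of the application reads this table.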

Answer 2

Your query is fine as written:

SELECT game_id, player_id, player_name, 
       (SELECT avg(t2.points) 
        FROM player_gamelogs t2
        WHERE t2.game_id < t1.game_id AND
              t1.player_id = t2.player_id AND
              t1.season_id = t2.season_id
      ) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id;

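However, for best performance you want two composite indexes: (player_id, season_id, game_id, points) and (player_name, game_id, season_id).

The first index should speed up the subquery. The second is for the outer ORDER BY.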
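
The two indexes could be created roughly as follows. The index names are invented for this example, and whether the optimizer actually picks them should be confirmed by re-running EXPLAIN on your own data.

```sql
-- Covers the correlated subquery: equality on player_id and season_id,
-- a range on game_id, and points read straight from the index.
CREATE INDEX idx_player_season_game_points
    ON player_gamelogs (player_id, season_id, game_id, points);

-- Intended to serve the outer ORDER BY player_name, game_id.
CREATE INDEX idx_name_game_season
    ON player_gamelogs (player_name, game_id, season_id);

-- Re-check the plan afterwards.
EXPLAIN
SELECT game_id, player_id, player_name,
       (SELECT AVG(t2.points)
        FROM player_gamelogs t2
        WHERE t2.game_id < t1.game_id
          AND t1.player_id = t2.player_id
          AND t1.season_id = t2.season_id) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id;
```

If the first index is used, the "Range checked for each record" entry for t2 should give way to an index lookup, though the exact plan depends on the MySQL version and table statistics.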

Answer 3

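The way you are querying now, you are running every game, and all games under it, for every player. For example, if you have 10 games per person, you get the following results per season/person: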

Game 10, Game 10 points, avg of games 1-9
Game 9, Game 9 points, avg of games 1-8...
...
...
Game 2, Game 2 points, avg of game 1 only.

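You state that you want the most recent game together with the average of everything before it. That said, I'm assuming you don't care about each of the lower games per person.

You are also querying across ALL seasons. If a season is over, do you care about old seasons, or only the current one? Otherwise you are going through all seasons and all players.

All that said, I offer the following. First, limit the query to just the latest season with a WHERE clause; I intentionally left the season in the query/group in case you DO want other seasons. Then I take the MAXIMUM game for a given person/season as the baseline for the single final row (per person/season), and get the average of everything under it. So in the 10-game sample scenario I don't grab the underlying rows for games 9 through 2; I return only game #10.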

select
      pgMax.player_id,
      pgMax.season_id,
      pgMax.mostRecentGameID,
      pgl3.points as mostRecentGamePoints,
      pgl3.player_name,
      coalesce( avg( pgl2.points ), 0 ) as AvgPointsPriorToCurrentGame
   from
      -- latest game per player for the chosen season
      ( select pgl1.player_id,
               pgl1.season_id,
               max( pgl1.game_id ) as mostRecentGameID
           from
              player_gamelogs pgl1
           where
               pgl1.season_id = JustOneSeason   -- placeholder: substitute the season id you want
           group by
              pgl1.player_id,
              pgl1.season_id ) pgMax

         -- all earlier games for that player/season;
         -- LEFT JOIN keeps players with only one game (their average becomes 0)
         LEFT JOIN player_gamelogs pgl2
            on pgMax.player_id = pgl2.player_id
           AND pgMax.season_id = pgl2.season_id
           AND pgMax.mostRecentGameID > pgl2.game_id

         -- the row for the most recent game itself
         JOIN player_gamelogs pgl3
            on pgMax.player_id = pgl3.player_id
           AND pgMax.season_id = pgl3.season_id
           AND pgMax.mostRecentGameID = pgl3.game_id
   group by
      pgMax.player_id,
      pgMax.season_id,
      pgMax.mostRecentGameID,
      pgl3.points,
      pgl3.player_name
   order by
      pgMax.player_id

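Now, to optimize the query, a composite index on (player_id, season_id, game_id, points) would be best. However, if you are only ever looking at the "current season", then make the index (season_id, player_id, game_id, points), putting the season ID in the first position so it pre-qualifies the WHERE clause.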
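
As a sketch of the season-first variant: the index name is made up for this example, and the literal 2015 is only a stand-in for whatever season id you actually filter on.

```sql
-- Season first, so the WHERE season_id = ... filter narrows the index range
-- before the per-player grouping is applied.
CREATE INDEX idx_season_player_game_points
    ON player_gamelogs (season_id, player_id, game_id, points);

-- Check that the inner "latest game per player" query can use the index.
EXPLAIN
SELECT player_id,
       season_id,
       MAX(game_id) AS mostRecentGameID
FROM player_gamelogs
WHERE season_id = 2015   -- hypothetical season id
GROUP BY player_id, season_id;
```

If your queries always span several seasons, the player-first ordering described above is likely the better fit, since the season filter no longer does the narrowing.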

MySQL query is very slow
