My table has the following columns:
gamelogs_id (auto_increment primary key)
player_id (int)
player_name (varchar)
game_id (int)
season_id (int)
points (int)
The table has the following indexes:
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| player_gamelogs | 0 | PRIMARY | 1 | player_gamelogs_id | A | 371330 | NULL | NULL | | BTREE | | |
| player_gamelogs | 1 | player_name | 1 | player_name | A | 3375 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | points | 1 | points | A | 506 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_id | 1 | game_id | A | 37133 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | season | 1 | season | A | 30 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | team_abbreviation | 1 | team_abbreviation | A | 70 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | player_id | 1 | game_id | A | 41258 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | player_id | 2 | player_id | A | 371330 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | player_id | 3 | dk_points | A | 371330 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_player_season | 1 | game_id | A | 41258 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_player_season | 2 | player_id | A | 371330 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_player_season | 3 | season_id | A | 371330 | NULL | NULL | | BTREE | | |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
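I am trying to calculate, per season and per player, the average points scored before each game started. So for the 3rd game of a season, avg_points would be the average of games 1 and 2. Game ids are sequential, so an earlier game always has a lower id than a later one. I could also use a date field instead, but I assumed a numeric comparison would be faster?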
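To make the target concrete, here is a small sketch with made-up rows. The scratch table name `gamelogs_demo` and every value in it are assumptions for illustration only, not data from the real table.

```sql
-- Hypothetical scratch table holding only the columns the calculation needs.
CREATE TABLE gamelogs_demo (
    player_id INT,
    season_id INT,
    game_id   INT,
    points    INT
);

-- One player, one season, three games (invented values).
INSERT INTO gamelogs_demo (player_id, season_id, game_id, points) VALUES
    (7, 2015, 1, 10),
    (7, 2015, 2, 20),
    (7, 2015, 3, 30);

-- Desired avg_points for each row:
--   game 1 -> NULL (no earlier games in the season)
--   game 2 -> 10   (average of game 1)
--   game 3 -> 15   (average of games 1 and 2)
```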
My query is as follows:
SELECT game_id,
player_id,
player_name,
(SELECT avg(points)
FROM player_gamelogs t2
WHERE t2.game_id < t1.game_id
AND t1.player_id = t2.player_id
AND t1.season_id = t2.season_id) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id;
EXPLAIN produces the following output:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+--------------------------------------+------+---------+------+--------+-------------------------------------------------+
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 371330 | Using filesort |
| 2 | DEPENDENT SUBQUERY | t2 | ALL | game_id,player_id,game_player_season | NULL | NULL | NULL | 371330 | Range checked for each record (index map: 0xC8) |
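I'm not sure whether this is down to the nature of the task or whether my query is inefficient. Thanks for any suggestions!
Answer 0: (score: 7)
Please consider this query: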
SELECT t1.season_id, t1.game_id, t1.player_id, t1.player_name, AVG(COALESCE(t2.points, 0)) AS average_player_points
FROM player_gamelogs t1
LEFT JOIN player_gamelogs t2 ON
t1.game_id > t2.game_id
AND t1.player_id = t2.player_id
AND t1.season_id = t2.season_id
GROUP BY
t1.season_id, t1.game_id, t1.player_id, t1.player_name
ORDER BY t1.player_name, t1.game_id;
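Note: GROUP BY already returns rows sorted by the grouping columns. If you can, avoid ordering afterwards, because it adds needless overhead. As pointed out in the comments, this is not officially guaranteed behavior, so relying on it staying consistent over time means accepting the risk of suddenly losing the sort order.
Answer 1: (score: 2)
Your query is fine as it stands: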
SELECT game_id, player_id, player_name,
(SELECT avg(t2.points)
FROM player_gamelogs t2
WHERE t2.game_id < t1.game_id AND
t1.player_id = t2.player_id AND
t1.season_id = t2.season_id
) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id;
However, for best performance you want two composite indexes: (player_id, season_id, game_id, points) and (player_name, game_id, season_id).
The first index should speed up the subquery. The second is for the outer ORDER BY.
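As a minimal sketch, the two indexes could be created like this. The index names are made up for illustration; only the column lists come from the suggestion above.

```sql
-- Covers the correlated subquery: equality on player_id and season_id,
-- range on game_id, and points included so the average can be read from the index.
CREATE INDEX idx_player_season_game_points
    ON player_gamelogs (player_id, season_id, game_id, points);

-- Intended for the outer ORDER BY player_name, game_id.
CREATE INDEX idx_name_game_season
    ON player_gamelogs (player_name, game_id, season_id);
```

Re-running EXPLAIN afterwards should show whether the optimizer actually picks them up.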
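Answer 2: (score: 1)
As your query stands, for every player you are running through every game and all the games below it. For example, if you had 10 games per person, you would get the following results per season/person: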
Game 10, Game 10 points, avg of games 1-9
Game 9, Game 9 points, avg of games 1-8...
...
...
Game 2, Game 2 points, avg of game 1 only.
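You state that you want the most recent game together with the average of everything before it. That said, I'm assuming you don't care about each of the lower game levels per person.
You are also querying across all seasons. If a season has finished, do you still care about the old seasons, or only the current one? Otherwise you are going through all seasons, all players...
All that said, I offer the following. First, it uses a WHERE clause to restrict the query to just the latest season, but I intentionally left the season in the query and the GROUP BY in case you want other seasons. Then I take the MAXIMUM game for a given person/season as the baseline for the final single row (per person per season), and get the average of everything below it. So in the 10-game sample scenario, I'm not grabbing the underlying rows for games 9 through 2, just returning game #10.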
select
      pgMax.player_id,
      pgMax.season_id,
      pgMax.mostRecentGameID,
      pgl3.points as mostRecentGamePoints,
      pgl3.player_name,
      coalesce( avg( pgl2.points ), 0 ) as AvgPointsPriorToCurrentGame
   from
      -- latest game per player for the chosen season
      ( select pgl1.player_id,
               pgl1.season_id,
               max( pgl1.game_id ) as mostRecentGameID
           from
              player_gamelogs pgl1
           where
              pgl1.season_id = JustOneSeason   -- placeholder: substitute the season id you want
           group by
              pgl1.player_id,
              pgl1.season_id ) pgMax
      -- all earlier games; LEFT JOIN keeps players with only one game,
      -- so coalesce() can return 0 for them
      LEFT JOIN player_gamelogs pgl2
         on pgMax.player_id = pgl2.player_id
         AND pgMax.season_id = pgl2.season_id
         AND pgMax.mostRecentGameID > pgl2.game_id
      -- the row for the most recent game itself
      JOIN player_gamelogs pgl3
         on pgMax.player_id = pgl3.player_id
         AND pgMax.season_id = pgl3.season_id
         AND pgMax.mostRecentGameID = pgl3.game_id
   group by
      pgMax.player_id,
      pgMax.season_id,
      pgMax.mostRecentGameID,
      pgl3.points,
      pgl3.player_name
   order by
      pgMax.player_id
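Now, to optimize the query, a composite index on (player_id, season_id, game_id, points) would be best. However, if you are only looking at the "current season", an index on (season_id, player_id, game_id, points) puts the season id in the first position so it can pre-qualify the WHERE clause.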
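A sketch of the season-first variant, under the assumption that queries always filter on a single season. The index name and the season id 2015 are placeholders, not values from the question.

```sql
-- Season id leads, so the WHERE season_id = ... filter narrows the index first.
CREATE INDEX idx_season_player_game_points
    ON player_gamelogs (season_id, player_id, game_id, points);

-- Check whether the optimizer uses the index for a season-filtered lookup.
EXPLAIN
SELECT player_id, MAX(game_id) AS mostRecentGameID
FROM player_gamelogs
WHERE season_id = 2015
GROUP BY player_id;
```

If the `key` column of the EXPLAIN output does not show the new index, the optimizer has chosen a different plan and the index order may need revisiting.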