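I have a leaderboard table that looks like this:
|--------------------------------------|
| userId | allTimePoints | allTimeRank |
|--------------------------------------|
| ..     | ...           | ...         |
| xx     | 5555555       | ?           |
| ..     | ...           | ...         |
|--------------------------------------|
Assume the table has a million records and allTimePoints is updated constantly. When a user asks to see the leaderboard, I want to show them their rank, their score, and their closest competitors.
I started with the query below. On my machine it takes about 0.4 seconds when the table has 1M rows.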
SET @rowIndex := 0;
SET @rank := 0;
SET @prev := NULL;
SET @userIdPosition := 0;
SELECT
    @rowIndex := @rowIndex+1 AS rowIndex,
    userId,
    @rank := IF(@prev=allTimePoints, @rank, @rank+1) AS rank,
    @prev := allTimePoints AS allTimePoints,
    @userIdPosition := IF(userId=1860, @rowIndex, @userIdPosition) AS requestedOffset
FROM Leaderboard
ORDER BY allTimePoints DESC;
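By the way, the runtime advantage of this approach over a self-join (it is much faster) is described here: http://code.openark.org/blog/mysql/sql-ranking-without-self-join
I keep rowIndex and rank as separate variables so that, when there are ties in rank (i.e. n users with the same score), I can compute the paging offset for the requesting user more accurately.
So far so good, though I worry that unless this comes down to millisecond run times, it will not be viable when hundreds of thousands of users run the query concurrently.
Worse, if I extend the query to do the paging properly as described above, the run time increases to 1.5 seconds: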
SET @rowIndex := 0;
SET @rank := 0;
SET @prev := NULL;
SET @userIdPosition := 0;
SELECT sortedL.userId, sortedL.rank, sortedL.allTimePoints
FROM
    (SELECT
        @rowIndex := @rowIndex+1 AS rowIndex,
        userId,
        @rank := IF(@prev=allTimePoints, @rank, @rank+1) AS rank,
        @prev := allTimePoints AS allTimePoints,
        @userIdPosition := IF(userId=1860, @rowIndex, @userIdPosition) AS requestedOffset
    FROM Leaderboard
    ORDER BY allTimePoints DESC) AS sortedL
-- simulate paging, as LIMIT doesn't seem to accept variables
WHERE sortedL.rowIndex > sortedL.requestedOffset - 15
  AND sortedL.rowIndex < sortedL.requestedOffset + 15;
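This returns 29 users as required, with the requesting user in the middle.
If I run it with EXPLAIN, I can see that the subquery uses a filesort, but its result is not indexed, so the outer SELECT is forced to apply the WHERE clause with another full scan of the result set (which is slower than the filesort).
Question (1): How can I optimize this?
Another idea is to store the rank in an indexed column, allTimeRank. I thought I would re-rank the table in a scheduled procedure (say, every 10 minutes) and then serve requests with a much simpler SELECT that uses the index and is therefore very fast. I have not managed to get this working: it does not seem to honor the condition in my WHERE clause (the ranks stored in allTimeRank are incorrect), and MySQL complained, so I had to turn off safe updates to get it to run at all.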
SET SQL_SAFE_UPDATES=0;
SET @rowIndex := 0;
SET @rank := 0;
SET @prev := NULL;
SET @userIdPosition := 0;
UPDATE Leaderboard L,
    (SELECT
        @rowIndex := @rowIndex+1 AS rowIndex,
        userId,
        @rank := IF(@prev=allTimePoints, @rank, @rank+1) AS rank,
        @prev := allTimePoints AS allTimePoints,
        @userIdPosition := IF(userId=1860, @rowIndex, @userIdPosition) AS requestedOffset
    FROM Leaderboard
    ORDER BY allTimePoints DESC) AS sortedL
SET L.allTimeRank = sortedL.rank
WHERE sortedL.userId = L.userId;
SET SQL_SAFE_UPDATES=1;
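Question (2): How do I make the WHERE condition work?
This takes anywhere from 12 seconds to 2 minutes to run, and I am not sure why it is so inconsistent. In any case, it would block the UPDATEs for users who are winning points, making the application feel as if it has hung.
Question (3): Is there a workaround?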
Answer 0 (score: 0)
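First, you are not computing rank correctly. If there are three players, Britney (100 points), Rachel (100 points) and Susan (75 points), then Britney and Rachel both have rank 1, and Susan should have rank 3. Your routine would give Susan rank 2.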
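To make the difference concrete, here is a minimal sketch of the 1, 1, 3 ranking. It assumes MySQL 8.0 or later, where window functions are available (the question does not state a version), and it uses a throwaway table named RankDemo that is made up for this example.

```sql
-- Hypothetical demo table; not part of the original schema.
CREATE TABLE RankDemo (
    userId        VARCHAR(10) PRIMARY KEY,
    allTimePoints INT NOT NULL
);

INSERT INTO RankDemo (userId, allTimePoints) VALUES
    ('Britney', 100),
    ('Rachel',  100),
    ('Susan',    75);

-- RANK() leaves a gap after a tie, so Susan gets 3.
-- DENSE_RANK() leaves no gap, so Susan gets 2; this matches what the
-- @rank+1 logic in the question's query produces.
SELECT userId,
       allTimePoints,
       RANK()       OVER (ORDER BY allTimePoints DESC) AS competitionRank,
       DENSE_RANK() OVER (ORDER BY allTimePoints DESC) AS denseRank
FROM RankDemo
ORDER BY allTimePoints DESC, userId;
```

On older versions, the same effect can probably be had in the question's query by using @rowIndex in place of @rank+1, with the usual caveat that MySQL does not guarantee the order in which user variables are evaluated within a single statement.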
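Second, when players have the same score (and rank), they should be displayed in a consistent order. The order within a tied score/rank should be the order in which the players reached that score.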
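I would add two columns to the table: allTimeRank and allTimeRankOrder. Both are updated in real time every time points change. Realize that if my score goes from 100 to 125, the only users who need to be re-ranked are those with scores from 100 to 124, that is, just the people I jumped over.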
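A minimal sketch of what this implies for the schema, and of how small the affected set is. It assumes allTimeRank already exists as shown in the question, and the INT type and default of 0 for allTimeRankOrder are guesses. Both columns would still need a one-off initial population from a full ranking pass before incremental updates can maintain them.

```sql
-- Assumption: allTimeRank is already a column, as in the question's table.
-- allTimeRankOrder is the position inside a group of tied users
-- (0 = first to reach that score).
ALTER TABLE Leaderboard
    ADD COLUMN allTimeRankOrder INT NOT NULL DEFAULT 0;

-- The only rows that need re-ranking when one user goes from 100 to 125 points
-- (integer points assumed, so "100 to 124" is the same as ">= 100 AND < 125").
SELECT userId, allTimePoints, allTimeRank
FROM Leaderboard
WHERE allTimePoints >= 100
  AND allTimePoints < 125;
```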
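Here is a routine. It assumes that points always go up and never down. I do not have a million-row table to test it on, but if you set up the right indexes, I expect it to run quickly.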
CREATE PROCEDURE `updateUserPoints`(IN `puserid` VARCHAR(10), IN `pnewPoints` INT)
BEGIN
    /* where the user stands before the update */
    SET @currPoints = 0;
    SET @currRank = 0;
    SET @currRankOrder = 0;
    SELECT allTimePoints, allTimeRank, allTimeRankOrder
      INTO @currPoints, @currRank, @currRankOrder
      FROM Leaderboard
     WHERE userid = puserid;

    /* joining an existing tie: take its rank and go to the end of its order */
    SET @newRank = 0;
    SET @newRankOrder = 0;
    SELECT MAX(allTimeRank), MAX(allTimeRankOrder)+1
      INTO @newRank, @newRankOrder
      FROM Leaderboard
     WHERE allTimePoints = pnewPoints;

    /* nobody has exactly this score: take over the best rank among lower scores */
    IF (@newRank IS NULL) THEN
        SET @newRank = (SELECT MIN(allTimeRank) FROM Leaderboard WHERE allTimePoints < pnewPoints);
        SET @newRankOrder = 0;
    END IF;

    UPDATE Leaderboard
       SET allTimePoints = pnewPoints,
           allTimeRank = @newRank,
           allTimeRankOrder = @newRankOrder
     WHERE userid = puserid;

    /* everyone I was tied with who came after me in the tie order
       slides up one in the order */
    UPDATE Leaderboard
       SET allTimeRankOrder = allTimeRankOrder - 1
     WHERE allTimeRank = @currRank
       AND allTimeRankOrder > @currRankOrder;

    /* did I jump anyone (including those I was tied with)? Their rank goes down. */
    UPDATE Leaderboard
       SET allTimeRank = allTimeRank + 1
     WHERE userid <> puserid
       AND allTimePoints >= @currPoints
       AND allTimePoints < pnewPoints;
END
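For completeness, a sketch of how this could be wired up. The index names are made up, the choice of indexes is only a guess at what "the right indexes" means for the statements above, and the user id 1860 and the total of 125 points are placeholder values. None of this has been benchmarked.

```sql
-- Assumption: userId is already the primary key (or otherwise indexed).
-- Intended for the equality and range conditions on allTimePoints.
CREATE INDEX idx_leaderboard_points ON Leaderboard (allTimePoints);
-- Intended for the tie-order update and for the page read below.
CREATE INDEX idx_leaderboard_rank ON Leaderboard (allTimeRank, allTimeRankOrder);

-- User 1860 now has 125 points in total (the new total, not the increment).
CALL updateUserPoints('1860', 125);

-- Read the user's stored rank, then fetch the neighbouring ranks.
SELECT allTimeRank INTO @myRank FROM Leaderboard WHERE userId = '1860';

-- Written with additions only, so it also works if allTimeRank is UNSIGNED.
SELECT userId, allTimePoints, allTimeRank
FROM Leaderboard
WHERE allTimeRank + 14 >= @myRank
  AND allTimeRank <= @myRank + 14
ORDER BY allTimeRank, allTimeRankOrder;
```

Because ranks skip numbers after a tie, a window of 29 rank values will not always return exactly 29 rows. If a fixed page size matters, the result can be trimmed in the application, or the page can be selected by position within the (allTimeRank, allTimeRankOrder) ordering instead.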