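I have a likes/dislikes table with about 5 million rows. When I fetch the data with the query below, it takes 2 minutes to complete. Is there a better way to store and retrieve likes/dislikes? Every time someone likes or dislikes a post, a row is added to the table: 0 for a dislike and 1 for a like. I then need the sum of the two columns for each user, and I return the users with the most likes versus dislikes. If I take out the SUM of likes/dislikes, the query returns in 4 seconds. I also have indexes on UserID and on everything I am grouping by. Here is the query: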
SELECT TOP 50
    Flows_Users.UserName,
    Flows_Users.UserID,
    Flows_Users.ImageName,
    Flows_Users.DisplayName,
    Flows_UserBios.bio,
    -- Per-user counts computed with correlated subqueries
    FlowsCount = (SELECT Count(1) FROM Flows_Flows
                  WHERE UserID = Flows_Users.UserID AND Flows_Flows.Active = '1'),
    BeatsCount = (SELECT Count(1) FROM Flows_Beats
                  WHERE UserName_ID = Flows_Users.UserID AND Flows_Beats.Active = '1'),
    FollowersCount = (SELECT Count(1) FROM Flows_Follow
                      WHERE FOLLOWING = Flows_Users.UserID),
    FollowingCount = (SELECT Count(1) FROM Flows_Follow
                      WHERE FOLLOWER = Flows_Users.UserID),
    -- Totals over every like/dislike row of every flow the user owns
    ISNULL(SUM(Flows_Flows_Likes_Dislikes.[Like]), 0) AS Likes,
    ISNULL(SUM(Flows_Flows_Likes_Dislikes.Dislike), 0) AS Dislikes
FROM
    Flows_Users
    INNER JOIN Flows_Flows
        ON Flows_Users.UserID = Flows_Flows.UserID
    INNER JOIN Flows_UserBios
        ON Flows_Users.UserID = Flows_UserBios.userid
    -- The ~5 million-row likes/dislikes table
    INNER JOIN Flows_Flows_Likes_Dislikes
        ON Flows_Flows.FlowID = Flows_Flows_Likes_Dislikes.FlowID
GROUP BY
    Flows_Users.UserID,
    Flows_Users.UserName,
    Flows_Users.ImagePath,
    Flows_Users.ImageName,
    Flows_Users.DisplayName,
    Flows_UserBios.bio
ORDER BY
    [Likes] DESC, [Dislikes] ASC, FlowsCount DESC
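To experiment with the storage scheme described above, here is a scaled-down sketch using Python's built-in sqlite3 module. SQLite stands in for SQL Server (so LIMIT replaces TOP), and the table names Users, Flows and LikesDislikes, along with the sample data, are hypothetical simplifications of the question's tables. Like the query above, it stores one row per vote and sums the votes per user at query time.

```python
import sqlite3

# Hypothetical, scaled-down stand-ins for Flows_Users, Flows_Flows and
# Flows_Flows_Likes_Dislikes: the vote table stores one row per vote.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, UserName TEXT);
    CREATE TABLE Flows (FlowID INTEGER PRIMARY KEY, UserID INTEGER NOT NULL);
    CREATE TABLE LikesDislikes (
        FlowID  INTEGER NOT NULL,
        "Like"  INTEGER NOT NULL,  -- 1 for a like, else 0
        Dislike INTEGER NOT NULL   -- 1 for a dislike, else 0
    );
    CREATE INDEX ix_LikesDislikes_FlowID ON LikesDislikes (FlowID);
""")
conn.executemany("INSERT INTO Users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO Flows VALUES (?, ?)", [(10, 1), (11, 1), (20, 2)])
conn.executemany(
    'INSERT INTO LikesDislikes (FlowID, "Like", Dislike) VALUES (?, ?, ?)',
    [(10, 1, 0), (10, 1, 0), (11, 0, 1), (20, 1, 0), (20, 0, 1), (20, 0, 1)],
)

# Every vote row is read and summed at query time, so the work grows
# with the number of rows in the vote table.
rows = conn.execute("""
    SELECT u.UserID, u.UserName,
           SUM(ld."Like")  AS TotalLikes,
           SUM(ld.Dislike) AS TotalDislikes
    FROM Users AS u
    JOIN Flows AS f          ON f.UserID = u.UserID
    JOIN LikesDislikes AS ld ON ld.FlowID = f.FlowID
    GROUP BY u.UserID, u.UserName
    ORDER BY TotalLikes DESC, TotalDislikes ASC
    LIMIT 50
""").fetchall()
print(rows)  # [(1, 'alice', 2, 1), (2, 'bob', 1, 2)]
```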
Answer 0 (score: 3)
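Joining against a 5-million-row table is not a good idea. If you look at the execution plan, I bet you will find that the join between Flows and Likes_Dislikes is a hash join, which is the worst-case scenario.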
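The first step in optimizing this query is to identify which particular join is adding the execution time. Presumably part of the query runs in an acceptable time (say, 1-2 seconds), and everything else is the problem.

The fix is to denormalize. Instead of joining the likes/dislikes table, add like/dislike score columns to the Flows table, and update the Flow record immediately whenever a like/dislike is inserted. That way this query no longer needs the expensive joins.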
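A minimal sketch of that denormalization, again using Python's sqlite3 module as a stand-in for SQL Server; the table, column and trigger names and the sample data are hypothetical. Here an AFTER INSERT trigger bumps the cached counters on the owning Flows row, so the leaderboard query reads only Users and Flows. In SQL Server the counterpart would be an AFTER INSERT trigger (which fires once per statement and reads the `inserted` pseudo-table, so it should aggregate by FlowID), or an UPDATE issued by the application in the same transaction as the INSERT.

```python
import sqlite3

# Hypothetical, simplified schema: Flows carries cached Likes/Dislikes
# counters alongside the raw one-row-per-vote table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, UserName TEXT);
    CREATE TABLE Flows (
        FlowID   INTEGER PRIMARY KEY,
        UserID   INTEGER NOT NULL,
        Likes    INTEGER NOT NULL DEFAULT 0,  -- denormalized counter
        Dislikes INTEGER NOT NULL DEFAULT 0   -- denormalized counter
    );
    CREATE TABLE LikesDislikes (
        FlowID  INTEGER NOT NULL,
        "Like"  INTEGER NOT NULL,  -- 1 for a like, else 0
        Dislike INTEGER NOT NULL   -- 1 for a dislike, else 0
    );
    -- Runs as part of each INSERT that records a vote.
    CREATE TRIGGER LikesDislikes_AfterInsert AFTER INSERT ON LikesDislikes
    BEGIN
        UPDATE Flows
        SET Likes = Likes + NEW."Like",
            Dislikes = Dislikes + NEW.Dislike
        WHERE FlowID = NEW.FlowID;
    END;
""")


def record_vote(flow_id: int, is_like: bool) -> None:
    """Insert one vote row; the trigger adjusts the Flows counters."""
    conn.execute(
        'INSERT INTO LikesDislikes (FlowID, "Like", Dislike) VALUES (?, ?, ?)',
        (flow_id, int(is_like), int(not is_like)),
    )


conn.executemany("INSERT INTO Users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO Flows (FlowID, UserID) VALUES (?, ?)",
                 [(10, 1), (11, 1), (20, 2)])
for flow_id, is_like in [(10, True), (10, True), (11, False),
                         (20, True), (20, False), (20, False)]:
    record_vote(flow_id, is_like)

# The leaderboard reads only Users and Flows; the vote table is not joined.
rows = conn.execute("""
    SELECT u.UserID, u.UserName,
           SUM(f.Likes)    AS TotalLikes,
           SUM(f.Dislikes) AS TotalDislikes
    FROM Users AS u
    JOIN Flows AS f ON f.UserID = u.UserID
    GROUP BY u.UserID, u.UserName
    ORDER BY TotalLikes DESC, TotalDislikes ASC
    LIMIT 50
""").fetchall()
print(rows)  # [(1, 'alice', 2, 1), (2, 'bob', 1, 2)]
```

The sketch only handles new votes. If a vote can be withdrawn or flipped, matching DELETE/UPDATE handling is needed, otherwise the cached counters drift from the vote table.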
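Keep in mind that normalization is a well-defined theory, but practice often departs from it. Striking the right balance between normalized tables and redundancy is what makes for great software.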