Joining and grouping multiple tables

Time: 2017-04-22 16:45:02

Tags: sql sql-server join sql-server-2016

Suppose I have the following relationships on MSSQL 2016 (later Azure SQL):

[ForumBoards] 1-n [ForumThreads] 1-n [ForumPosts] n-1 [Users]

We have 50 boards, 200k threads, 1 million posts and 50k users.

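Now the goal is to get, for every board:

  • the Id and name of the board
  • the number of threads within the board
  • the number of posts within the board
  • the latest post within the board
  • the user Id and username of the author of the latest post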

My first attempt was:

SELECT 
    Boards.Id AS BoardsId,
    Boards.Name AS BoardsName,
    LP.*,
        ThreadsCount = (SELECT Count(*) FROM ForumBoards AS SubB 
                        JOIN ForumThreads AS SubT ON SubB.Id = SubT.BoardId 
                        WHERE SubB.Id = Boards.Id AND SubT.BoardId = SubB.Id),  
        PostsCount = (SELECT Count(*) FROM ForumBoards AS SubB 
                        JOIN ForumThreads AS SubT ON SubB.Id = SubT.BoardId 
                        JOIN ForumPosts AS SubP ON SubT.Id = SubP.ThreadId 
                      WHERE SubB.Id = Boards.Id AND SubT.BoardId = SubB.Id AND SubP.ThreadId = SubT.Id)
FROM ForumBoards as Boards 

OUTER APPLY(
    SELECT 
    TOP 1   SubP.Id AS LatestPostId,
        SubP.PostedOn AS LatestPostPostedOn,
        SubP.ThreadId AS LatestPostThreadId,
        SubT.Topic AS LatestPostThreadTopic,
        SubU.Id AS LatestPostUserId,
        SubU.Username AS LatestPostUsername 
        FROM ForumBoards AS SubB 
             JOIN ForumThreads AS SubT ON SubB.Id = SubT.BoardId 
             JOIN ForumPosts AS SubP ON SubT.Id = SubP.ThreadId 
             JOIN Users AS SubU ON SubP.UserId = SubU.Id 
        WHERE SubB.Id = Boards.Id AND SubT.BoardId = SubB.Id AND SubP.ThreadId = SubT.Id AND SubU.Id = SubP.UserId 
        ORDER BY SubP.PostedOn DESC) AS LP

This performs unbelievably badly.

Without the clause

WHERE SubB.Id = Boards.Id AND SubT.BoardId = SubB.Id AND SubP.ThreadId = SubT.Id AND SubU.Id = SubP.UserId 

it takes 45 ms; with it, around 6 seconds.

My other attempt is this:

SELECT  
    B.Id,
    B.Name AS BoardName,
    COUNT(*) AS ThreadsCount,
    (SELECT COUNT(*)
     FROM ForumBoards Boards, ForumThreads Threads, ForumPosts Posts
     WHERE Boards.Id = Threads.BoardId AND Threads.Id = Posts.ThreadId AND Boards.Id = B.Id) AS PostsCount
FROM ForumBoards B, ForumThreads T
WHERE B.Id = T.BoardId
GROUP BY B.Id, B.Name

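This one is fine, at around 172 ms, but it does not return the latest post.

I think I am on the wrong track, though. Any hints on how I can reach the goal?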

1 Answer:

Answer 0 (score: 2)

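OK, this looks like a forum-like project, so the first thing to keep in mind is that your database will see far more reads than writes.

Avoid complex queries that have to run every time someone displays your front page. That is not a good idea, and you will only end up with a slow database.

This is the kind of problem where triggers can work wonders.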

Add 5 new columns to the board table:

  • nb_threads
  • nb_posts
  • last_post_id
  • last_user_id
  • last_user_name

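And add the following triggers:

  • ForumThreads.trgAddThread => +1 on the parent ForumBoards.nb_threads
  • ForumThreads.trgDeleteThread => -1 on the parent ForumBoards.nb_threads
  • ForumPosts.trgAddPost => +1 on the parent ForumBoards.nb_posts, set last_post_id to the current post.id, set last_user_id to the user.id, and look up user.name to set last_user_name
  • ForumPosts.trgDeletePost => -1 on the parent ForumBoards.nb_posts, then fetch the new latest post to refresh last_post_id, last_user_id and last_user_name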
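As a rough illustration of the list above, the columns and the two insert triggers might look something like the following. This is only a sketch under assumptions: the column types (INT, NVARCHAR(256)) and constraint names are guesses, the table and column names are taken from the queries in the question, and it assumes that newly inserted posts are always newer than the post currently stored in last_post_id. The delete triggers would follow the same pattern using the `deleted` pseudo-table.

```sql
-- Assumed types; adjust to the real types of ForumPosts.Id, Users.Id and Users.Username.
ALTER TABLE ForumBoards ADD
    nb_threads     INT NOT NULL CONSTRAINT DF_ForumBoards_nb_threads DEFAULT 0,
    nb_posts       INT NOT NULL CONSTRAINT DF_ForumBoards_nb_posts   DEFAULT 0,
    last_post_id   INT NULL,
    last_user_id   INT NULL,
    last_user_name NVARCHAR(256) NULL;
GO  -- batch separator for SSMS/sqlcmd; CREATE TRIGGER must start its own batch

CREATE TRIGGER trgAddThread ON ForumThreads
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Set-based: a single INSERT may add several threads to several boards.
    UPDATE B
    SET nb_threads = B.nb_threads + I.cnt
    FROM ForumBoards AS B
    JOIN (SELECT BoardId, COUNT(*) AS cnt
          FROM inserted
          GROUP BY BoardId) AS I
        ON I.BoardId = B.Id;
END;
GO

CREATE TRIGGER trgAddPost ON ForumPosts
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Per board: how many posts were inserted, and which one is the newest.
    WITH NewPosts AS (
        SELECT T.BoardId,
               I.Id AS PostId,
               I.UserId,
               COUNT(*) OVER (PARTITION BY T.BoardId) AS cnt,
               ROW_NUMBER() OVER (PARTITION BY T.BoardId
                                  ORDER BY I.PostedOn DESC, I.Id DESC) AS rn
        FROM inserted AS I
        JOIN ForumThreads AS T ON T.Id = I.ThreadId
    )
    UPDATE B
    SET nb_posts       = B.nb_posts + N.cnt,
        last_post_id   = N.PostId,
        last_user_id   = N.UserId,
        last_user_name = U.Username
    FROM ForumBoards AS B
    JOIN NewPosts AS N ON N.BoardId = B.Id AND N.rn = 1
    LEFT JOIN Users AS U ON U.Id = N.UserId;
END;
GO
```

With something like this in place, the front page would only need a plain SELECT of Id, Name and the five new columns from ForumBoards. Existing boards would need a one-off backfill of the counters before the triggers can keep them current.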

If you can't use triggers (as you explained in the comments), this query should complete in under 400 ms:

SELECT 
    Boards.Id AS BoardsId,
    Boards.Name AS BoardsName,
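    COALESCE(nbThread.ThreadsCount, 0) AS ThreadsCount, 
    COALESCE(LP.nbPost, 0) AS nbPost,
    LP.LatestPostId,
    LP.LatestPostPostedOn,
    LP.LatestPostThreadId,
    LP.LatestPostThreadTopic,
    LP.LatestPostUserId,
    LP.LatestPostUsername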
FROM ForumBoards AS Boards
LEFT JOIN (
    SELECT BoardId, Count(*) ThreadsCount
    FROM ForumThreads
    GROUP BY ForumThreads.BoardId
) AS nbThread
    ON nbThread.BoardId = Boards.Id
LEFT JOIN (
    SELECT   
        t.BoardId,
        t.nbPost,
        ForumPosts.Id AS LatestPostId,
        ForumPosts.PostedOn AS LatestPostPostedOn,
        ForumPosts.ThreadId AS LatestPostThreadId,
        ForumThreads.Topic AS LatestPostThreadTopic,
        Users.Id AS LatestPostUserId,
        Users.Username AS LatestPostUsername 
        FROM (
            select 
                ForumThreads.BoardId, 
                MAX(ForumPosts.Id) Id,  -- assumes Ids grow over time, so the highest Id is the latest post
                Count(*) nbPost
            from ForumPosts
            JOIN ForumThreads
                ON ForumThreads.Id = ForumPosts.ThreadId
            GROUP BY ForumThreads.BoardId
        ) AS t 
        INNER JOIN ForumPosts  
            ON t.Id = ForumPosts.Id 
        INNER JOIN ForumThreads
            ON ForumThreads.Id = ForumPosts.ThreadId
        INNER JOIN Users
            ON Users.Id = ForumPosts.UserId        
) AS LP
    ON LP.BoardId = Boards.Id
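
The query above finds the latest post through MAX(ForumPosts.Id), which only works if post Ids increase together with PostedOn. If that does not hold for your data, one possible variant ranks the posts of each board by PostedOn instead. This is an untested sketch using the same table and column names as the question; its timing has not been measured and it may well be slower, since it has to order all posts per board.

```sql
WITH Ranked AS (
    SELECT T.BoardId,
           P.Id,
           P.PostedOn,
           P.ThreadId,
           T.Topic,
           P.UserId,
           -- total posts per board, repeated on every row of that board
           COUNT(*) OVER (PARTITION BY T.BoardId) AS nbPost,
           -- rn = 1 marks the most recent post of each board
           ROW_NUMBER() OVER (PARTITION BY T.BoardId
                              ORDER BY P.PostedOn DESC, P.Id DESC) AS rn
    FROM ForumPosts AS P
    JOIN ForumThreads AS T ON T.Id = P.ThreadId
)
SELECT
    Boards.Id   AS BoardsId,
    Boards.Name AS BoardsName,
    COALESCE(nbThread.ThreadsCount, 0) AS ThreadsCount,
    COALESCE(R.nbPost, 0)              AS nbPost,
    R.Id       AS LatestPostId,
    R.PostedOn AS LatestPostPostedOn,
    R.ThreadId AS LatestPostThreadId,
    R.Topic    AS LatestPostThreadTopic,
    U.Id       AS LatestPostUserId,
    U.Username AS LatestPostUsername
FROM ForumBoards AS Boards
LEFT JOIN (
    SELECT BoardId, COUNT(*) AS ThreadsCount
    FROM ForumThreads
    GROUP BY BoardId
) AS nbThread
    ON nbThread.BoardId = Boards.Id
LEFT JOIN Ranked AS R
    ON R.BoardId = Boards.Id AND R.rn = 1
LEFT JOIN Users AS U
    ON U.Id = R.UserId;
```

Boards without any posts still appear, with a post count of 0 and NULL in the latest-post columns, because every join from ForumBoards is a LEFT JOIN.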