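I have a very large web forum application (about 20 million posts since 2001) running on a SQL Server 2012 database. The data file is about 40 GB.
I have added indexes to the tables on the relevant fields, but this query (which shows the date range of the posts in each forum) takes about 40 minutes to run: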
SELECT
    T2.ForumId,
    Forums.Title,
    T2.ForumThreads,
    T2.ForumPosts,
    T2.ForumStart,
    T2.ForumStop
FROM
    Forums
    INNER JOIN (
        SELECT
            Min(ThreadStart) As ForumStart,
            Max(ThreadStop) As ForumStop,
            Count(*) As ForumThreads,
            Sum(ThreadPosts) As ForumPosts,
            Threads.ForumId
        FROM
            Threads
            INNER JOIN (
                SELECT
                    Min(Posts.DateTime) As ThreadStart,
                    Max(Posts.DateTime) As ThreadStop,
                    Count(*) As ThreadPosts,
                    Posts.ThreadId
                FROM
                    Posts
                GROUP BY
                    Posts.ThreadId
            ) As P2 ON Threads.ThreadId = P2.ThreadId
        GROUP BY
            Threads.ForumId
    ) AS T2 ON T2.ForumId = Forums.ForumId
How can I speed this up?
Update:
Here is the estimated execution plan, read from right to left:
[Path 1]
Clustered Index Scan (Clustered) [Posts].[PK_Posts], Cost: 98%
Hash Match (Partial Aggregate), Cost: 2%
Parallelism (Repartition Streams), Cost: 0%
Hash Match (Aggregate), Cost: 0%
Compute Scalar, Cost: 0%
Bitmap (Bitmap Create), Cost: 0%
[Path 2]
Index Scan (NonClustered) [Threads].[IX_ForumId], Cost: 0%
Parallelism (Repartition Streams), Cost: 0%
[Path 1 and 2 converge into Path 3]
Hash Match (Inner Join), Cost: 0%
Hash Match (Partial Aggregate), Cost: 0%
Parallelism (Repartition Streams), Cost: 0%
Sort, Cost: 0%
Stream Aggregate (Aggregate), Cost: 0%
Compute Scalar, Cost: 0%
[Path 4]
Clustered Index Seek (Clustered) [Forums].[PK_Forums], Cost: 0%
[Path 3 and 4 converge into Path 5]
Nested Loops (Inner Join), Cost: 0%
Parallelism (Gather Streams), Cost: 0%
SELECT, Cost: 0%
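Answer 0 (score: 1)
Have you tried putting the two derived tables into #temp tables? SQL Server will keep (single-column) statistics on them, and you can also build indexes on them.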
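As a rough sketch of that idea, the two derived tables from the question could be materialized one after the other. The table and column names are taken from the question; the temp table names (`#ThreadStats`, `#ForumStats`) and the index name are hypothetical, and whether this actually beats the single-statement version would need to be measured on the real data.

```sql
-- Sketch only: #ThreadStats, #ForumStats and the index name are made up
-- for illustration; everything else comes from the question's schema.

-- Step 1: per-thread aggregates over Posts, materialized once.
SELECT
    p.ThreadId,
    MIN(p.[DateTime]) AS ThreadStart,
    MAX(p.[DateTime]) AS ThreadStop,
    COUNT(*)          AS ThreadPosts
INTO #ThreadStats
FROM Posts AS p
GROUP BY p.ThreadId;

-- Index the intermediate result on the join column
-- (unique because of the GROUP BY above).
CREATE UNIQUE CLUSTERED INDEX IX_ThreadStats_ThreadId
    ON #ThreadStats (ThreadId);

-- Step 2: per-forum aggregates built from the per-thread rows.
SELECT
    t.ForumId,
    MIN(s.ThreadStart) AS ForumStart,
    MAX(s.ThreadStop)  AS ForumStop,
    COUNT(*)           AS ForumThreads,
    SUM(s.ThreadPosts) AS ForumPosts
INTO #ForumStats
FROM Threads AS t
INNER JOIN #ThreadStats AS s
    ON s.ThreadId = t.ThreadId
GROUP BY t.ForumId;

-- Step 3: attach the forum titles.
SELECT
    f.ForumId,
    f.Title,
    fs.ForumThreads,
    fs.ForumPosts,
    fs.ForumStart,
    fs.ForumStop
FROM Forums AS f
INNER JOIN #ForumStats AS fs
    ON fs.ForumId = f.ForumId;

DROP TABLE #ForumStats;
DROP TABLE #ThreadStats;
```

Step 1 still has to read all of Posts, so most of the cost probably remains there; the gain, if any, comes from giving the optimizer real row counts and an index for the later joins.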
Also, at first glance, an indexed view might help, since you have a lot of aggregates.
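One caveat before trying this: SQL Server does not allow MIN or MAX in an indexed view, and a grouped indexed view must include COUNT_BIG(*), so only the counting half of the query can be precomputed this way. The sketch below assumes the tables live in the `dbo` schema; the view and index names are hypothetical.

```sql
-- Sketch only: assumes the dbo schema; dbo.vThreadPostCounts and the
-- index name are hypothetical. The session needs the SET options that
-- indexed views require (e.g. QUOTED_IDENTIFIER and ANSI_NULLS ON).
CREATE VIEW dbo.vThreadPostCounts
WITH SCHEMABINDING
AS
SELECT
    p.ThreadId,
    COUNT_BIG(*) AS ThreadPosts   -- required in a grouped indexed view
FROM dbo.Posts AS p
GROUP BY p.ThreadId;
GO

-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vThreadPostCounts_ThreadId
    ON dbo.vThreadPostCounts (ThreadId);
GO

-- Thread and post counts per forum, read from the materialized view.
-- NOEXPAND asks the optimizer to use the view's index directly.
SELECT
    t.ForumId,
    COUNT_BIG(*)       AS ForumThreads,
    SUM(v.ThreadPosts) AS ForumPosts
FROM dbo.Threads AS t
INNER JOIN dbo.vThreadPostCounts AS v WITH (NOEXPAND)
    ON v.ThreadId = t.ThreadId
GROUP BY t.ForumId;
```

The first and last post dates would still have to come from Posts itself, so this covers the counts only, and the view adds some overhead to every insert or delete on Posts.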
Answer 1 (score: 1)
How about something like this? Anyway, you get the idea...
SELECT f.ForumID,
       f.Title,
       MIN(p.[DateTime]) as ForumStart,
       MAX(p.[DateTime]) as ForumStop,
       COUNT(*) as ForumPosts,                    -- each joined row is one post
       COUNT(DISTINCT t.ThreadID) as ForumThreads
FROM Forums f
INNER JOIN Threads t
    ON f.ForumID = t.ForumID
INNER JOIN Posts p
    ON p.ThreadID = t.ThreadID                    -- join posts to their thread
GROUP BY f.ForumID, f.Title
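Answer 2 (score: 1)
Indexes can help when you SELECT FROM a base table, but the results of the subqueries are not indexed. Joining them is probably what kills performance.
As Buckley suggested, I would try storing the intermediate results in #temp tables and adding indexes to them before running the final query.
However, the outer SELECT does not contain any thread-specific information. It looks like the query only selects the min/max dates per forum. If so, you can get the min/max/count of posts grouped by forum directly.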
Answer 3 (score: 0)
Do you really need to aggregate twice? Would this query give you the same result?
SELECT
    T2.ForumId,
    Forums.Title,
    T2.ForumThreads,
    T2.ForumPosts,
    T2.ForumStart,
    T2.ForumStop
FROM
    Forums
    INNER JOIN (
        -- Single aggregation: every joined row is one post.
        SELECT
            Min(P2.PostDate) As ForumStart,
            Max(P2.PostDate) As ForumStop,
            Count(DISTINCT Threads.ThreadId) As ForumThreads,
            Count(*) As ForumPosts,
            Threads.ForumId
        FROM
            Threads
            INNER JOIN (
                -- No aggregate here, so no GROUP BY is needed.
                SELECT
                    Posts.DateTime As PostDate,
                    Posts.ThreadId
                FROM
                    Posts
            ) As P2 ON Threads.ThreadId = P2.ThreadId
        GROUP BY
            Threads.ForumId
    ) AS T2 ON T2.ForumId = Forums.ForumId
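Answer 4 (score: 0)
If you denormalize by adding ForumId to the Posts table, you can query all of the statistics directly from the Posts table. With the right indexes, this will probably perform quite well. Of course, it requires a small code change so that ForumId is included when inserting into the Posts table...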
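A minimal sketch of that denormalization might look like the following. The `int` type for ForumId and the index name are assumptions, not something stated in the thread, and on roughly 20 million rows the backfill would likely need to be run in batches.

```sql
-- Sketch only: the int type and the index name are assumptions.

-- 1. Add the redundant column (nullable so existing rows remain valid).
ALTER TABLE Posts ADD ForumId int NULL;
GO

-- 2. Backfill it from Threads.
UPDATE p
SET p.ForumId = t.ForumId
FROM Posts AS p
INNER JOIN Threads AS t
    ON t.ThreadId = p.ThreadId;
GO

-- 3. Index the new column together with the columns the report reads.
CREATE NONCLUSTERED INDEX IX_Posts_ForumId_DateTime
    ON Posts (ForumId, [DateTime])
    INCLUDE (ThreadId);
GO

-- 4. The report now needs only Posts and Forums.
SELECT
    f.ForumId,
    f.Title,
    COUNT(DISTINCT p.ThreadId) AS ForumThreads,
    COUNT(*)                   AS ForumPosts,
    MIN(p.[DateTime])          AS ForumStart,
    MAX(p.[DateTime])          AS ForumStop
FROM Forums AS f
INNER JOIN Posts AS p
    ON p.ForumId = f.ForumId
GROUP BY f.ForumId, f.Title;
```

The trade-off is that Posts.ForumId has to be kept in sync if a thread is ever moved to another forum.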
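Answer 5 (score: 0)
I added some indexes to the database and that sped things up considerably. Execution time is now about 20 seconds (!!). I admit that many of the added indexes were guesses (or were simply added at random).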
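The answer does not say which indexes were added. As a guess based on the plan in the question, where the clustered index scan on Posts carries 98% of the estimated cost, one candidate is a narrow nonclustered index that holds only the two Posts columns the query reads. The index name below is hypothetical.

```sql
-- Sketch only: a guess at one useful index, not the one the author added.
-- It covers the inner aggregate (GROUP BY ThreadId, MIN/MAX of DateTime),
-- so the query can scan this narrow index instead of the whole table.
CREATE NONCLUSTERED INDEX IX_Posts_ThreadId_DateTime
    ON Posts (ThreadId, [DateTime]);
GO

-- Compare reads and elapsed time before and after, rather than guessing.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

-- The inner aggregate from the question, run on its own as a test.
SELECT
    p.ThreadId,
    MIN(p.[DateTime]) AS ThreadStart,
    MAX(p.[DateTime]) AS ThreadStop,
    COUNT(*)          AS ThreadPosts
FROM Posts AS p
GROUP BY p.ThreadId;
```

Checking the logical reads reported for Posts with and without each index is one way to find out which of the guessed indexes are earning their keep and which can be dropped.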