我在数百万行表中遇到了一些严重的性能问题,我觉得我应该可以很快得到结果。这是我所拥有的,我如何查询它以及它需要多长时间:
我正在运行SQL Server 2008 Standard,因此分区目前不是一个选项
我正在尝试汇总过去30天内特定帐户的所有广告资源的所有观看次数。
所有视图都存储在下表中:
CREATE TABLE [dbo].[LogInvSearches_Daily]( [ID] [bigint] IDENTITY(1,1) NOT NULL, [Inv_ID] [int] NOT NULL, [Site_ID] [int] NOT NULL, [LogCount] [int] NOT NULL, [LogDay] [smalldatetime] NOT NULL, CONSTRAINT [PK_LogInvSearches_Daily] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY] ) ON [PRIMARY]
此表有132,000,000条记录,超过4场演出。
表格中的10行样本:
ID Inv_ID Site_ID LogCount LogDay -------------------- ----------- ----------- ----------- ----------------------- 1 486752 48 14 2009-07-21 00:00:00 2 119314 51 16 2009-07-21 00:00:00 3 313678 48 25 2009-07-21 00:00:00 4 298863 0 1 2009-07-21 00:00:00 5 119996 0 2 2009-07-21 00:00:00 6 463777 534 7 2009-07-21 00:00:00 7 339976 503 2 2009-07-21 00:00:00 8 333501 570 4 2009-07-21 00:00:00 9 453955 0 12 2009-07-21 00:00:00 10 443291 0 4 2009-07-21 00:00:00 (10 row(s) affected)
/****** Object: Index [IX_LogInvSearches_Daily_LogDay] Script Date: 05/12/2010 11:08:22 ******/ CREATE NONCLUSTERED INDEX [IX_LogInvSearches_Daily_LogDay] ON [dbo].[LogInvSearches_Daily] ( [LogDay] ASC ) INCLUDE ( [Inv_ID], [LogCount]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
我正在使用以下查询来聚合数据并给我前5条记录。此查询目前需要24秒才能返回5行:
StmtText ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- SELECT TOP 5 Sum(LogCount) AS Views , DENSE_RANK() OVER(ORDER BY Sum(LogCount) DESC, Inv_ID DESC) AS Rank , Inv_ID FROM LogInvSearches_Daily D (NOLOCK) WHERE LogDay > DateAdd(d, -30, getdate()) AND EXISTS( SELECT NULL FROM propertyControlCenter.dbo.Inventory (NOLOCK) WHERE Acct_ID = 18731 AND Inv_ID = D.Inv_ID ) GROUP BY Inv_ID (1 row(s) affected) StmtText ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |--Top(TOP EXPRESSION:((5))) |--Sequence Project(DEFINE:([Expr1007]=dense_rank)) |--Segment |--Segment |--Sort(ORDER BY:([Expr1006] DESC, [D].[Inv_ID] DESC)) |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1006]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount]))) |--Sort(ORDER BY:([D].[Inv_ID] ASC)) |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID])) |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1011], [Expr1012], [Expr1010])) | |--Compute Scalar(DEFINE:(([Expr1011],[Expr1012],[Expr1010])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6)))) | | |--Constant Scan | |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1011] AND [D].[LogDay] < [Expr1012]) ORDERED FORWARD) |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID]), SEEK:([propertyControlCenter].[dbo].[Inventory].[Acct_ID]=(18731) AND [propertyControlCenter].[dbo].[Inventory].[Inv_ID]=[LOA (13 row(s) affected)
我尝试使用CTE首先获取行并聚合它们,但是运行速度不快,并且基本上给出了相同的执行计划。
(1 row(s) affected) StmtText ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --SET SHOWPLAN_TEXT ON; WITH getSearches AS ( SELECT LogCount -- , DENSE_RANK() OVER(ORDER BY Sum(LogCount) DESC, Inv_ID DESC) AS Rank , D.Inv_ID FROM LogInvSearches_Daily D (NOLOCK) INNER JOIN propertyControlCenter.dbo.Inventory I (NOLOCK) ON Acct_ID = 18731 AND I.Inv_ID = D.Inv_ID WHERE LogDay > DateAdd(d, -30, getdate()) -- GROUP BY Inv_ID ) SELECT Sum(LogCount) AS Views, Inv_ID FROM getSearches GROUP BY Inv_ID (1 row(s) affected) StmtText ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1004]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount]))) |--Sort(ORDER BY:([D].[Inv_ID] ASC)) |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID])) |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1008], [Expr1009], [Expr1007])) | |--Compute Scalar(DEFINE:(([Expr1008],[Expr1009],[Expr1007])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6)))) | | |--Constant Scan | |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1008] AND [D].[LogDay] < [Expr1009]) ORDERED FORWARD) |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID] AS [I]), SEEK:([I].[Acct_ID]=(18731) AND [I].[Inv_ID]=[LOALogs].[dbo].[LogInvSearches_Daily].[Inv_ID] as [D].[Inv_ID]) ORDERED FORWARD) (8 row(s) affected) (1 row(s) affected)
所以考虑到我在执行计划中获得了良好的Index Seeks,我该怎么做才能让它更快地运行?
更新:
这是没有DENSE_RANK()的相同查询运行,它运行完全相同的24秒,并给我相同的基本查询计划:
StmtText ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --SET SHOWPLAN_TEXT ON SELECT TOP 5 Sum(LogCount) AS Views , Inv_ID FROM LogInvSearches_Daily D (NOLOCK) WHERE LogDay > DateAdd(d, -30, getdate()) AND EXISTS( SELECT NULL FROM propertyControlCenter.dbo.Inventory (NOLOCK) WHERE Acct_ID = 18731 AND Inv_ID = D.Inv_ID ) GROUP BY Inv_ID ORDER BY Views, Inv_ID (1 row(s) affected) StmtText ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |--Sort(TOP 5, ORDER BY:([Expr1006] ASC, [D].[Inv_ID] ASC)) |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1006]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount]))) |--Sort(ORDER BY:([D].[Inv_ID] ASC)) |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID])) |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1010], [Expr1011], [Expr1009])) | |--Compute Scalar(DEFINE:(([Expr1010],[Expr1011],[Expr1009])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6)))) | | |--Constant Scan | |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1010] AND [D].[LogDay] < [Expr1011]) ORDERED FORWARD) |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID]), SEEK:([propertyControlCenter].[dbo].[Inventory].[Acct_ID]=(18731) AND [propertyControlCenter].[dbo].[Inventory].[Inv_ID]=[LOALogs].[dbo].[LogInvS (9 row(s) affected)
谢谢,
丹
答案 0 :(得分:1)
我还没有读完你的整个问题(我很快就会谈到)但回答一个早期评论:你可以在SQL中使用分区视图 Server 2008标准版。它的分区表(无可置疑地更灵活)仅限于企业版。
分区观看信息:http://msdn.microsoft.com/en-us/library/ms190019.aspx
在更广泛的问题上,我想知道你是否真的需要DENSE_RANK。我想知道你是否在DENSE_RANK中的ORDER BY和查询本身的ORDER BY之间感到困惑。目前,您的TOP 5将返回5 undefined 记录,因为除非指定了ORDER BY子句(您尚未完成),否则SQL Server不保证记录上的任何顺序。如果您将ORDER BY从DENSE_RANK向下移动到整个查询ORDER BY,如下所示,记录将按照我的意愿出现,它将消除对昂贵的DENSE_RANK聚合函数的需求。
SELECT TOP 5
SUM([LogCount]) AS [Views],
[Inv_ID]
FROM [LogInvSearches_Daily] D (NOLOCK)
WHERE
[LogDay] > DateAdd(d, -30, getdate())
AND EXISTS(
SELECT *
FROM Inventory (NOLOCK)
WHERE Acct_ID = 18731
AND Inv_ID = D.Inv_ID
)
GROUP BY
Inv_ID
ORDER BY
[Views] DESC,
[Inv_ID]
<强>更新强>
时间可能在这里用完了:
|--Sort(ORDER BY:([D].[Inv_ID] ASC))
您可以尝试创建这样的覆盖索引:
CREATE NONCLUSTERED INDEX [IX_LogInvSearches_Daily_Perf] ON [dbo].[LogInvSearches_Daily]
(
[Inv_ID] ASC,
[LogDay] ASC
)
INCLUDE
(
[LogCount]
)
请注意,我还略微更改了ORDER BY(Inv_ID现在已经排序为ASC而不是DESC)。我怀疑这种改变不会以有问题的方式影响结果,但可能有助于提高性能,因为它将以与它们分组相同的顺序返回行(尽管这可能是不相关的!)。
答案 1 :(得分:1)
除了分区,
根据我们使用比您更大的表的经验,我们将数据提取到临时表(而不是表变量)并在其上进行聚合。不是所有查询,而是更复杂的查询。
除此之外,我同意Daniel Renshaw关于DENSE_RANK的观察
我还考虑将[Inv_ID],[LogCount]移动到索引中(不包括,可能使用DESC排序)
答案 2 :(得分:0)
Acct_ID在Inventory表上,似乎有自己的索引(IX_Inventory_Acct_ID)。也许如果Inventory上有一个索引(Acct_Id,Inv_Id)并且LogInvSearches_Daily被聚集(或者至少被索引)(Inv_Id,LogDay),那么你会有更多的运气。
顺便说一句,我不知道LogInvSearches_Daily.ID上当前的聚类索引应该是什么给你买的。为什么导入要在磁盘上关闭具有关闭ID的记录?