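How can I make a GROUP BY query faster?

Date: 2015-12-03 10:58:29

Tags: sql-server group-by

I have the following table with nearly 2 million records, and of course it grows every day. Here are some sample rows (columns whose names end in Id are foreign keys to parent tables, e.g. DirectionId -> Direction table, ...):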

Id    TypeId  DirectionId  UserId  IndicatorId  Date                         Size   ExternalId
2003  100     1            1       1            2015-06-01 00:02:23.0000000  11931  28657340
2004  2       1            2       1            2015-06-01 00:03:21.0000000  10358  28657341
2005  2       2            2       1            2015-06-01 00:03:31.0000000  10848  28657342
2006  100     1            2       1            2015-06-01 00:03:52.0000000  7860   28657343
2007  100     1            3       1            2015-06-01 00:03:59.0000000  13353  28657344

I need the date and time of the last message for each combination of TypeId and DirectionId. The query below returns what I need:

select TypeId, DirectionID, max(date) as Date
from message
group by TypeId, DirectionID;

DirectionId TypeId  Date
2               1         2015-06-05 15:12:37.0000000
1               1         2015-06-05 15:12:39.0000000

The problem is that the query takes 2500 ms to 3000 ms to execute. I added this index:

CREATE NONCLUSTERED INDEX [date_index] ON [mqview].[Message] ([Date] ASC)
INCLUDE ([Id],  [TypeId],[DirectionId], [UserId], [Size], [ExternalId]) WITH (PAD_INDEX = OFF, 
STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, 
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

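How can I get the results faster?

Update

With the suggested index I get the results faster, but now I would also like the version with two inner joins, shown below, to run faster. If nothing can be done to improve the performance of the following query, I can fall back to running two extra queries against the MessageDirection and MessageType tables.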

SET STATISTICS TIME ON
select  mt.Code, md.Code, max(m.date) as Date
from
  mqview.Message m
  inner join mqview.MessageDirection md on (md.Id = m.DirectionId)
  inner join mqview.MessageType mt on (mt.Id = m.TypeId)
group by mt.Code, md.Code
SET STATISTICS TIME OFF

Messages:

 SQL Server Execution Times:
 CPU time = 3343 ms,  elapsed time = 2817 ms.

Execution plan: (image not included)

2 Answers:

Answer 0: (score: 3)

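Your index does not help the query. You are grouping by TypeId first, whereas your index is sorted by Date first. So to group by TypeId and then DirectionId, the query still has to scan every row in the table. Then, once the rows are grouped by those values, it has to look at every row in each group to find the maximum date.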

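If your rows were indexed by TypeId and then by DirectionId, the grouping would be faster, because the rows would already sit in grouping order in the index. If you then add Date to the index, the query knows that the last row in each group has the highest date, which speeds things up a little. But if you sort Date in descending order in the index, the first row in each group has the highest date, so only the first row of each group needs to be looked at. This should give a big speed boost - you may find that with this index your query is near-instant.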

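Because the index now contains every value used in the query, the actual rows of the table do not need to be accessed at all. The database engine can return the values straight from the index. That removes another step from query processing and makes it faster still.

Your CREATE INDEX statement would look like this: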

CREATE INDEX ix_myNewIndex ON [mqview].[Message] (TypeId, DirectionId, [Date] DESC) 
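
The "only the first row of each group" idea can also be spelled out in the query itself. As far as I know, SQL Server usually still scans the whole index for a plain GROUP BY, even when the index is in the right order, so here is a hedged sketch that asks for one seek per group instead. It assumes that every (TypeId, DirectionId) pair of interest can be generated from the two lookup tables mentioned in the question (mqview.MessageType and mqview.MessageDirection, each with an Id column) and that the index above exists; neither assumption is confirmed by the question.

```sql
-- Sketch only: assumes the lookup tables are small and that the index
-- (TypeId, DirectionId, [Date] DESC) from this answer has been created.
SELECT mt.Id AS TypeId,
       md.Id AS DirectionId,
       last_msg.[Date]
FROM mqview.MessageType AS mt
CROSS JOIN mqview.MessageDirection AS md
CROSS APPLY (
    -- For each pair, TOP (1) with ORDER BY [Date] DESC can be answered
    -- by a seek that reads just the first matching index entry.
    SELECT TOP (1) m.[Date]
    FROM mqview.[Message] AS m
    WHERE m.TypeId = mt.Id
      AND m.DirectionId = md.Id
    ORDER BY m.[Date] DESC
) AS last_msg;
```

CROSS APPLY drops pairs that have no messages, so the result has the same shape as the GROUP BY version. Whether it is actually faster depends on how many pairs exist, so it is worth comparing both with SET STATISTICS TIME ON.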

Answer 1: (score: 2)

IF OBJECT_ID('tempdb.dbo.#temp') IS NOT NULL
    DROP TABLE #temp
GO

CREATE TABLE #temp
(
    Id INT PRIMARY KEY,
    TypeId TINYINT,
    DirectionId TINYINT,
    UserId TINYINT,
    IndicatorId TINYINT,
    [Date] DATETIME2
)

-- Key columns match the GROUP BY; [Date] DESC puts the latest date first within each group
CREATE /*UNIQUE*/ NONCLUSTERED INDEX ix ON #temp (TypeId, DirectionId, [Date] DESC)
GO

INSERT INTO #temp (Id, TypeId, DirectionId, UserId, IndicatorId, [Date])
VALUES
    (2003, 100, 1, 1, 1, '20150601 00:02:23.0000000'),
    (2004, 2  , 1, 2, 1, '20150601 00:03:21.0000000'),
    (2005, 2  , 2, 2, 1, '20150601 00:03:31.0000000'),
    (2006, 100, 1, 2, 1, '20150601 00:03:52.0000000'),
    (2007, 100, 1, 3, 1, '20150601 00:03:59.0000000')


SELECT TypeId, DirectionID, MAX([Date])
FROM #temp
GROUP BY TypeId, DirectionId
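
To compare how the two indexes on #temp behave for this query, one option is to force each of them with a table hint and look at the logical reads. This is only a diagnostic sketch: with five rows the numbers mean very little, so it assumes you have loaded a realistic amount of data into #temp first. Index hints like these are for experiments, not for production queries.

```sql
SET STATISTICS IO ON;

-- Force the nonclustered index ix (TypeId, DirectionId, [Date] DESC).
SELECT TypeId, DirectionId, MAX([Date]) AS [Date]
FROM #temp WITH (INDEX(ix))
GROUP BY TypeId, DirectionId;

-- Force the clustered index (index id 1, created by the PRIMARY KEY)
-- for comparison.
SELECT TypeId, DirectionId, MAX([Date]) AS [Date]
FROM #temp WITH (INDEX(1))
GROUP BY TypeId, DirectionId;

SET STATISTICS IO OFF;
```

The Messages tab then shows the logical reads for each variant.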

Update

SELECT mt.Code, md.Code, t.[Date]
FROM (
    SELECT TypeId, DirectionID, [Date] = MAX([Date])
    FROM mqview.[Message]
    GROUP BY TypeId, DirectionId
) t
JOIN mqview.MessageDirection md on md.Id = t.DirectionId
JOIN mqview.MessageType mt on mt.Id = t.TypeId
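
The query above aggregates first and joins afterwards, so the two joins only touch one row per (TypeId, DirectionId) group instead of every message row. One caveat: it groups by the Id columns, while the query in the question groups by Code, so the two return the same rows only if Code is unique within each lookup table. The question does not say whether that holds, so here is a small sketch for checking the assumption.

```sql
-- Each query returns no rows if Code is unique in that lookup table.
SELECT Code, COUNT(*) AS Cnt
FROM mqview.MessageType
GROUP BY Code
HAVING COUNT(*) > 1;

SELECT Code, COUNT(*) AS Cnt
FROM mqview.MessageDirection
GROUP BY Code
HAVING COUNT(*) > 1;
```

If either query returns rows, the aggregate-then-join version would produce several rows for the same Code pair, and an outer GROUP BY mt.Code, md.Code with MAX(t.[Date]) would be needed to match the original result.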