MAX with GROUP BY query takes a long time across large tables

Date: 2018-01-28 01:07:29

Tags: sql sql-server tsql azure-sql-database

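I have the following very basic query, but it takes 27 seconds to run.

Here is the execution plan: https://www.brentozar.com/pastetheplan/?id=rJdzqscBf

Can anyone see a way to improve it? Let me know if some sample data or the table structure would be useful.

The Visit table has 1,347,957 rows and VisitMovement has 5,294,399 rows.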

DECLARE @RecentlyLeftDate datetimeoffset(7)
SELECT @RecentlyLeftDate = dateadd(hh,-4,sysdatetimeoffset())

SELECT 
    MAX(VM.VisitMovementID) as VisitMovementID
FROM
    Visit V
INNER JOIN VisitMovement VM ON 
    V.VisitID = VM.VisitID
WHERE
    V.EndDate > @RecentlyLeftDate
GROUP BY
    V.VisitID

The tables have the following indexes:

CREATE NONCLUSTERED INDEX [IDX_VisitMovement_VisitID] ON [dbo].[VisitMovement]
(
    [VisitID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
ALTER TABLE [dbo].[Visit] ADD  CONSTRAINT [PK_Visit] PRIMARY KEY CLUSTERED 
(
    [VisitID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
CREATE NONCLUSTERED INDEX [IDX_Visit_EndDate] ON [dbo].[Visit]
(
    [EndDate] ASC
)
INCLUDE (   [ClientID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

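3 Answers:

Answer 0: (score: 2)

From your query, I'm guessing that VisitMovement has no EndDate, so the join exists only to use EndDate from the Visit table. If so, why not join just the ID and EndDate from Visit instead of the whole table?

So you could do this: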

SELECT 
    MAX(VM.VisitMovementID) as VisitMovementID
FROM 
    VisitMovement VM
INNER JOIN 
    (SELECT VisitID, EndDate FROM Visit WHERE EndDate > @RecentlyLeftDate) V ON V.VisitID = VM.VisitID
WHERE
    V.EndDate > @RecentlyLeftDate
GROUP BY
    V.VisitID

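Adding WHERE EndDate > @RecentlyLeftDate inside the INNER JOIN reduces the records retrieved from the Visit table, so it fetches only the records that fall within that time window rather than all 1,347,957 records!

You could also adjust your indexes: make sure the identity columns are among the index key columns (with the correct sort order for each column), and add frequently used columns as included columns.

Alternative approach: here is another approach that comes to mind; you would need to check it and try it out.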

SELECT 
    MAX(VM.VisitMovementID) as VisitMovementID
FROM 
    VisitMovement VM
WHERE 
    VM.VisitID IN (SELECT VisitID FROM Visit WHERE EndDate > @RecentlyLeftDate)
GROUP BY 
    VM.VisitID

Answer 1: (score: 1)

This is your query:

SELECT MAX(VM.VisitMovementID) as VisitMovementID
FROM Visit V INNER JOIN
     VisitMovement VM 
     ON V.VisitID = VM.VisitID
WHERE V.EndDate > @RecentlyLeftDate
GROUP BY V.VisitID;

I find this a strange construct, because the GROUP BY key is not in the SELECT.

Nonetheless, the best indexes are Visit(EndDate, VisitId) and VisitMovement(VisitId, VisitMovementID).
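
As a rough sketch, the two suggested indexes could be created as follows. The index names are made up for illustration; only the key columns come from the answer. Note that in SQL Server a nonclustered index on a table with a clustered index also stores the clustering key, so the existing IDX_Visit_EndDate may already carry VisitID for the Visit side.

```sql
-- Hypothetical index names; the key columns follow the suggestion above.
CREATE NONCLUSTERED INDEX IDX_Visit_EndDate_VisitID
    ON dbo.Visit (EndDate, VisitID);

-- With VisitMovementID as the second key column, the MAX per VisitID
-- can be answered from the index alone.
CREATE NONCLUSTERED INDEX IDX_VisitMovement_VisitID_VisitMovementID
    ON dbo.VisitMovement (VisitID, VisitMovementID);
```

Whether the optimizer actually picks these up is worth confirming by comparing the execution plan before and after.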

Answer 2: (score: 1)

Your query plan is fine. I would avoid the sort by adding the index after the insert:

INSERT INTO #ResultsVisitMovement 
( VisitMovementID )
select max(movementid)
-- rest of query
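
One possible reading of this suggestion, sketched under assumptions: the temp table is loaded as a heap and the index is built only once the rows are in. The column type (int) and the index name are guesses, since the table definition is not shown; the SELECT reuses the query from the question.

```sql
-- Assumption: VisitMovementID is an int; adjust to the real column type.
CREATE TABLE #ResultsVisitMovement
(
    VisitMovementID int NOT NULL
);

DECLARE @RecentlyLeftDate datetimeoffset(7) = DATEADD(hour, -4, SYSDATETIMEOFFSET());

-- Load the table as a heap, so the insert itself does not need
-- rows ordered by an index key.
INSERT INTO #ResultsVisitMovement (VisitMovementID)
SELECT MAX(VM.VisitMovementID)
FROM dbo.Visit AS V
INNER JOIN dbo.VisitMovement AS VM
    ON V.VisitID = VM.VisitID
WHERE V.EndDate > @RecentlyLeftDate
GROUP BY V.VisitID;

-- Build the index once, after the rows are in place.
CREATE CLUSTERED INDEX IX_ResultsVisitMovement  -- hypothetical name
    ON #ResultsVisitMovement (VisitMovementID);
```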

In addition, when I examined the query plan, I could see a lot of wait stats:

  <Wait WaitType="RESERVED_MEMORY_ALLOCATION_EXT" WaitTimeMs="5" WaitCount="7527"/>
              <Wait WaitType="IO_QUEUE_LIMIT" WaitTimeMs="1250" WaitCount="76"/>
              <Wait WaitType="RESOURCE_GOVERNOR_IDLE" WaitTimeMs="18669" WaitCount="1428"/>
              <Wait WaitType="SOS_SCHEDULER_YIELD" WaitTimeMs="21248" WaitCount="945"/>

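The memory allocation wait is very low, so I would ignore it. You hit the IO queue limit for about 1.2 seconds.

RESOURCE_GOVERNOR_IDLE is interesting. I suspect you have hit some cap and your query is being forced to wait on something (IO, CPU, memory, ...).

Finally, SOS_SCHEDULER_YIELD shows a total accumulated wait time of about 21 seconds.

In this situation, I would look at the following.

Run the query below to see whether any of the Azure limits (CPU, IO, RAM, log) is consistently above 90 over a period of time.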

SELECT * FROM sys.dm_db_resource_stats 
ORDER BY end_time DESC; 

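I would then tune for whichever resource is high. For example, if CPU is consistently averaging above 90 over a period of time, I would gather the top CPU-consuming queries and try to tune them.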
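
A minimal sketch of how the top CPU consumers might be collected, using the plan cache DMVs sys.dm_exec_query_stats and sys.dm_exec_sql_text. The TOP (10) cutoff and the column aliases are arbitrary choices, and the figures only cover plans still in the cache.

```sql
-- total_worker_time is reported in microseconds; convert to milliseconds.
SELECT TOP (10)
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.execution_count,
    qs.total_worker_time / qs.execution_count / 1000 AS avg_cpu_ms,
    -- Extract the individual statement from the batch text.
    SUBSTRING(
        st.text,
        (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1
    ) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```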