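I have this SQL, which works fine.
I want my filter to return the LATEST unique SessionGuids with the highest UserSessionSequenceID.
The problem is that performance is terrible, even though I have good indexes. How can I rewrite this so that it omits the ROW_NUMBER line?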
SELECT TOP(@resultCount) * FROM
(
SELECT
[UserSessionSequenceID]
,[SessionGuid]
,[IP]
,[Url]
,[UrlTitle]
,[SiteID]
,[BrowserWidth]
,[BrowserHeight]
,[Browser]
,[BrowserVersion]
,[Referer]
,[Timestamp]
,ROW_NUMBER() over (PARTITION BY [SessionGuid]
ORDER BY UserSessionSequenceID DESC) AS sort
FROM [tblSequence]
) AS t
WHERE ([Timestamp] > DATEADD(mi, -@minutes, GETDATE()))
AND (SiteID = @siteID)
AND sort = 1
ORDER BY [UserSessionSequenceID] DESC
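To see what this query actually returns, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database through Python's standard sqlite3 module. It is only an illustration under stated assumptions: SQLite stands in for SQL Server (it supports ROW_NUMBER() from version 3.25 but has no TOP, so LIMIT is used), the table is cut down to four columns, the sample rows are invented, and a fixed cutoff string replaces DATEADD(mi, -@minutes, GETDATE()). It says nothing about SQL Server performance; it only shows the semantics: ROW_NUMBER() ranks every row of every session in the whole table, and only afterwards are the SiteID and Timestamp filters applied to each session's single latest row.

```python
import sqlite3

def latest_rows_outer_filter(conn, site_id, cutoff, result_count):
    """Same shape as the question's query: rank first, filter afterwards.

    `cutoff` plays the role of DATEADD(mi, -@minutes, GETDATE()).
    """
    sql = """
        SELECT UserSessionSequenceID, SessionGuid, SiteID, Timestamp
        FROM (
            SELECT UserSessionSequenceID, SessionGuid, SiteID, Timestamp,
                   ROW_NUMBER() OVER (PARTITION BY SessionGuid
                                      ORDER BY UserSessionSequenceID DESC) AS sort
            FROM tblSequence          -- no WHERE here: every row of every session is ranked
        ) AS t
        WHERE Timestamp > ? AND SiteID = ? AND sort = 1
        ORDER BY UserSessionSequenceID DESC
        LIMIT ?                       -- SQLite equivalent of TOP(@resultCount)
    """
    return conn.execute(sql, (cutoff, site_id, result_count)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tblSequence (
    UserSessionSequenceID INTEGER NOT NULL,
    SessionGuid TEXT NOT NULL,
    SiteID INTEGER NOT NULL,
    Timestamp TEXT NOT NULL)""")
# Hypothetical sample data: sessions 'a' and 'b' on site 1, 'c' on site 2.
conn.executemany("INSERT INTO tblSequence VALUES (?, ?, ?, ?)", [
    (0, "a", 1, "2009-09-26 00:00:00"),
    (1, "a", 1, "2009-09-26 01:00:00"),
    (0, "b", 1, "2009-09-20 00:00:00"),   # old session, outside the window
    (0, "c", 2, "2009-09-26 01:30:00"),   # different site
])
rows = latest_rows_outer_filter(conn, site_id=1, cutoff="2009-09-25 02:08:27", result_count=30)
print(rows)  # [(1, 'a', 1, '2009-09-26 01:00:00')]
```

Because the ranking subquery has no WHERE clause, the engine has to rank the entire table before it can discard anything, which is the cost the answers below try to avoid.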
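Thanks a lot :-)
Answer 0 (score: 9)
"even though I have good indexes"
No offense, but let us be the judge of that. When asking SQL Server performance questions, always post the exact schema of your table, including all indexes and cardinalities.
For example, let's consider the following table structure: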
create table tblSequence (
[UserSessionSequenceID] int not null
,[SessionGuid] uniqueidentifier not null
,[SiteID] int not null
,[Timestamp] datetime not null
, filler varchar(512));
go
create clustered index cdxSequence on tblSequence (SiteID, [Timestamp]);
go
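Same as yours, but with all the fields irrelevant to the performance problem lumped into a generic filler column. Let's see how bad the performance really is on 1M rows spread over about 50,000 sessions. We'll fill the table with random data, but in a way that simulates something resembling "user activity":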
set nocount on;
declare @i int = 0, @sc int = 1;
declare @SessionGuid uniqueidentifier = newid()
, @siteID int = 1
, @Timestamp datetime = dateadd(day, rand()*1000, '20070101')
, @UserSessionSequenceID int = 0;
begin tran;
while @i<1000000
begin
insert into tblSequence (
[UserSessionSequenceID]
,[SessionGuid]
,[SiteID]
,[Timestamp]
, filler)
values (
@UserSessionSequenceID
, @SessionGuid
, @siteID
, @timestamp
, replicate('X', rand()*512));
if rand()*100 < 5
begin
set @SessionGuid = newid();
set @siteID = rand() * 10;
set @Timestamp = dateadd(day, rand()*1000, '20070101');
set @UserSessionSequenceID = 0;
set @sc += 1;
end
else
begin
set @timestamp = dateadd(second, rand()*300, @timestamp);
set @UserSessionSequenceID += 1;
end
set @i += 1;
if (@i % 1000) = 0
begin
raiserror(N'Inserted %i rows, %i sessions', 0, 1, @i, @sc);
commit;
begin tran;
end
end
commit;
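The loop above is a random walk: after each inserted row there is roughly a 5% chance (rand()*100 < 5) of starting a new session with a new SessionGuid, a random SiteID from 0 to 9 and a random start day; otherwise the session continues, its timestamp advanced by up to 300 seconds and its sequence number incremented. That 5% rate is why 1M rows come out as roughly 50,000 sessions. Below is a minimal Python sketch of the same model that builds the rows in memory instead of inserting them. The function name simulate_activity is hypothetical, uuid4 stands in for NEWID(), and the filler column is left out.

```python
import random
import uuid
from datetime import datetime, timedelta

def simulate_activity(n_rows, new_session_pct=5, seed=None):
    """Mimic the T-SQL fill loop; returns (rows, session_count).

    Each row is (UserSessionSequenceID, SessionGuid, SiteID, Timestamp).
    The filler column is omitted to keep the sketch small.
    """
    rng = random.Random(seed)
    base = datetime(2007, 1, 1)
    session_guid = uuid.uuid4()                      # stands in for NEWID()
    site_id = 1                                      # the first session is on site 1
    ts = base + timedelta(days=rng.randrange(1000))
    seq = 0
    sessions = 1
    rows = []
    for _ in range(n_rows):
        rows.append((seq, session_guid, site_id, ts))
        if rng.random() * 100 < new_session_pct:
            # Start a new session: new guid, random site 0..9, random start day.
            session_guid = uuid.uuid4()
            site_id = int(rng.random() * 10)
            ts = base + timedelta(days=rng.randrange(1000))
            seq = 0
            sessions += 1
        else:
            # Continue the session: advance by 0..299 seconds, bump the sequence.
            ts += timedelta(seconds=int(rng.random() * 300))
            seq += 1
    return rows, sessions

rows, sessions = simulate_activity(100_000, seed=42)
print(len(rows), sessions)  # roughly 5,000 sessions per 100,000 rows
```

Like the T-SQL loop, the counter is bumped whenever a new session is started, even if the loop ends before that session gets a row.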
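Filling the table takes about a minute. Now let's run the same query you asked about: what was the last action of each user session on site X in the last Y minutes? I have to use a specific date for @now instead of GETDATE() because my data is simulated, not real, so I use the maximum timestamp that the random fill produced for SiteID 1: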
set statistics time on;
set statistics io on;
declare @resultCount int = 30;
declare @minutes int = 60*24;
declare @siteID int = 1;
declare @now datetime = '2009-09-26 02:08:27.000';
SELECT TOP(@resultCount) * FROM
(
SELECT
[UserSessionSequenceID]
,[SessionGuid]
, SiteID
, Filler
,[Timestamp]
,ROW_NUMBER() over (PARTITION BY [SessionGuid]
ORDER BY UserSessionSequenceID DESC) AS sort
FROM [tblSequence]
where SiteID = @siteID
and [Timestamp] > DATEADD(mi, -@minutes, @now)
) AS t
WHERE sort = 1
ORDER BY [UserSessionSequenceID] DESC ;
This is the same as your query, but with the restrictive filters moved inside the ROW_NUMBER() subquery. The results:
Table 'tblSequence'. Scan count 1, logical reads 12, physical reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 31 ms.
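Response time is 31 ms on a warm cache, with 12 pages read out of the table's nearly 60k pages.
Update
After rereading your original query, I realize my modified query is different: you want only new sessions. I still believe that filtering on SiteID and Timestamp is the only way to get the necessary performance, so the solution is to validate the candidates found with a NOT EXISTS condition: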
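The low read count is most likely due to the clustered index on (SiteID, [Timestamp]): once the SiteID and Timestamp predicates sit inside the subquery, they form an equality-plus-range condition on the leading index columns, so the engine can seek straight to the rows in the window instead of ranking the whole table. SQLite's EXPLAIN QUERY PLAN gives a rough analogue of this (it is not SQL Server's optimizer). The minimal sketch below uses Python's sqlite3, an ordinary composite index with the hypothetical name ix_site_ts standing in for the clustered index, and invented sample rows; it runs the pushed-down query and checks that the inner filter is answered by an index search rather than a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tblSequence (
    UserSessionSequenceID INTEGER NOT NULL,
    SessionGuid TEXT NOT NULL,
    SiteID INTEGER NOT NULL,
    Timestamp TEXT NOT NULL,
    Filler TEXT)""")
# Composite index with the same key order as the answer's clustered index.
conn.execute("CREATE INDEX ix_site_ts ON tblSequence (SiteID, Timestamp)")
conn.executemany("INSERT INTO tblSequence VALUES (?, ?, ?, ?, 'x')", [
    (0, "a", 1, "2009-09-25 23:00:00"),
    (1, "a", 1, "2009-09-26 01:00:00"),
    (0, "b", 1, "2009-09-20 00:00:00"),
    (0, "c", 2, "2009-09-26 01:30:00"),
])

site_id = 1
window_start = "2009-09-25 02:08:27"   # plays the role of DATEADD(mi, -@minutes, @now)

# Filters pushed inside the ROW_NUMBER() subquery, as in the rewritten query.
pushed_down = """
    SELECT UserSessionSequenceID, SessionGuid, Timestamp
    FROM (
        SELECT UserSessionSequenceID, SessionGuid, Timestamp,
               ROW_NUMBER() OVER (PARTITION BY SessionGuid
                                  ORDER BY UserSessionSequenceID DESC) AS sort
        FROM tblSequence
        WHERE SiteID = ? AND Timestamp > ?
    ) AS t
    WHERE sort = 1
    ORDER BY UserSessionSequenceID DESC
    LIMIT 30
"""
rows = conn.execute(pushed_down, (site_id, window_start)).fetchall()

# Plan for the inner filter on its own: expect an index SEARCH, not a full SCAN.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM tblSequence WHERE SiteID = ? AND Timestamp > ?",
    (site_id, window_start),
).fetchall()
plan_details = [row[-1] for row in plan]   # the last column holds the readable detail

print(rows)          # [(1, 'a', '2009-09-26 01:00:00')]
print(plan_details)
```

The exact wording of the plan detail varies between SQLite versions, which is why only the index name and the SEARCH keyword are worth relying on.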
SELECT TOP(@resultCount) * FROM
(
SELECT
[UserSessionSequenceID]
,[SessionGuid]
, SiteID
, Filler
,[Timestamp]
,ROW_NUMBER() over (
PARTITION BY [SessionGuid]
ORDER BY UserSessionSequenceID DESC)
AS sort
FROM [tblSequence]
where SiteID = @siteID
and [Timestamp] > DATEADD(mi, -@minutes, @now)
) AS new
WHERE sort = 1
and not exists (
select SessionGuid
from tblSequence
where SiteID = @siteID
and SessionGuid = new.SessionGuid
and [TimeStamp] < DATEADD(mi, -@minutes, @now)
)
ORDER BY [UserSessionSequenceID] DESC
On my laptop, against the 1M-row table, this returns in 40 ms on a warm cache:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0
Table 'tblSequence'. Scan count 2, logical reads 709, physical reads 0
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 40 ms.
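The NOT EXISTS check keeps a candidate session only if that session has no row on the same site older than the window start, that is, only sessions that began inside the window. Here is a minimal sketch of the same query shape in SQLite via Python's sqlite3, with invented sample data: LIMIT replaces TOP, a literal window start replaces DATEADD(mi, -@minutes, @now), and the derived table is aliased cand instead of new. Session 'a' is dropped despite recent activity because it started before the window, while 'd' started inside it and is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tblSequence (
    UserSessionSequenceID INTEGER NOT NULL,
    SessionGuid TEXT NOT NULL,
    SiteID INTEGER NOT NULL,
    Timestamp TEXT NOT NULL)""")
conn.executemany("INSERT INTO tblSequence VALUES (?, ?, ?, ?)", [
    (0, "a", 1, "2009-09-24 22:00:00"),   # 'a' started before the window...
    (1, "a", 1, "2009-09-26 01:00:00"),   # ...but is still active inside it
    (0, "d", 1, "2009-09-25 12:00:00"),   # 'd' started inside the window
    (1, "d", 1, "2009-09-25 12:03:00"),
])

params = {"site": 1, "start": "2009-09-25 02:08:27"}

new_sessions_only = """
    SELECT UserSessionSequenceID, SessionGuid, Timestamp
    FROM (
        SELECT UserSessionSequenceID, SessionGuid, Timestamp,
               ROW_NUMBER() OVER (PARTITION BY SessionGuid
                                  ORDER BY UserSessionSequenceID DESC) AS sort
        FROM tblSequence
        WHERE SiteID = :site AND Timestamp > :start
    ) AS cand
    WHERE sort = 1
      AND NOT EXISTS (              -- reject sessions with any row before the window
          SELECT SessionGuid
          FROM tblSequence
          WHERE SiteID = :site
            AND SessionGuid = cand.SessionGuid
            AND Timestamp < :start)
    ORDER BY UserSessionSequenceID DESC
    LIMIT 30
"""
rows = conn.execute(new_sessions_only, params).fetchall()
print(rows)  # [(1, 'd', '2009-09-25 12:03:00')]
```

As in the T-SQL version, the outer check uses a strict < while the inner filter uses a strict >, so a row stamped exactly at the window start counts as neither.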
Answer 1 (score: 3)
Try these. They should be equivalent queries, but you will have to compare the query plans:
SELECT DISTINCT TOP(@resultCount)
s.usersessionsequenceid,
s.sessionguid,
s.ip,
s.url,
s.urltitle,
s.siteid,
s.browserwidth,
s.browserheight,
s.browser,
s.browserversion,
s.referer,
s.timestamp
FROM tblsequence s
JOIN (SELECT t.sessionguid,
MAX(t.timestamp) AS max_ts
FROM tblsequence t
GROUP BY t.sessionguid) x ON x.sessionguid = s.sessionguid
AND x.max_ts = s.timestamp
WHERE s.siteid = @SiteID
AND s.timestamp > DATEADD(mi, -@minutes, GETDATE())
ORDER BY s.usersessionsequenceid DESC;
SELECT TOP(@resultCount)
s.usersessionsequenceid,
s.sessionguid,
s.ip,
s.url,
s.urltitle,
s.siteid,
s.browserwidth,
s.browserheight,
s.browser,
s.browserversion,
s.referer,
s.timestamp
FROM tblsequence s
WHERE s.siteid = @SiteID
AND s.timestamp > DATEADD(mi, -@minutes, GETDATE())
AND EXISTS(SELECT NULL
FROM tblsequence t
WHERE t.sessionguid = s.sessionguid
GROUP BY t.sessionguid
HAVING MAX(t.timestamp) = s.timestamp)
ORDER BY s.usersessionsequenceid DESC;
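Both rewrites replace ROW_NUMBER() with "the row whose Timestamp equals its session's MAX(Timestamp)". That picks the same row as ORDER BY UserSessionSequenceID DESC only when each session's latest timestamp belongs to its highest sequence number and is unique; rows tied on Timestamp would both be returned, which is presumably why the first query uses DISTINCT. A minimal sketch that runs both shapes in SQLite through Python's sqlite3 (invented sample data, LIMIT in place of TOP, a literal window start in place of DATEADD/GETDATE()) and checks they agree:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tblSequence (
    UserSessionSequenceID INTEGER NOT NULL,
    SessionGuid TEXT NOT NULL,
    SiteID INTEGER NOT NULL,
    Timestamp TEXT NOT NULL)""")
conn.executemany("INSERT INTO tblSequence VALUES (?, ?, ?, ?)", [
    (0, "a", 1, "2009-09-26 00:00:00"),
    (1, "a", 1, "2009-09-26 01:00:00"),
    (0, "b", 1, "2009-09-26 00:30:00"),
    (0, "c", 2, "2009-09-26 01:30:00"),
])
params = {"site": 1, "start": "2009-09-25 02:08:27"}

# Variant 1: join each row to its session's MAX(Timestamp).
join_max = """
    SELECT DISTINCT s.UserSessionSequenceID, s.SessionGuid, s.Timestamp
    FROM tblSequence s
    JOIN (SELECT t.SessionGuid, MAX(t.Timestamp) AS max_ts
          FROM tblSequence t
          GROUP BY t.SessionGuid) x
      ON x.SessionGuid = s.SessionGuid AND x.max_ts = s.Timestamp
    WHERE s.SiteID = :site AND s.Timestamp > :start
    ORDER BY s.UserSessionSequenceID DESC
    LIMIT 30
"""

# Variant 2: correlated EXISTS with GROUP BY ... HAVING MAX(...) = outer value.
exists_having = """
    SELECT s.UserSessionSequenceID, s.SessionGuid, s.Timestamp
    FROM tblSequence s
    WHERE s.SiteID = :site AND s.Timestamp > :start
      AND EXISTS (SELECT NULL
                  FROM tblSequence t
                  WHERE t.SessionGuid = s.SessionGuid
                  GROUP BY t.SessionGuid
                  HAVING MAX(t.Timestamp) = s.Timestamp)
    ORDER BY s.UserSessionSequenceID DESC
    LIMIT 30
"""

join_rows = conn.execute(join_max, params).fetchall()
exists_rows = conn.execute(exists_having, params).fetchall()
print(join_rows)                 # [(1, 'a', '2009-09-26 01:00:00'), (0, 'b', '2009-09-26 00:30:00')]
print(join_rows == exists_rows)  # True on this sample data
```

Agreement on a toy table says nothing about which plan SQL Server will prefer; that still has to be checked against the real data.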
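But if you want rows with a row number of 2 or more (more than just the latest row per session), you will have to stick with your ROW_NUMBER query.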
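With ROW_NUMBER() the per-session cutoff is an ordinary predicate on sort, so keeping the latest two rows per session only means changing sort = 1 to sort <= 2, whereas the MAX()-based rewrites can only ever express the single latest row. A minimal SQLite sketch via Python's sqlite3, with invented data and a hypothetical helper name latest_n_per_session:

```python
import sqlite3

def latest_n_per_session(conn, n):
    """Return the n highest-UserSessionSequenceID rows of every session."""
    sql = """
        SELECT SessionGuid, UserSessionSequenceID
        FROM (
            SELECT SessionGuid, UserSessionSequenceID,
                   ROW_NUMBER() OVER (PARTITION BY SessionGuid
                                      ORDER BY UserSessionSequenceID DESC) AS sort
            FROM tblSequence
        ) AS t
        WHERE sort <= ?              -- sort = 1 would keep only the latest row
        ORDER BY SessionGuid, UserSessionSequenceID DESC
    """
    return conn.execute(sql, (n,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblSequence (UserSessionSequenceID INTEGER, SessionGuid TEXT)")
conn.executemany("INSERT INTO tblSequence VALUES (?, ?)",
                 [(0, "a"), (1, "a"), (2, "a"), (0, "b")])
print(latest_n_per_session(conn, 2))  # [('a', 2), ('a', 1), ('b', 0)]
```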