How can I optimize this SQL query?

Time: 2012-07-21 08:13:21

Tags: sql tsql optimization

This query takes about 01:30 to run:

select DATEADD(dd, 0, DATEDIFF(dd, 0, t1.[OccurredOn]))
       , count(t2.UserId)
       , count(*) - count(t2.UserId)
from Events t1
left join (select c.UserId, min(c.OccurredOn) FirstOccurred
           from Events c
           where [OccurredOn] between @start and @end
           group by c.UserId) t2 on t1.OccurredOn = t2.FirstOccurred and t1.UserId = t2.UserId
where t1.EventType = @eventType
    and t1.[OccurredOn] between @start and @end
group by DATEADD(dd, 0, DATEDIFF(dd, 0, t1.[OccurredOn]))
order by DATEADD(dd, 0, DATEDIFF(dd, 0, t1.[OccurredOn]))

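If I remove the WHERE clause from the subquery, it runs instantly.

Running the subquery on its own, with the WHERE clause, takes < 1 second.

If I first SELECT the subquery into a table variable and join to that variable, the whole query runs in 19s.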
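For reference, the table-variable variant could look roughly like the sketch below. The variable name @firstEvents, the primary key on UserId, and the parameter values are assumptions made for illustration; they are not taken from the original code.

```sql
-- Placeholder parameter values (assumed); substitute real ones.
declare @start datetime = '20120101',
        @end datetime = '20120201',
        @eventType uniqueidentifier = newid();

-- Hypothetical table variable holding each user's first event in the range.
-- UserId is unique here because the insert groups by UserId.
declare @firstEvents table (
    UserId        uniqueidentifier not null primary key,
    FirstOccurred datetime         not null
);

insert into @firstEvents (UserId, FirstOccurred)
select c.UserId, min(c.OccurredOn)
from Events c
where c.OccurredOn between @start and @end
group by c.UserId;

-- Same outer query as before, joined to the table variable instead of the derived table.
select DATEADD(dd, 0, DATEDIFF(dd, 0, t1.[OccurredOn]))
       , count(t2.UserId)
       , count(*) - count(t2.UserId)
from Events t1
left join @firstEvents t2 on t1.OccurredOn = t2.FirstOccurred and t1.UserId = t2.UserId
where t1.EventType = @eventType
    and t1.[OccurredOn] between @start and @end
group by DATEADD(dd, 0, DATEDIFF(dd, 0, t1.[OccurredOn]))
order by DATEADD(dd, 0, DATEDIFF(dd, 0, t1.[OccurredOn]));
```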

The Events table looks like this:

[Events](
    [EventType] [uniqueidentifier] NOT NULL,
    [UserId] [uniqueidentifier] NOT NULL,
    [OccurredOn] [datetime] NOT NULL,
)

I have the following primary, nonclustered, nonunique indexes:

  • EventType
  • UserId
  • OccurredOn
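Written out as DDL, three single-column indexes like these would presumably be defined along the following lines. The index names are hypothetical; the actual definitions are not shown in the question.

```sql
-- Hypothetical index names; one nonclustered, non-unique index per column.
create nonclustered index IX_Events_EventType  on Events (EventType);
create nonclustered index IX_Events_UserId     on Events (UserId);
create nonclustered index IX_Events_OccurredOn on Events (OccurredOn);
```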

Here is the execution plan:

[image: execution plan]

Using SQL Server 2008.

Two things:

  1. What's going on? What is making this slow?
  2. How can I speed it up?

Thanks

2 answers:

Answer 0: (score: 1)

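Your query is slow because your sort depends on an expression computed on the fly (DATEADD(dd, 0, DATEDIFF(dd, 0, t1.[OccurredOn]))), and SQL Server cannot use an index for an on-the-fly computation.

PostgreSQL has indexes on expressions. With PostgreSQL you can essentially save the result of an expression in an index (a behind-the-scenes column), so when the time comes to sort by that expression, PostgreSQL can use the index on it.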
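As a rough illustration of the PostgreSQL feature, here is a minimal sketch in PostgreSQL syntax (not T-SQL). The table and index names are hypothetical, and the column is assumed to be a plain timestamp without time zone, since date_trunc is only immutable, and therefore indexable, for that type.

```sql
-- PostgreSQL sketch; names are hypothetical.
create table events (
    event_type  uuid      not null,
    user_id     uuid      not null,
    occurred_on timestamp not null
);

-- Index on an expression: the truncated day is stored in the index itself.
create index ix_events_day on events (date_trunc('day', occurred_on));

-- A query that groups and sorts by the same expression can use that index.
select date_trunc('day', occurred_on) as day, count(*)
from events
group by date_trunc('day', occurred_on)
order by date_trunc('day', occurred_on);
```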

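The closest analogous feature in SQL Server is a persisted computed column (persisted formula).

You can easily verify this with the following sample script: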

create table PersonX
(
Lastname varchar(50) not null,
Firstname varchar(50) not null
);

create table PersonY
(
Lastname varchar(50) not null,
Firstname varchar(50) not null
);


alter table PersonX add Fullname as Lastname + ', ' + Firstname PERSISTED;    
create index ix_PersonX on PersonX(Fullname);

declare @i int = 0;

while @i < 10000 begin
    insert into PersonX(Lastname,Firstname) values('Lennon','John');
    insert into PersonY(Lastname,Firstname) values('Lennon','John');
    set @i = @i + 1;
end;


select top 1000 Lastname, Firstname
from PersonX
order by Fullname;


select top 1000 Lastname, Firstname
from PersonY
order by Lastname + ', ' + Firstname;

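Ordering by Fullname on PersonX is faster than the equivalent ordering on PersonY. The query cost for PersonX is only 32% (relative to the batch), while PersonY's is 68%.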
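Those percentages are the optimizer's estimated costs. To compare the work actually done, one option (a sketch, reusing the PersonX and PersonY tables created above) is to switch on the I/O and timing statistics and look at the logical reads and elapsed times reported in the Messages tab:

```sql
-- Report logical reads and CPU/elapsed time for each statement.
set statistics io on;
set statistics time on;

select top 1000 Lastname, Firstname
from PersonX
order by Fullname;

select top 1000 Lastname, Firstname
from PersonY
order by Lastname + ', ' + Firstname;

set statistics io off;
set statistics time off;
```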

To fix the performance of your query, do this:

alter table Events 
    add OccurenceGroup as 
        DATEADD(dd, 0, DATEDIFF(dd, 0, [OccurredOn])) PERSISTED

create index ix_Events on Events(OccurenceGroup);

Then group and order by OccurenceGroup.
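Applied to the query from the question, that could look roughly like the sketch below. It assumes the persisted OccurenceGroup column and its index have already been created; the parameter values are placeholders. Whether this is actually faster needs to be tested against the real data.

```sql
-- Placeholder parameter values (assumed); substitute real ones.
declare @start datetime = '20120101',
        @end datetime = '20120201',
        @eventType uniqueidentifier = newid();

select t1.OccurenceGroup
       , count(t2.UserId)
       , count(*) - count(t2.UserId)
from Events t1
left join (select c.UserId, min(c.OccurredOn) FirstOccurred
           from Events c
           where c.[OccurredOn] between @start and @end
           group by c.UserId) t2 on t1.OccurredOn = t2.FirstOccurred and t1.UserId = t2.UserId
where t1.EventType = @eventType
    and t1.[OccurredOn] between @start and @end
-- Group and sort on the persisted column instead of recomputing the expression.
group by t1.OccurenceGroup
order by t1.OccurenceGroup;
```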


By the way, have you added an index on OccurredOn, and also on EventType?

Answer 1: (score: 1)

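You could try replacing LEFT JOIN with LEFT MERGE JOIN, so that the derived table t2 is computed only once instead of the MIN potentially being recalculated multiple times for each user.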
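With the join hint in place, the original query would read roughly as follows (a sketch; the parameter values are placeholders). Note that specifying a join hint also makes SQL Server enforce the join order as written, so it is worth checking the resulting plan.

```sql
-- Placeholder parameter values (assumed); substitute real ones.
declare @start datetime = '20120101',
        @end datetime = '20120201',
        @eventType uniqueidentifier = newid();

select DATEADD(dd, 0, DATEDIFF(dd, 0, t1.[OccurredOn]))
       , count(t2.UserId)
       , count(*) - count(t2.UserId)
from Events t1
-- MERGE is a join hint: it asks for a merge join between t1 and the derived table.
left merge join (select c.UserId, min(c.OccurredOn) FirstOccurred
                 from Events c
                 where c.[OccurredOn] between @start and @end
                 group by c.UserId) t2 on t1.OccurredOn = t2.FirstOccurred and t1.UserId = t2.UserId
where t1.EventType = @eventType
    and t1.[OccurredOn] between @start and @end
group by DATEADD(dd, 0, DATEDIFF(dd, 0, t1.[OccurredOn]))
order by DATEADD(dd, 0, DATEDIFF(dd, 0, t1.[OccurredOn]));
```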

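You could also rewrite this using a ranking function, as below. It may be cheaper. You will need to test these ideas against your data and indexes.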

;WITH T AS
(
SELECT *,
       RANK() OVER (PARTITION BY UserId ORDER BY OccurredOn) AS Rnk
FROM Events
WHERE [OccurredOn] BETWEEN @start AND @end
)
SELECT Dateadd(dd, 0, Datediff(dd, 0, OccurredOn)),
       COUNT(CASE WHEN Rnk =1 THEN 1 END),
       COUNT(CASE WHEN Rnk >1 THEN 1 END)
FROM T
WHERE EventType = @eventType      
GROUP BY Dateadd(dd, 0, Datediff(dd, 0, OccurredOn)) 
ORDER BY Dateadd(dd, 0, Datediff(dd, 0, OccurredOn))