Help me optimize this SQL Server 2005 query

Time: 2011-03-17 16:54:40

Tags: sql-server-2005 query-optimization

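My query is running too slowly. I'm not sure what information I need to provide to make it easy for you to help, but I'll take a stab at it and add more when you inevitably ask for something I either didn't think to include or don't know about.

I want to identify customers who made their first purchase in 2006 (but using only part of their address, to accommodate households and businesses).

My first attempt was: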

select
    distinct a.line1 + '|' + substring(a.zip,1,5)
from 
    registrations r
    join customers c on r.custID = c.id
    join addresses a on c.addressID = a.id
where year(r.purchaseDate) = 2006
    and a.line1 + '|' + substring(a.zip,1,5) not in (
        select
            distinct a.line1 + '|' + substring(a.zip,1,5)
        from
            registrations r
            join customers c on r.custID = c.id
            join addresses a on c.addressID = a.id
        where
            year(r.purchaseDate) < 2006
    )

When that ran for too long, I switched to a NOT EXISTS (which I'm less comfortable with, but willing to try), like this:

select
    distinct a.line1 + '|' + substring(a.zip,1,5)
from
    registrations r
    join customers c on r.custID = c.id
    join addresses a on c.addressID = a.id
where
    year(r.purchaseDate) = 2006
    and not exists (
        select
            1
        from
            registrations r
            join customers c on r.custID = c.id
            join addresses ia on c.addressID = ia.id
        where
            ia.line1 + '|' + substring(ia.zip,1,5) = a.line1 + '|' + substring(a.zip,1,5) and
            year(r.purchaseDate) < 2006
        )
group by
    a.line1 + '|' + substring(a.zip,1,5)

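But that also ran too long. As in, 17 hours with no results too long. I figure the first thing to consider is that my SQL may be wrong or suboptimal, but in case it isn't, I also want to give you enough information to consider the environment.

So, diagnostic info. (You may not care, but just in case: it runs on a G6 server with four quad-core processors and 20 GB of RAM; each query is limited to four processors to preserve performance for web server requests; when I ran this query we had cleared out other large imports and reports because of deadlocks, but the web server is customer-facing and can't be stopped.) There are roughly 15 million registrations, 11 million customers, and 8.6 million addresses. I rebuilt all the indexes just to make sure fragmentation wasn't the problem. However, I'm not too sure how to index correctly, so I fully accept that this may be an issue: some of these indexes are my own doing, and some come from scripts that one of the MS analysis tools gave me to improve performance. I'm also not sure how best to convey the index information, so I'm just giving the create scripts: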

ALTER TABLE [dbo].[registrations] ADD  CONSTRAINT [PK_flatRegistrations_1] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

ALTER TABLE [dbo].[customers] ADD  CONSTRAINT [PK_flatCustomers_1] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

ALTER TABLE [dbo].[addresses] ADD  CONSTRAINT [PK_addresses] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]


CREATE NONCLUSTERED INDEX [addresses] ON [dbo].[addresses] 
(
    [line1] ASC,
    [line2] ASC,
    [city] ASC,
    [state] ASC,
    [zip] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]


CREATE NONCLUSTERED INDEX [deliverable] ON [dbo].[addresses] 
(
    [addressDeliverable] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]


CREATE NONCLUSTERED INDEX [_dta_index_addresses_5_1543676547__K9_K1_6] ON [dbo].[addresses] 
(
    [addressDeliverable] ASC,
    [ID] ASC
)
INCLUDE ( [zip]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]


CREATE NONCLUSTERED INDEX [_dta_index_addresses_5_1543676547__K1_K9_6] ON [dbo].[addresses] 
(
    [ID] ASC,
    [addressDeliverable] ASC
)
INCLUDE ( [zip]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]


CREATE NONCLUSTERED INDEX [_dta_index_addresses_5_1543676547__K1_6] ON [dbo].[addresses] 
(
    [ID] ASC
)
INCLUDE ( [zip]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

Thanks so much for stopping by!

4 answers:

Answer 0: (score: 1)

My first attempt would be to replace:

year(r.purchaseDate) = 2006

with:

r.purchaseDate BETWEEN '2006-01-01' and '2006-12-31 23:59:59' 

and likewise replace year(r.purchaseDate) < 2006 with r.purchaseDate < '2006-01-01'

and make sure there is an index on purchaseDate.
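
A minimal sketch of what such an index could look like, assuming purchaseDate is a datetime column. The index name IX_registrations_purchaseDate is made up for illustration, and including custID is an assumption intended to let the join to customers be served from the index alone. The half-open range in the sample filter is one way to cover the whole of 2006 regardless of the time-of-day precision stored in the column.

```sql
-- Hypothetical index name; custID is INCLUDEd (an assumption) so that the
-- date filter and the join column can both be read from this index.
CREATE NONCLUSTERED INDEX IX_registrations_purchaseDate
    ON dbo.registrations (purchaseDate)
    INCLUDE (custID);
GO

-- Sargable filter: no function wraps the column, so an index seek is possible.
-- 'YYYYMMDD' literals are interpreted the same way under every language setting.
SELECT COUNT(*)
FROM dbo.registrations r
WHERE r.purchaseDate >= '20060101'
  AND r.purchaseDate <  '20070101';
```

Whether the optimizer actually picks this index depends on the data distribution, so it is worth checking the execution plan before and after.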

Next (if you have enough resources to run it):

 -- create temporary table to prepare data
CREATE TABLE #addrs (yearr int, pattern varchar(100)) -- depends on a.line1 length

-- calculate all patterns for purchase before 1st Jan 2007
INSERT INTO 
  #addrs (yearr, pattern)
SELECT
  YEAR(r.purchaseDate),
  a.line1 + '|' + substring(a.zip,1,5)
from
    registrations r
    join customers c on r.custID = c.id
    join addresses a on c.addressID = a.id
where
    r.purchaseDate < '2007-01-01'

-- optionally, but could be useful in query below
CREATE INDEX idx_temp ON #addrs (pattern, yearr)  

-- original query rewritten
SELECT
  DISTINCT pattern
FROM
  #addrs a
WHERE
  a.yearr = 2006
  and not exists (
    select top 1 1 
    from 
       #addrs aa 
    where 
       aa.pattern = a.pattern
       and aa.yearr < 2006
  )

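The second solution may contain some typos and might not compile on the first try. It's just an idea.

Answer 1: (score: 1)

I think the table aliases in your Not Exists subquery are wrong. Try this: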

select  r.custID,
        a.line1 + '|' + substring(a.zip,1,5)
from    registrations r
join    customers c on r.custID = c.id
join    addresses a on c.addressID = a.id
where   r.purchaseDate >= '2006-01-01'
and     r.purchaseDate < '2007-01-01'
and not exists (
        select  1
        from    registrations ir
        join    customers ic on ir.custID = ic.id
        join    addresses ia on ic.addressID = ia.id
        where   ia.line1 = a.line1
        and     substring(ia.zip,1,5) = substring(a.zip,1,5)
        and     ir.purchaseDate < '2006-01-01'
        )

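Answer 2: (score: 0)

SubString(A.zip,1,5) is bound to cause a table scan. Is this a one-off query? If so, take the results of the following query and store them in a new table. Create indexes on AddressToCompare and PurchaseDate, and run subsequent queries against the new table.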

Select
      R.ID
    , R.CustID
    , C.AddressID
    , A.line1 + '|' + SubString(A.zip, 1, 5) As AddressToCompare
    , R.PurchaseDate
From
    Registrations R
    Inner Join Customers C On R.CustID = C.ID
    Inner Join addresses A On C.AddressID = A.ID
Where
    R.PurchaseDate < '2007-01-01'
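
The steps described above could be sketched roughly as follows. The work table name dbo.RegistrationAddresses and the index name are hypothetical, and a single composite index is only one reading of "indexes on AddressToCompare and PurchaseDate". It also assumes line1 is short enough for AddressToCompare to stay within the index key size limit.

```sql
-- Build the work table once (hypothetical name), covering every purchase before 2007.
SELECT
      R.ID
    , R.CustID
    , C.AddressID
    , A.line1 + '|' + SUBSTRING(A.zip, 1, 5) AS AddressToCompare
    , R.PurchaseDate
INTO dbo.RegistrationAddresses
FROM dbo.registrations R
    INNER JOIN dbo.customers C ON R.CustID = C.ID
    INNER JOIN dbo.addresses A ON C.AddressID = A.ID
WHERE R.PurchaseDate < '20070101';
GO

-- Hypothetical index name; address first so rows for one address sit together.
CREATE NONCLUSTERED INDEX IX_RegistrationAddresses_Address_Date
    ON dbo.RegistrationAddresses (AddressToCompare, PurchaseDate);
GO

-- The table holds nothing from 2007 onward, so an address whose earliest
-- purchase is on or after 2006-01-01 made its first purchase in 2006.
SELECT AddressToCompare
FROM dbo.RegistrationAddresses
GROUP BY AddressToCompare
HAVING MIN(PurchaseDate) >= '20060101';
```

Grouping on the precomputed column replaces the correlated subquery with a single pass over the work table, which is the main point of materializing it.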

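Answer 3: (score: 0)

First, your logic is flawed: businesses and customers move in and out of addresses, so comparing addresses rather than customers is a guarantee of wrong results. Just because ABC company ordered something in 2002 doesn't mean DEF company didn't make its first purchase in 2006, because ABC company and DEF company have nothing to do with each other. If you need to relate people who belong to the same company or household, then have a table that stores that relationship properly; don't rely on an incorrect hack.

Assuming you can't do that, and this is a process that will be run many times, you need a persisted column in the addresses table for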
line1 + '|' + substring(zip,1,5)

This saves you from having to compute it on the fly.
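
A minimal sketch of such a persisted column, assuming line1 and zip are ordinary character columns short enough to be used in an index key. The column name addressKey and the index name are hypothetical.

```sql
-- Hypothetical column name; PERSISTED stores the computed value in the row
-- instead of recalculating it on every read.
ALTER TABLE dbo.addresses
    ADD addressKey AS (line1 + '|' + SUBSTRING(zip, 1, 5)) PERSISTED;
GO

-- Hypothetical index name; lets comparisons on addressKey use an index seek.
CREATE NONCLUSTERED INDEX IX_addresses_addressKey
    ON dbo.addresses (addressKey);
GO
```

With this in place, the queries can compare ia.addressKey = a.addressKey directly rather than rebuilding the concatenated string for every row.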