How to improve this query - is partitioning the best option here?

Date: 2013-06-16 18:54:17

Tags: sql sql-server-2008 tsql

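I have a table named Transactions that currently contains more than 6 million rows (growing by roughly 600,000-700,000 rows per month). It looks like this: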

pk                                                           id          acct_id     id1         id2         id3         id4         created                 interface_id source_lvl1 source_lvl2 trans_type
------------------------------------------------------------ ----------- ----------- ----------- ----------- ----------- ----------- ----------------------- ------------ ----------- ----------- -----------
10000257.4297...400245990.3.1002                             10000257    4297        NULL        NULL        NULL        NULL        2012-09-06 11:26:30.000 1            32002       1002        3
10004819.1529.106.105442.400667675.6.1021                    10004819    1529        106         105442      62          NULL        2012-09-11 08:34:35.000 4            32002       1021        6
10004819.1529.18664647.62.400667675.3.1021                   10004819    1529        18664647    62          NULL        NULL        2012-09-11 08:34:35.000 4            32002       1021        3
10006460.1529.106.105442.400667675.6.1021                    10006460    1529        106         105442      62          NULL        2012-09-11 08:34:35.000 4            32002       1021        6
10006460.1529.18664647.62.400667675.3.1021                   10006460    1529        18664647    62          NULL        NULL        2012-09-11 08:34:35.000 4            32002       1021        3
10006648.3280...406204785.3.1002                             10006648    3280        NULL        NULL        NULL        NULL        2012-11-14 10:39:45.000 6            32002       1002        3
10006834.1529.106.105442.400667675.6.1021                    10006834    1529        106         105442      62          NULL        2012-09-11 08:34:35.000 4            32002       1021        6
10006834.1529.18664647.62.400667675.3.1021                   10006834    1529        18664647    62          NULL        NULL        2012-09-11 08:34:35.000 4            32002       1021        3
10006962.2428...415795811.3.1018                             10006962    2428        NULL        NULL        NULL        NULL        2013-03-05 10:50:11.000 1            32002       1018        3
10006962.2428.107972..415795811.4.1018                       10006962    2428        107972      NULL        NULL        NULL        2013-03-05 10:50:11.000 1            32002       1018        4

I have defined a view that is meant to help count specific events.

Here is the SQL definition:

CREATE VIEW [dbo].[Queue_base]

AS

select 
dateadd(minute , (DATEPART(minute,t.created)/30)*30 , DATEADD(hour,datediff(hour, 0, t.created), 0)) INTRVL_UTC,
dateadd(minute , (DATEPART(minute,t.created)/30)*30 + 30 , DATEADD(hour,datediff(hour, 0, t.created), 0)) INTRVL_END_UTC,
a.ID [Agent ID], a.Login, a.DisplayName, a.GroupName, q.QueueID, q.QueueName, 
    TODATETIMEOFFSET(t.created,0) created   
,i.ReferenceNumber, t.id inc_id
, case when (t.trans_type=17 and t.source_lvl2 not IN (1001, 2001)) or (t.trans_type=6 and t.id1=8) then t.id else null end [Workload]
, case when (t.trans_type=6 and t.id1=8 and t.source_lvl2 not IN (1001, 2001) or (t.trans_type=17 and not t.source_lvl2 IN (1001,2001)))then t.id else null end [Inbound Emails]
, case when t.trans_type=17 and t.id1=q.QueueID then t.id else null end [EnQueued]
, case when t.trans_type=17 and t.id2=q.QueueID then t.id else null end [DeQueued]
, case when t.trans_type=6 and t.id1 IN (2,106) then t.id else null end [Solved]
, case when t.trans_type=6 and t.id1 =8 then t.id else null end [Updated]
, case when x.StatusTypeID = 2 then t.id else null end [Reopened]
, case when t.trans_type=6 and t.id1=125 then t.id else null end [Spam]
, case when t.trans_type=8 and t.acct_id <> 1 then t.id else null end [Responded]
, case when i.cr_rec_element_1 is not null or i.de_reason1 is not null then t.id else null end [Complaint]
,t.trans_type, t.id1
,r.Brand, r.Region, r.[Call Center], r.LOB, r.[LOB Detail], r.Team, r.Subteam, r.Channel
,r.Interface, r.Product, r.[Product Detail], r.Unit
from Transactions t 
left join
(
select a.*, b.id1, st.StatusTypeID
from
(select  
t1.pk, t1.id, t1.created,   max(t2.created) maxdate
from Transactions t1 
    left join Transactions t2 
    on t1.id=t2.id and t2.created<t1.created and t2.trans_type=6
 left join Status st on t2.id1=st.StatusID
 where t1.trans_type=6 and t1.id1=8
group by t1.pk, t1.id, t1.created) a left join Transactions b on a.id=b.id and b.created=a.maxdate and b.trans_type=6
left join Status st on b.id1=st.statusid
)
x on t.pk=x.pk
left join Incident i on t.id=i.id
left join Account a on t.acct_id=a.ID
left join Queue q ON  (t.trans_type=17 and (t.id1=q.QueueID or t.id2=q.QueueID) or t.trans_type IN (6,8) and t.id3=q.QueueID) 
left join queuedim r ON (q.QueueName=r.QueueName or q.QueueName is null and r.QueueName is null) 
    and (q.QueueID=r.QueueID or q.QueueID is null and r.QueueID is null)
where t.trans_type=17 or t.trans_type IN (6,8)

This is the key part of the view's output:

inc_id      Workload    Inbound Emails EnQueued    DeQueued    Solved      Updated     Reopened    Spam        Responded   Complaint
----------- ----------- -------------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
10209648    NULL        NULL           NULL        NULL        10209648    NULL        NULL        NULL        NULL        NULL
10209648    NULL        NULL           NULL        NULL        NULL        NULL        NULL        NULL        10209648    NULL
10209648    10209648    NULL           NULL        NULL        NULL        10209648    NULL        NULL        NULL        NULL
10227966    NULL        NULL           NULL        NULL        NULL        NULL        NULL        10227966    NULL        NULL
10288343    NULL        NULL           NULL        NULL        10288343    NULL        NULL        NULL        NULL        NULL
10303898    NULL        NULL           NULL        NULL        10303898    NULL        NULL        NULL        NULL        NULL
10394204    NULL        NULL           NULL        NULL        NULL        NULL        NULL        10394204    NULL        NULL
10409624    NULL        NULL           NULL        NULL        10409624    NULL        NULL        NULL        NULL        NULL
10482071    NULL        NULL           NULL        NULL        NULL        NULL        NULL        10482071    NULL        NULL
10485993    NULL        NULL           NULL        NULL        NULL        NULL        NULL        10485993    NULL        NULL

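My plan is to create another table and keep updating it with the aggregated results I'm interested in, grouped by date and by various combinations of the other dimensions. The problem is that I need both distinct counts and simple counts of the events described above. The view itself returns its raw rows very quickly, but a query that computes the counts on top of it takes a very long time: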

    --  month   account
declare @d1 date
declare @d2 date

set @d1 = '2013-05-01'
set @d2 = '2013-06-01'
--insert into IncPerfQueue
select x.Brand, x.Region, x.[Call Center], x.LOB, x.[LOB Detail], x.Team, x.Subteam,
x.QueueName, case when x.[Agent ID] is null then 0 else [Agent ID] end,  c.[month], NULL weekstart, NULL [date]

, count(distinct EnQueued) [Distinct Incidents EnQueued]
, count(distinct DeQueued) [Distinct Incidents DeQueued]
, count(distinct Solved) [Distinct Incidents Solved in the queue]
, COUNT(distinct Responded) [Distinct Incidents Responded in the queue]
, COUNT(distinct Updated)   [Distinct Incidents Updated in the queue]
, count(distinct Reopened) [Distinct Incidents ReOpened in the queue]
, count(distinct Spam) [Distinct Spam closed in the queue]
, COUNT([Inbound Emails]) [Inbound Emails]
, COUNT(Workload) [Workload]
, count(EnQueued) [# EnQueued]
, count(DeQueued) [# DeQueued]
, count(Solved) [# Solved in the queue]
, COUNT(Responded) [# Responded in the queue]
, COUNT(Updated) [#Updated in the queue]
, count(Reopened) [# ReOpened in the queue]
, count(Spam) [# Spam closed in the queue]

from Queue_base x
join [calendar] c ON convert(date,x.created)=c.date
where x.created >= @d1 and x.created < @d2
and Brand is not null
group by x.Brand, x.Region, x.[Call Center], x.LOB, x.[LOB Detail], x.Team, x.Subteam,
x.QueueName, [Agent ID], c.month

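This is only one of the required queries, because each set of dimensions needs its own aggregation (the distinct counts differ per grouping), and it took more than 1 hour to run! http://i.stack.imgur.com/oWibJ.png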

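I would greatly appreciate your advice on the best approach for this kind of query. The base table will certainly grow large quickly... should I partition it? I should also note that all the tables referenced here are indexed, and that I'm using Microsoft SQL Server 2008 R2 (SP2) (X64), installed on a box with 2x X5550 processors and 48 GB of RAM. The operating system is Windows Server 2008 R2 Enterprise.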

Thanks, 马驹

1 Answer:

Answer 0: (score: 0)

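Looking at your query, I'd guess that most of the cost is in the left joins and the GROUP BY. You probably can't do much about the left joins without sacrificing correctness, but I would look at your query plan to see how much the grouping costs.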
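One way to look at the plan without waiting an hour is to ask for the estimated plan, which compiles the query but does not run it. The following is only a sketch: it uses a cut-down version of the counting query (one distinct count, the same date range as in the question) and assumes the view lives in the dbo schema, as in its definition above.

```sql
-- SET SHOWPLAN_XML must be the only statement in its batch,
-- hence the GO separators (SSMS / sqlcmd).
SET SHOWPLAN_XML ON;
GO

-- Compiled only, not executed: the result is the estimated plan as XML.
SELECT x.QueueName,
       COUNT(DISTINCT x.EnQueued) AS [Distinct Incidents EnQueued]
FROM dbo.Queue_base AS x
WHERE x.created >= '2013-05-01'
  AND x.created <  '2013-06-01'
GROUP BY x.QueueName;
GO

SET SHOWPLAN_XML OFF;
GO
```

In the returned plan, the relative cost of the Sort and aggregate operators compared with the join operators shows where the time is expected to go. The estimated plan can differ from the actual one, so treat the percentages as a hint rather than a measurement.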

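Since you have millions of rows, my guess is that the sorts in your query plan account for more than 90% of the time. Adding indexes on the GROUP BY columns can really help by turning those sorts into index scans. Scanning an index is certainly faster than sorting, which effectively rebuilds that index on every run. If you could post your query plan, that would be very helpful. You can also work with a smaller version of your data (perhaps a few thousand rows per table), so that you can experiment with indexing without waiting a long time for the indexes to be created.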
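A possible setup for that kind of experiment is sketched below. The names Transactions_sample and IX_Transactions_sample_type_created are hypothetical, and the index shown is keyed on the columns the view filters on (trans_type, created) rather than on the GROUP BY columns themselves, because the grouping columns in the question come from several joined tables. Which index actually removes a sort has to be confirmed from the query plan.

```sql
-- Hypothetical working copy: a few thousand recent rows,
-- so that index builds finish in seconds.
SELECT TOP (5000) *
INTO dbo.Transactions_sample
FROM dbo.Transactions
ORDER BY created DESC;

-- Hypothetical index for the experiment. The key columns match the
-- view's filter (trans_type) and the date range (created); the
-- included columns are those the view reads from Transactions.
CREATE NONCLUSTERED INDEX IX_Transactions_sample_type_created
    ON dbo.Transactions_sample (trans_type, created)
    INCLUDE (id, acct_id, id1, id2, id3, source_lvl2);
```

To test against the copy, a second version of the view would have to reference dbo.Transactions_sample instead of dbo.Transactions. Plans chosen for a few thousand rows may not match those chosen for millions, so any index that looks promising should be re-checked on the full table.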

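I don't think partitioning is needed to improve the query time. Partitioning only helps at the IO level. Since you're talking about a few million rows, I'd guess that all of it, meaning all the relevant columns of all the relevant tables, adds up to only a few gigabytes. Loading that from disk takes a few minutes, not an hour. And SQL Server will of course keep most of it in memory while serving an hour-long query, so partitioning won't help much here. Once your query plan no longer has any hot spots and you find that IO is the bottleneck of the query, then I would consider partitioning.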
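The "few gigabytes" figure is a guess, and it can be checked with the system procedure sp_spaceused. This sketch assumes every table referenced by the view and the counting query lives in the dbo schema, which the question does not state.

```sql
-- Reports rows, reserved, data, index_size and unused space per table.
EXEC sp_spaceused N'dbo.Transactions';
EXEC sp_spaceused N'dbo.Incident';
EXEC sp_spaceused N'dbo.Account';
EXEC sp_spaceused N'dbo.Queue';
EXEC sp_spaceused N'dbo.queuedim';
EXEC sp_spaceused N'dbo.Status';
EXEC sp_spaceused N'dbo.calendar';
```

If the combined data and index sizes are well below the 48 GB of RAM mentioned in the question, the working set can plausibly stay in memory, which supports the argument that IO is not the first thing to fix.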

You can check IO with SET STATISTICS IO (http://msdn.microsoft.com/en-us/library/ms184361.aspx).
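A minimal sketch of how that might look, using a cut-down version of the counting query from the question (one distinct count, the same date range) and assuming the view is in the dbo schema:

```sql
-- Per-table read counts and CPU/elapsed times appear in the
-- Messages output of the session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT x.QueueName,
       COUNT(DISTINCT x.EnQueued) AS [Distinct Incidents EnQueued]
FROM dbo.Queue_base AS x
WHERE x.created >= '2013-05-01'
  AND x.created <  '2013-06-01'
GROUP BY x.QueueName;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

For each table touched, the output lists logical reads (pages read from the buffer cache) and physical reads (pages read from disk). Many logical reads with few physical reads suggests the data is already in memory and the time is going to CPU work such as joins and sorts, not to disk IO.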