LEFT JOIN to a CTE degrades performance

Time: 2019-07-18 17:41:18

Tags: sql-server tsql

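I need to produce a report that shows every user on a table along with their score. Not every user on that table has a score, so in my solution I first calculate the scores using a few CTEs, and then in a final CTE I pull the full roster and assign a default score to users with no actual score.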

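The CTEs are not trivial, but they are not overly complex either. When I run just the calculation portion of the CTEs for users with actual scores, it completes in under a second. When I add the final CTE, which pulls the full roster and assigns a default score wherever the score is NULL (no actual score), the wheels fall off completely and the query never finishes.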

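I have tried tweaking the indexes and refreshing them, to no avail. I noticed that the join to agent_effectiveness runs in about a second when switched to INNER, but I need it to be a LEFT join so that the full roster is returned even when there is no score.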

Edit*

Execution Plan Inner Join

Execution Plan Left Join

WITH agent_split_stats AS ( 
Select
    racf,
    agent_stats.SkillGroupSkillTargetID,
    aht_target.EnterpriseName,
    aht_target.target,
    Sum(agent_stats.CallsHandled) as n_calls_handled,
    CASE WHEN (Sum(agent_stats.TalkInTime) + Sum(agent_stats.IncomingCallsOnHoldTime) + Sum(agent_stats.WorkReadyTime)) = 0 THEN 1 ELSE
        (Sum(agent_stats.TalkInTime) + Sum(agent_stats.IncomingCallsOnHoldTime) + Sum(agent_stats.WorkReadyTime)) END
    AS total_handle_time
from tblAceyusAgntSklGrp as agent_stats
-- GET TARGETS
INNER JOIN tblCrosswalkWghtPhnEffTarget as aht_target
  ON aht_target.SgId = agent_stats.SkillGroupSkillTargetID
  AND agent_stats.DateTime BETWEEN aht_target.StartDt and aht_target.EndDt
-- GET RACF 
INNER JOIN tblAgentMetricCrosswalk as xwalk
  ON xwalk.SkillTargetID = agent_stats.SkillTargetID
--GET TAU DATA LIKE START DATE AND GRADUATED FLAG
INNER JOIN tblTauClassList AS T
  ON T.SaRacf = racf
WHERE
--FILTERS BY A ROLLING 15 BUSINESS DAYS UNLESS THE DAYS BETWEEN CURRENT DATE AND TAU START DATE ARE <15
agent_stats.DateTime >=
    CASE WHEN dbo.fn_WorkDaysAge(TauStart, GETDATE()) <15 THEN TauStart ELSE
        dbo.fn_WorkDate15(TauStart) 
    END
And Graduated = 'No'
--WPE FILTERS TO ENSURE ACCURATE DATA
AND CallsHandled <> 0
AND Target is not null
Group By
racf, agent_stats.SkillGroupSkillTargetID, aht_target.EnterpriseName, aht_target.target
),
agent_split_stats_with_weight AS (
-- calculate weights
-- one row = one advocate + split
SELECT 
    agent_split_stats.*,
    agent_split_stats.n_calls_handled/SUM(agent_split_stats.n_calls_handled) OVER(PARTITION BY agent_split_stats.racf) AS [weight]
FROM agent_split_stats
),
agent_split_effectiveness AS (
-- calculate the raw Effectiveness score for each eligible advocate/split
-- one row = one agent + split, with their raw Effectiveness score and the components of that
SELECT 
    agent_split_stats_with_weight.*,
    -- these are the components of the Effectiveness score
    (((agent_split_stats_with_weight.target * agent_split_stats_with_weight.n_calls_handled) / agent_split_stats_with_weight.total_handle_time)*100)*agent_split_stats_with_weight.weight AS effectiveness_sum
FROM agent_split_stats_with_weight
), -- this is where we show effectiveness per split  select * from agent_split_effectiveness
agent_effectiveness AS (
-- sum all of the individual effectiveness raw scores for each agent to get each agent's raw score
SELECT 
    racf AS SaRacf,
    ROUND(SUM(effectiveness_sum),2) AS WpeScore
FROM agent_split_effectiveness
GROUP BY racf
),
--GET FULL CLASS LIST, TAU DATES, GOALS FOR WHOLE CLASS
tau AS (
Select L.SaRacf, TauStart, Goal as WpeGoal 
,CASE WHEN agent_effectiveness.WpeScore IS NULL THEN 1 ELSE WpeScore END as WpeScore
FROM tblTauClassList AS L
LEFT JOIN agent_effectiveness
  ON agent_effectiveness.SaRacf = L.SaRacf
LEFT JOIN tblCrosswalkTauGoal AS G
  ON G.Year = TauYear
  AND G.Bucket = 'Wpe'
WHERE TermDate IS NULL
AND Graduated = 'No'
)
SELECT tau.*,
CASE WHEN dbo.fn_WorkDaysAge(TauStart, GETDATE()) > 14 --MUST BE AT LEAST 15 DAYS TO PASS
        AND WpeScore >= WpeGoal THEN 'Pass'
    ELSE 'Fail' END 
from tau

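This style of query works fine for three other calculation types (different score types), so I am not sure why it fails so badly here. The expected result is a list of individuals with their date, score, and goal. When there is no score, a default score is supplied. There is also a pass/fail indicator based on the score versus the goal.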

2 Answers:

Answer 0 (score: 2)

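As @Habo mentioned, we need the actual execution plan (i.e. run the query with "Include Actual Execution Plan" turned on). I looked at what you posted and nothing in it explains the problem. The difference between the actual plan and the estimated plan is that the actual plan records the number of rows actually retrieved; this is vital for troubleshooting poorly performing queries.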
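If the SSMS toolbar option is not convenient, the actual plan can also be captured from the query window itself. A minimal sketch; the SELECT against sys.objects is only a stand-in for the real query:

```sql
-- Return the actual execution plan (including actual row counts) as an extra
-- XML result for each statement that follows.
SET STATISTICS XML ON;

-- Stand-in query; replace with the full WITH ... SELECT statement from the question.
SELECT COUNT(*) AS n_objects
FROM   sys.objects;

SET STATISTICS XML OFF;
```

Clicking the returned XML in SSMS opens the graphical plan, which can then be saved and shared.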

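That said, I do see a big problem with both queries, one which, once fixed, should bring both down to under a second. Your query uses two scalar user-defined functions (UDFs): dbo.fn_WorkDaysAge and dbo.fn_WorkDate15. Scalar UDFs ruin everything. Not only are they slow, they also force a serial execution plan, which makes any query that uses them much slower.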
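To see which other functions in the database might be candidates for the same treatment, the catalog views can list them by type. A rough sketch using sys.objects; the column aliases are my own:

```sql
-- List user-defined functions by kind:
--   FN = scalar function, IF = inline table-valued function,
--   TF = multi-statement table-valued function
SELECT  SchemaName   = SCHEMA_NAME(o.schema_id),
        FunctionName = o.name,
        FunctionType = o.type_desc
FROM    sys.objects AS o
WHERE   o.type IN ('FN', 'IF', 'TF')
ORDER BY o.type, o.name;
```

Anything reported as SQL_SCALAR_FUNCTION and called once per row in a large query is worth a look.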

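I don't have the code for dbo.fn_WorkDaysAge or dbo.fn_WorkDate15. I do have my own inline "WorkDays" function (code below). The syntax is a little different, but the performance benefit is worth the effort. Here is the difference in syntax: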

-- Scalar 
SELECT d.*, workDays = dbo.countWorkDays_scalar(d.StartDate,d.EndDate)
FROM   <sometable> AS d;

-- Inline version
SELECT d.*, f.workDays
FROM   <sometable> AS d
CROSS APPLY dbo.countWorkDays(d.StartDate,d.EndDate) AS f;

Here is a performance test I put together to show the difference between the inline version and the scalar version:

-- SAMPLE DATA
IF OBJECT_ID('tempdb..#dates') IS NOT NULL DROP TABLE #dates;

WITH E1(x)  AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS x(x)),
     E3(x)  AS (SELECT 1 FROM E1 a, E1 b, E1 c),
     iTally AS (SELECT N=ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM E3 a, E3 b)
SELECT TOP (100000) 
  StartDate = CAST(DATEADD(DAY,-ABS(CHECKSUM(NEWID())%1000),GETDATE()) AS DATE),
  EndDate   = CAST(DATEADD(DAY,+ABS(CHECKSUM(NEWID())%1000),GETDATE()) AS DATE)
INTO #dates
FROM iTally;

-- PERFORMANCE TESTS
PRINT CHAR(10)+'Scalar Version (always serial):'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @workdays INT;
  SELECT @workdays = dbo.countWorkDays_scalar(d.StartDate,d.EndDate)
  FROM   #dates AS d;
PRINT DATEDIFF(MS,@st,GETDATE());
GO 3

PRINT CHAR(10)+'Inline Version:'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @workdays INT;
  SELECT @workdays = f.workDays
  FROM   #dates AS d
  CROSS APPLY dbo.countWorkDays(d.StartDate,d.EndDate) AS f
PRINT DATEDIFF(MS,@st,GETDATE());
GO 3

Results:

Scalar Version (always serial):
------------------------------------------------------------
Beginning execution loop
380
363
350
Batch execution completed 3 times.

Inline Version:
------------------------------------------------------------
Beginning execution loop
47
47
46
Batch execution completed 3 times.

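As you can see, the inline version is about 8 times faster than the scalar version. Replacing those scalar UDFs with inline versions will almost certainly speed up this query, regardless of join type.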

Other problems I see include:

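  1. I see a lot of index scans, which suggests you need more filtering and/or better indexes.

  2. dbo.tblCrosswalkWghtPhnEffTarget does not have any indexes, which means it will always be scanned.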
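As a starting point for the second item, an index along these lines might help. This is only a sketch: the index name is made up, and the key and included columns are simply the ones the query in the question joins on and selects, so check it against the real table definition and the actual plan before creating it:

```sql
-- Hypothetical index name; columns are taken from the join and select list in the question.
-- SgId is the equality join column; StartDt/EndDt support the BETWEEN range check.
CREATE NONCLUSTERED INDEX IX_tblCrosswalkWghtPhnEffTarget_SgId_Dates
ON dbo.tblCrosswalkWghtPhnEffTarget (SgId, StartDt, EndDt)
INCLUDE (EnterpriseName, target);
```

Including EnterpriseName and target lets the index cover the query so the base table does not need to be touched for this join.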

Functions used for the performance test:

-- INLINE VERSION
----------------------------------------------------------------------------------------------
IF OBJECT_ID('dbo.countWorkDays') IS NOT NULL DROP FUNCTION dbo.countWorkDays;
GO
CREATE FUNCTION dbo.countWorkDays (@startDate DATETIME, @endDate DATETIME) 
/*****************************************************************************************
[Purpose]:
 Calculates the number of business days between two dates (Mon-Fri), excluding weekends.
 dbo.countWorkDays does not take holidays into consideration; for this you would need a 
 separate "holiday table" to perform an antijoin against.

 The idea is based on the solution in this article:
   https://www.sqlservercentral.com/Forums/Topic153606.aspx?PageIndex=16

[Author]:
 Alan Burstein

[Compatibility]:
 SQL Server 2005+

[Syntax]:
--===== Autonomous
 SELECT f.workDays
 FROM   dbo.countWorkDays(@startdate, @enddate) AS f;

--===== Against a table using APPLY
 SELECT t.col1, t.col2, f.workDays
 FROM dbo.someTable t
 CROSS APPLY dbo.countWorkDays(t.col1, t.col2) AS f;

[Parameters]:
  @startDate = datetime; first date to compare
  @endDate   = datetime; date to compare @startDate to

[Returns]:
 Inline Table Valued Function returns:
 workDays = int; number of work days between @startdate and @enddate

[Dependencies]:
 N/A

[Developer Notes]:
 1. Returns NULL when either input parameter is NULL.

 2. This function is what is referred to as an "inline" scalar UDF. Technically it's an
    inline table valued function (iTVF) but performs the same task as a scalar valued user
    defined function (UDF); the difference is that it requires the APPLY table operator
    to accept column values as a parameter. For more about "inline" scalar UDFs see this
    article by SQL MVP Jeff Moden: http://www.sqlservercentral.com/articles/T-SQL/91724/
    and for more about how to use APPLY see this article by SQL MVP Paul White:
    http://www.sqlservercentral.com/articles/APPLY/69953/.

    Note the above syntax example and usage examples below to better understand how to
    use the function. Although the function is slightly more complicated to use than a
    scalar UDF it will yield notably better performance for many reasons. For example,
    unlike scalar UDFs or multi-statement table valued functions, the inline scalar UDF does
    not restrict the query optimizer's ability to generate a parallel query execution plan.

 3. dbo.countWorkDays requires that @enddate be equal to or later than @startDate. Otherwise
    a NULL is returned.

 4. dbo.countWorkDays is NOT deterministic. For more about deterministic functions see:
    https://msdn.microsoft.com/en-us/library/ms178091.aspx

[Examples]:
 --===== 1. Basic Use
 SELECT f.workDays 
 FROM   dbo.countWorkDays('20180608', '20180611') AS f;

---------------------------------------------------------------------------------------
[Revision History]: 
 Rev 00 - 20180625 - Initial Creation - Alan Burstein
*****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT workDays =
    -- If @startDate or @endDate are NULL then return a NULL
  CASE WHEN SIGN(DATEDIFF(dd, @startDate, @endDate)) > -1 THEN
                (DATEDIFF(dd, @startDate, @endDate) + 1) --total days including weekends
               -(DATEDIFF(wk, @startDate, @endDate) * 2) --Subtract 2 days for each full weekend
    -- Subtract 1 when startDate is Sunday and subtract 1 when endDate is Saturday: 
    -(CASE WHEN DATENAME(dw, @startDate) = 'Sunday'   THEN 1 ELSE 0 END)
    -(CASE WHEN DATENAME(dw, @endDate)   = 'Saturday' THEN 1 ELSE 0 END)
  END;
GO    

-- SCALAR VERSION
----------------------------------------------------------------------------------------------
IF OBJECT_ID('dbo.countWorkDays_scalar') IS NOT NULL DROP FUNCTION dbo.countWorkDays_scalar;
GO
CREATE FUNCTION dbo.countWorkDays_scalar (@startDate DATETIME, @endDate DATETIME) 
RETURNS INT WITH SCHEMABINDING AS
BEGIN
  RETURN
  (
    SELECT workDays =
        -- If @startDate or @endDate are NULL then return a NULL
      CASE WHEN SIGN(DATEDIFF(dd, @startDate, @endDate)) > -1 THEN
                    (DATEDIFF(dd, @startDate, @endDate) + 1) --total days including weekends
                   -(DATEDIFF(wk, @startDate, @endDate) * 2) --Subtract 2 days for each full weekend
        -- Subtract 1 when startDate is Sunday and subtract 1 when endDate is Saturday: 
        -(CASE WHEN DATENAME(dw, @startDate) = 'Sunday'   THEN 1 ELSE 0 END)
        -(CASE WHEN DATENAME(dw, @endDate)   = 'Saturday' THEN 1 ELSE 0 END)
      END
  );
END
GO

Update based on the OP's question in the comments:

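First, the inline table-valued function version of each function. Note that I am using my own tables and did not have time to make the names match your environment, but I did my best to include comments in the code. Also note that if workingday = '1' in your function is simply pulling weekdays, then you will find my function above to be a much faster alternative to your dbo.fn_WorkDaysAge function. If workingday = '1' also filters out holidays, then it won't work.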

CREATE FUNCTION dbo.fn_WorkDaysAge_itvf
(
 @first_date  DATETIME,
 @second_date DATETIME
)
RETURNS TABLE AS RETURN
SELECT  WorkDays = COUNT(*)
FROM    dbo.dimdate -- DateDimension
WHERE   DateValue   -- [date]
BETWEEN @first_date AND @second_date
AND     IsWeekend = 0 --workingday = '1'
GO

CREATE FUNCTION dbo.fn_WorkDate15_itvf
(
 @TauStartDate DATETIME
)
RETURNS TABLE AS RETURN
WITH DATES AS 
(
  SELECT 
  ROW_NUMBER() OVER(Order By DateValue Desc) as RowNum, DateValue
  FROM dbo.dimdate -- DateDimension
  WHERE DateValue BETWEEN @TauStartDate AND --GETDATE() testing below 
   CASE WHEN GETDATE() < @TauStartDate + 200 THEN GETDATE() ELSE @TauStartDate + 200 END
  AND IsWeekend = 0 --workingday = '1'
)
--Get the 15th businessday from the current date
SELECT DateValue
FROM  DATES
WHERE RowNum = 16;
GO

Now, to replace the scalar UDFs with the inline table-valued functions, you would do this (note my comments):

WITH agent_split_stats AS ( 
Select
    racf,
    agent_stats.SkillGroupSkillTargetID,
    aht_target.EnterpriseName,
    aht_target.target,
    Sum(agent_stats.CallsHandled) as n_calls_handled,
    CASE WHEN (Sum(agent_stats.TalkInTime) + Sum(agent_stats.IncomingCallsOnHoldTime) + Sum(agent_stats.WorkReadyTime)) = 0 THEN 1 ELSE
        (Sum(agent_stats.TalkInTime) + Sum(agent_stats.IncomingCallsOnHoldTime) + Sum(agent_stats.WorkReadyTime)) END
    AS total_handle_time
from tblAceyusAgntSklGrp as agent_stats
INNER JOIN tblCrosswalkWghtPhnEffTarget as aht_target
  ON aht_target.SgId = agent_stats.SkillGroupSkillTargetID
  AND agent_stats.DateTime BETWEEN aht_target.StartDt and aht_target.EndDt
INNER JOIN tblAgentMetricCrosswalk as xwalk
  ON xwalk.SkillTargetID = agent_stats.SkillTargetID
INNER JOIN tblTauClassList AS T
  ON T.SaRacf = racf
-- INLINE FUNCTIONS HERE:
CROSS APPLY dbo.fn_WorkDaysAge_itvf(TauStart, GETDATE()) AS wd
-- OUTER APPLY: fn_WorkDate15_itvf returns no row when fewer than 16 business days exist
OUTER APPLY dbo.fn_WorkDate15_itvf(TauStart)             AS w15
-- NEW WHERE CLAUSE:
WHERE       agent_stats.DateTime >= 
              CASE WHEN wd.workdays < 15 THEN TauStart ELSE w15.DateValue END
And Graduated = 'No'
AND CallsHandled <> 0
AND Target is not null
Group By
racf, agent_stats.SkillGroupSkillTargetID, aht_target.EnterpriseName, aht_target.target
),
agent_split_stats_with_weight AS (
SELECT 
    agent_split_stats.*,
    agent_split_stats.n_calls_handled/SUM(agent_split_stats.n_calls_handled) OVER(PARTITION BY agent_split_stats.racf) AS [weight]
FROM agent_split_stats
),
agent_split_effectiveness AS 
(
  SELECT 
      agent_split_stats_with_weight.*,
      (((agent_split_stats_with_weight.target * agent_split_stats_with_weight.n_calls_handled) / 
         agent_split_stats_with_weight.total_handle_time)*100)*
         agent_split_stats_with_weight.weight AS effectiveness_sum
  FROM agent_split_stats_with_weight
),
agent_effectiveness AS
(
  SELECT 
      racf AS SaRacf,
      ROUND(SUM(effectiveness_sum),2) AS WpeScore
  FROM agent_split_effectiveness
  GROUP BY racf
),
tau AS
(
  SELECT L.SaRacf, TauStart, Goal as WpeGoal 
  ,CASE WHEN agent_effectiveness.WpeScore IS NULL THEN 1 ELSE WpeScore END as WpeScore
  FROM tblTauClassList AS L
  LEFT JOIN agent_effectiveness
    ON agent_effectiveness.SaRacf = L.SaRacf
  LEFT JOIN tblCrosswalkTauGoal AS G
    ON  G.Year   = TauYear
    AND G.Bucket = 'Wpe'
  WHERE TermDate IS NULL
  AND   Graduated = 'No'
)
SELECT tau.*,
-- NEW CASE STATEMENT HERE: 
CASE WHEN wd.workdays > 14 AND WpeScore >= WpeGoal THEN 'Pass' ELSE 'Fail' END 
from tau
-- INLINE FUNCTION HERE (only the work-days count is needed for the Pass/Fail flag):
CROSS APPLY dbo.fn_WorkDaysAge_itvf(TauStart, GETDATE()) AS wd;

Note that I can't test this right now, but it should be correct (or close).
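One thing to watch when swapping a scalar UDF for an iTVF: a scalar UDF returns NULL when it finds nothing, but an iTVF such as dbo.fn_WorkDate15_itvf returns no row at all (for example when fewer than 16 business days are in range), and CROSS APPLY then drops the outer row. OUTER APPLY keeps the row and yields NULL instead. A small self-contained sketch with made-up data, where the hasRow flag simply simulates whether the function finds a row:

```sql
-- Hypothetical two-row roster; hasRow simulates whether the applied query returns a row.
DECLARE @roster TABLE (SaRacf VARCHAR(10), hasRow BIT);
INSERT INTO @roster (SaRacf, hasRow) VALUES ('A1', 1), ('B2', 0);

-- CROSS APPLY: B2 is dropped because the applied query returns no rows for it.
SELECT r.SaRacf, f.DateValue
FROM   @roster AS r
CROSS APPLY (SELECT DateValue = CAST('20190701' AS DATE) WHERE r.hasRow = 1) AS f;

-- OUTER APPLY: B2 is kept, with DateValue = NULL.
SELECT r.SaRacf, f.DateValue
FROM   @roster AS r
OUTER APPLY (SELECT DateValue = CAST('20190701' AS DATE) WHERE r.hasRow = 1) AS f;
```

If rows go missing after the conversion, this difference is the first thing I would check.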

Answer 1 (score: 1)

Update

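I accepted Alan's answer and ended up doing the following. I am posting examples in the hope that seeing the format helps someone; it slowed me down a bit... or maybe I'm just slow, heh.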

1. Changed my scalar UDFs to inline TVFs

Scalar function 1:

    ALTER FUNCTION [dbo].[fn_WorkDaysAge]
(
    -- Add the parameters for the function here
    @first_date DATETIME,
    @second_date DATETIME
)
RETURNS int
AS
BEGIN
    -- Declare the return variable here
    DECLARE @WorkDays int

    -- Add the T-SQL statements to compute the return value here
SELECT @WorkDays = COUNT(*)
FROM DateDimension
WHERE Date BETWEEN @first_date AND @second_date
AND workingday = '1' 

    -- Return the result of the function
    RETURN @WorkDays

END

iTVF function 1:

    ALTER FUNCTION [dbo].[fn_iTVF_WorkDaysAge] 
(   
    -- Add the parameters for the function here
 @FirstDate as Date, 
 @SecondDate as Date
)
RETURNS TABLE  AS RETURN 

SELECT WorkDays = COUNT(*)
FROM DateDimension
WHERE Date BETWEEN @FirstDate AND @SecondDate
AND workingday = '1' 

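I then updated my next function in the same way. I added the CROSS APPLY as shown below (something I had not personally used before; I am still a newbie) and replaced the UDF calls in my CASE statement with the column names.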

Old code

INNER JOIN tblTauClassList AS T
  ON T.SaRacf = racf
WHERE
--FILTERS BY A ROLLING 15 BUSINESS DAYS UNLESS THE DAYS BETWEEN CURRENT DATE AND TAU START DATE ARE <15
agent_stats.DateTime >=
    CASE WHEN dbo.fn_WorkDaysAge(TauStart, GETDATE()) <15 THEN TauStart ELSE
        dbo.fn_WorkDate15(TauStart) 
    END

New code

INNER JOIN tblTauClassList AS T
  ON T.SaRacf = racf
--iTVFs
CROSS APPLY dbo.fn_iTVF_WorkDaysAge(TauStart, GETDATE()) as age
CROSS APPLY dbo.fn_iTVF_WorkDate_15(TauStart) as roll
WHERE
--FILTERS BY A ROLLING 15 BUSINESS DAYS UNLESS THE DAYS BETWEEN CURRENT DATE AND TAU START DATE ARE <15
agent_stats.DateTime >=
    CASE WHEN age.WorkDays <15 THEN TauStart ELSE
        roll.Date 
    END

The new code runs in 3-4 seconds. I will go back and index the appropriate tables per your suggestion, and will likely gain more efficiency there.

Thank you so much!