SQL query takes a long time to process

Time: 2016-04-22 23:55:42

Tags: sql sql-server query-optimization

      SELECT
 s.ColID1
,s.ColIdentification2
,s.StatusColumn
,(SELECT
     MAX(pd.DateColumn)
   FROM DocumentTable pd
   WHERE pd.IsPresent = 1
   AND pd.ColIdentification2 = s.ColIdentification2
   AND pd.TypeofFile = 'TextFiles')
 AS maxDate
,(SELECT TOP 1
     u.Title
   FROM DocumentTable pd
   LEFT OUTER JOIN [User] u
     ON u.UserId = pd.UserId
   WHERE pd.IsPresent = 1
   AND pd.ColIdentification2 = s.ColIdentification2
   AND pd.TypeofFile = 'Text Files'
   ORDER BY pd.DateColumn DESC)
 AS Name1
 ,(SELECT TOP 1
     pd.DocumentType
   FROM DocumentTable pd
   WHERE pd.IsPresent = 1
   AND pd.ColIdentification2 = s.ColIdentification2
   AND pd.TypeofFile = 'Text Files'
   ORDER BY pd.DateColumn DESC)
, (SELECT TOP 1
     pd.TypeofFile
   FROM DocumentTable pd
   WHERE pd.IsPresent = 1
   AND pd.ColIdentification2 = s.ColIdentification2
   AND pd.TypeofFile = 'Text Files'
   ORDER BY pd.DateColumn DESC)
 ,(SELECT TOP 1
    pd.Region
    FROM DocumentTable pd
   WHERE pd.IsPresent = 1
   AND pd.ColIdentification2 = s.ColIdentification2
   AND pd.TypeofFile = 'Text Files'
   ORDER BY pd.DateColumn DESC)
 ,(SELECT TOP 1
    pd.Agency 
    FROM DocumentTable pd
   WHERE pd.IsPresent = 1
   AND pd.ColIdentification2 = s.ColIdentification2
   AND pd.TypeofFile = 'Text Files'
   ORDER BY pd.DateColumn DESC)
FROM Service s (NOLOCK)
--left outer join DocumentTable pd1 (NOLOCK)
--on pd1.ColIdentification2 = s.ColIdentification2
WHERE s.IsPresent = 1
--AND pd1.ColIdentification2 = s.ColIdentification2
AND s.StatusColumn IN ('Val1', 'Val3')
AND NOT EXISTS (SELECT
   pd.DocumentTableId
 FROM DocumentTable pd
 WHERE pd.IsPresent = 1
 AND pd.ColIdentification2 = s.ColIdentification2
 AND pd.TypeofFile IN ('DC1', 'DC2'))
AND NOT EXISTS (SELECT
   utds.ID
 FROM  utds
 WHERE utds.Service_x0020_ID1_Id = s.ColID1
 AND utds.Type IN ('DC1', 'DC2'))
ORDER BY s.ColID1

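I am trying to optimize this SQL. Because of the many subqueries it takes a very long time: the query runs for more than 10 minutes, and I am trying to improve it. Is there any way to avoid the subqueries? I tried using a LEFT OUTER JOIN between the tables, but I don't think I get the right data, because the ColID1 values are duplicated in DocumentTable.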

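2 answers:

Answer 0 (score: 0)

It is hard to tune a query without statistics and an execution plan; it comes down to trial and error.

I think you can improve it by converting the subqueries into joins, so try to eliminate the subqueries.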

You can eliminate four of the subqueries with the following query:

SELECT s.ColID1
    , s.ColIdentification2
    , s.StatusColumn
    , pd.DocumentType, pd.TypeofFile, pd.Region, pd.Agency
from [Service] s 
    -- fetch the latest 'Text Files' document row once per Service row
    outer apply (select top 1 DocumentType, TypeofFile, Region, Agency
                from DocumentTable
                where IsPresent = 1 and TypeofFile = 'Text Files' 
                    and ColIdentification2 = s.ColIdentification2
                order by DateColumn desc) pd

If it helps, try the same approach for the remaining subqueries.
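As a rough, untested sketch of how the same approach might extend to the whole query from the question: a single OUTER APPLY could also supply the user title and the latest date. This assumes the 'TextFiles' literal in the question's maxDate subquery was meant to be 'Text Files'; the aliases d and x are made up for illustration. The question's second NOT EXISTS (on utds) is left out because its table name is not shown.

```sql
-- Sketch only: one OUTER APPLY fetches the latest 'Text Files' row per Service row.
SELECT
    s.ColID1,
    s.ColIdentification2,
    s.StatusColumn,
    pd.DateColumn AS maxDate,   -- assumes 'TextFiles' in the question was a typo
    pd.Title      AS Name1,
    pd.DocumentType,
    pd.TypeofFile,
    pd.Region,
    pd.Agency
FROM [Service] s
OUTER APPLY (
    SELECT TOP 1
        d.DateColumn, u.Title, d.DocumentType, d.TypeofFile, d.Region, d.Agency
    FROM DocumentTable d
    LEFT OUTER JOIN [User] u
        ON u.UserId = d.UserId
    WHERE d.IsPresent = 1
      AND d.TypeofFile = 'Text Files'
      AND d.ColIdentification2 = s.ColIdentification2
    ORDER BY d.DateColumn DESC
) pd
WHERE s.IsPresent = 1
  AND s.StatusColumn IN ('Val1', 'Val3')
  AND NOT EXISTS (SELECT 1
                  FROM DocumentTable x
                  WHERE x.IsPresent = 1
                    AND x.ColIdentification2 = s.ColIdentification2
                    AND x.TypeofFile IN ('DC1', 'DC2'))
  -- the question's NOT EXISTS on utds would be added here unchanged
ORDER BY s.ColID1;
```

One difference from the original: when several documents share the latest date, the separate TOP 1 subqueries could each pick a different row, whereas the single OUTER APPLY takes every column from the same row.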

Also make sure the ColIdentification2 column is indexed in both tables.
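For illustration, such indexes might look like the sketch below. The index names are hypothetical, and the key and INCLUDE columns are assumptions drawn from the predicates in the question; they only work if the column types are allowed as index keys, and the execution plan should be checked before keeping them.

```sql
-- Hypothetical index names; column choices are guesses based on the query's predicates.
-- Seek on the join key and file type, with rows already ordered by date.
CREATE NONCLUSTERED INDEX IX_DocumentTable_ColId2_Type_Date
    ON DocumentTable (ColIdentification2, TypeofFile, DateColumn DESC)
    INCLUDE (IsPresent, UserId, DocumentType, Region, Agency);

-- Support lookups on the Service side of the join.
CREATE NONCLUSTERED INDEX IX_Service_ColId2
    ON [Service] (ColIdentification2)
    INCLUDE (ColID1, StatusColumn, IsPresent);
```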

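Answer 1 (score: 0)

Flicker makes a very good point about making sure your common columns (such as ColIdentification2) are indexed. I would also verify that you have an index on DocumentTable.DateColumn.

Anyway...

Your query is a bit busy, so let's reformat it a little and take a "big picture" look at it: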

SELECT
 s.ColID1
,s.ColIdentification2
,s.StatusColumn
,(SELECT TOP 1 u.Title         FROM DocumentTable pd LEFT OUTER JOIN [User] u ON u.UserId = pd.UserId WHERE pd.IsPresent = 1 AND pd.ColIdentification2 = s.ColIdentification2 AND pd.TypeofFile = 'Text Files' ORDER BY pd.DateColumn DESC) AS Name1
,(SELECT MAX(pd.DateColumn)    FROM DocumentTable pd WHERE pd.IsPresent = 1 AND pd.ColIdentification2 = s.ColIdentification2 AND pd.TypeofFile = 'TextFiles') AS maxDate
,(SELECT TOP 1 pd.DocumentType FROM DocumentTable pd WHERE pd.IsPresent = 1 AND pd.ColIdentification2 = s.ColIdentification2 AND pd.TypeofFile = 'Text Files' ORDER BY pd.DateColumn DESC)
,(SELECT TOP 1 pd.TypeofFile   FROM DocumentTable pd WHERE pd.IsPresent = 1 AND pd.ColIdentification2 = s.ColIdentification2 AND pd.TypeofFile = 'Text Files' ORDER BY pd.DateColumn DESC)
,(SELECT TOP 1 pd.Region       FROM DocumentTable pd WHERE pd.IsPresent = 1 AND pd.ColIdentification2 = s.ColIdentification2 AND pd.TypeofFile = 'Text Files' ORDER BY pd.DateColumn DESC)
,(SELECT TOP 1 pd.Agency       FROM DocumentTable pd WHERE pd.IsPresent = 1 AND pd.ColIdentification2 = s.ColIdentification2 AND pd.TypeofFile = 'Text Files' ORDER BY pd.DateColumn DESC)
FROM Service s (NOLOCK)
WHERE s.IsPresent = 1
  AND s.StatusColumn IN ('Val1', 'Val3')
AND NOT EXISTS (SELECT utds.ID FROM  utds WHERE utds.Service_x0020_ID1_Id = s.ColID1 AND utds.Type IN ('DC1', 'DC2'))
ORDER BY s.ColID1

So it looks like the following columns all ultimately come from the SAME row in DocumentTable pd:

pd.DateColumn
pd.DocumentType 
pd.TypeofFile   
pd.Region       
pd.Agency       

Note: for pd.DateColumn, your use of MAX(pd.DateColumn) gives the same result as
      the sub-select style you are using for the other pd.* columns:
      SELECT TOP 1 pd.DateColumn FROM ...BLAH BLAH BLAH... ORDER BY pd.DateColumn DESC
Also, your pd.DateColumn sub-select has a WHERE clause checking for 'TextFiles'
instead of the 'Text Files' that the other pd.* columns use. Should they all
be 'Text Files'? (Note the embedded space in 'Text Files' vs. 'TextFiles'.)

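Rather than running the same subquery logic five times for pd, let's push it into a left join and try to do it just once...

This is completely untested code, by the way; I hope it works :-)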

SELECT
  s.ColID1
, s.ColIdentification2
, s.StatusColumn
/* If we get a stable row for PD pulling u.Title from User becomes easier... */
, (select u.Title from [User] u where u.UserId = pd.UserId) as userTitle
, pd.DateColumn
, pd.DocumentType
, pd.TypeofFile
, pd.Region
, pd.Agency
FROM Service s (NOLOCK)
left join DocumentTable pd
       on  pd.IsPresent = 1 
       and pd.ColIdentification2 = s.ColIdentification2
       and pd.TypeofFile = 'Text Files'
       /* This next condition avoids having to do the ORDER BY pd.DateColumn DESC.
        * The idea is for SQL Server to consider all potential matching pd records
        * but ignore any that don't have the latest date.
        */
       and not exists( select 1 from DocumentTable pd2
                       where pd2.IsPresent          = pd.IsPresent
                         and pd2.ColIdentification2 = pd.ColIdentification2
                         and pd2.TypeofFile         = pd.TypeofFile
                         and pd2.DateColumn         > pd.DateColumn)
       /* may as well add the "no DC1 & DC2" clause here...
        * No date comparison: the exclusion applies regardless of date.
        * Note: inside the ON clause this only nulls the pd columns;
        * the Service row itself is still returned. */
       and not exists (select 1 FROM DocumentTable pd3
                       where pd3.IsPresent          = pd.IsPresent
                         and pd3.ColIdentification2 = pd.ColIdentification2
                         and pd3.TypeofFile         in ( 'DC1', 'DC2'))
WHERE s.IsPresent = 1
  AND s.StatusColumn IN ('Val1', 'Val3')
  AND NOT EXISTS (
     SELECT 1 FROM  utds
     WHERE utds.Service_x0020_ID1_Id = s.ColID1
       AND utds.Type                 IN ('DC1', 'DC2') )
ORDER BY s.ColID1

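Some closing thoughts:

I like to indent complex WHERE clauses; it makes it easier for me to wrap my head around the logic.

To think about how the query behaves, consider what the main table 's' is doing: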

select * FROM Service s

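For each row we get from 's', we want to find (at most) one suitable 'pd' record.

Here "suitable" means that the common columns match, e.g. pd.ColIdentification2 = s.ColIdentification2, and so on.

The subtle part is: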

AND NOT EXISTS (SELECT 1 FROM DocumentTable PD2 ....WHERE PD2.DATECOLUMN > PD.DATECOLUMN).

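One speed advantage here is that we don't really care about the ORDER BY; we just want to make sure we have the most recent row in pd (we use NOT EXISTS with pd2 to kick any older pd records out of the running).

The reason I think this is faster than the ORDER BY is that the SQL Server engine does not need to do an index walk to process the "ORDER BY DATECOLUMN DESC" for the TOP 1. A smart optimizer might figure that out and just jump to the largest index entry for DATECOLUMN, but that is a big maybe, so I expect this approach to be faster overall.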
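To make the NOT EXISTS idea concrete, here is a small self-contained sketch on a table variable. The @Doc table and its sample rows are made up for illustration; only the column names are borrowed from the question. It also shows a caveat: if two rows share the latest date, both survive the anti-join, so a tie-breaker is needed to guarantee at most one row per key.

```sql
-- Made-up sample data; @Doc is a hypothetical stand-in for DocumentTable.
DECLARE @Doc TABLE (
    DocumentTableId    int         NOT NULL PRIMARY KEY,
    ColIdentification2 int         NOT NULL,
    DateColumn         date        NOT NULL,
    Region             varchar(10) NOT NULL
);

INSERT INTO @Doc (DocumentTableId, ColIdentification2, DateColumn, Region)
VALUES (1, 100, '2016-01-01', 'North'),
       (2, 100, '2016-03-01', 'South'),
       (3, 200, '2016-02-01', 'East'),
       (4, 200, '2016-02-01', 'West');  -- ties with row 3 on DateColumn

-- Anti-join: keep a row only if no later row exists for the same key.
SELECT pd.DocumentTableId, pd.ColIdentification2, pd.DateColumn, pd.Region
FROM @Doc pd
WHERE NOT EXISTS (SELECT 1
                  FROM @Doc pd2
                  WHERE pd2.ColIdentification2 = pd.ColIdentification2
                    AND pd2.DateColumn > pd.DateColumn)
ORDER BY pd.DocumentTableId;
-- Returns rows 2, 3 and 4: both tied rows for key 200 survive.

-- Same idea with DocumentTableId as a tie-breaker: at most one row per key.
SELECT pd.DocumentTableId, pd.ColIdentification2, pd.DateColumn, pd.Region
FROM @Doc pd
WHERE NOT EXISTS (SELECT 1
                  FROM @Doc pd2
                  WHERE pd2.ColIdentification2 = pd.ColIdentification2
                    AND (pd2.DateColumn > pd.DateColumn
                         OR (pd2.DateColumn = pd.DateColumn
                             AND pd2.DocumentTableId > pd.DocumentTableId)))
ORDER BY pd.DocumentTableId;
-- Returns rows 2 and 4.
```

Whether this beats TOP 1 with ORDER BY depends on the indexes and the plan the optimizer picks, so it is worth comparing the actual execution plans for both forms.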

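You'll notice a similar trick that blocks any PD record that has DC1 or DC2.

In the original query, I read that part (at the end, in the main WHERE clause) as: "Even if a given PD record is perfect in every way (an exact match for 's' and the most recent PD record), if any PD/S match exists with 'DC1' or 'DC2' (regardless of date), then we want to omit all PD/S records."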