Question

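I am not much of a T-SQL expert. We have a query that returns nearly 2 million records and takes 2.5 minutes to run. We have added indexes, which cut off a minute (it used to take 3.5 minutes), but 2.5 minutes is still too slow. Can anyone tell me how to modify this query to improve its performance? I suspect the changes belong in the "in" clause and in the way the "LOCATION" value is calculated, but I am not sure how to go about it.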

select distinct
A.ptid,
A.ptmgrx,
A.ptmgry,
A.fename,
B.fename XFENAME,
A.rt1 route,
A.pr,
A.mp,
A.rdbranch,
A.gs grade,
A.cs,
A.csmp,
A.cspath,
(select distinct twpname from tamc.dbo.fipsmdot where fipscode = A.fmcdl) + '; ' +
(select distinct county from tamc.dbo.fipsmdot where fipsco = A.countyl) + ' County, ' +
(select distinct twpname from tamc.dbo.fipsmdot where fipscode = A.fmcdr) + '; ' +
(select distinct county from tamc.dbo.fipsmdot where fipsco = A.countyr) + ' County' LOCATION
from intersectionApproaches A INNER JOIN intersectionApproaches B ON A.ptid = B.ptid
where A.ptid in 
(select distinct C.ptid from intersectionApproaches C, intersectionApproaches B
where C.ptid = B.ptid)

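Edit: The database server is MS SQL Server 2008. I have the execution plan, but there does not seem to be an easy way to share it. It saves itself as a .sqlplan file, and I am not sure that is useful to anyone else. I can provide some information from it, though.

29% of the time is spent doing a "Distinct Sort".

15% of the time is spent doing an "Index Spool" (it does this three separate times).

6% of the time is spent doing a "Table Spool" (only once).

8% is spent doing a "Sort".

In response to JNK: the original query lives in a stored procedure and takes some parameters that limit the result set. Here is the original query with those parameters included: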

select distinct
A.ptid,
A.ptmgrx,
A.ptmgry,
A.fename,
B.fename XFENAME,
A.rt1 route,
A.pr,
A.mp,
A.rdbranch,
A.gs grade,
A.cs,
A.csmp,
A.cspath,
(select distinct twpname from tamc.dbo.fipsmdot where fipscode = A.fmcdl) + '; ' +
(select distinct county from tamc.dbo.fipsmdot where fipsco = A.countyl) + ' County, ' +
(select distinct twpname from tamc.dbo.fipsmdot where fipscode = A.fmcdr) + '; ' +
(select distinct county from tamc.dbo.fipsmdot where fipsco = A.countyr) + ' County' LOCATION
from intersectionApproaches A INNER JOIN intersectionApproaches B ON A.ptid = B.ptid
where A.ptid in 
(select distinct C.ptid from intersectionApproaches C, intersectionApproaches B
where C.ptid = B.ptid
and (C.fename + isnull(' ' + C.fetype,'') like @str + '%' or @str in (C.rt1name, C.rt2name, C.rt3name)) and (B.fename + isnull(' ' + B.fetype,'') like @xstr + '%' or @xstr in (B.rt1name, B.rt2name, B.rt3name)))

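Hopefully that helps explain what is going on. I removed the last part of the where clause because I am trying to get the result set that is NOT filtered by those parameters. I would like to move it into a view.

More edits: The table "IntersectionApproaches" is a table of approaches to road intersections. For example, a standard road intersection consists of 4 approaches (one for each side from which you can approach the intersection). The original purpose of the stored procedure was to return a list of intersectionApproaches matching a specific pair of street names.

For example, you have "Main" and "1st", and they have an intersection.

That one intersection has four intersection approaches.

"Main and 1st"

"1st and Main"

"1st and 1st"

"Main and Main"

They all share the same "PTID", which is the intersection ID.

However, the IntersectionApproaches table has only one field for the street name, "FENAME". To make the record complete, we need the name of the corresponding cross street (hence the "B.Fename XFENAME" in the query). This allows us to query the results and say "give me all records with 'Main' as the street and '1st' as the cross street." In addition, we need to populate the "LOCATION" value, which is defined by the multiple "select distinct" statements in the query, because we can filter on that as well.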
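To make the self-join concrete, here is a minimal sketch using made-up sample data. The table variable @approaches and its two rows are assumptions for illustration only; the real table has many more columns and rows.

```sql
-- Hypothetical sample data: one intersection (ptid = 1) with two street names.
DECLARE @approaches TABLE (ptid int, fename varchar(50));

INSERT INTO @approaches (ptid, fename) VALUES (1, 'Main');
INSERT INTO @approaches (ptid, fename) VALUES (1, '1st');

-- Joining the table to itself on ptid pairs every street name with every
-- street name recorded for the same intersection, including itself.
SELECT DISTINCT
  A.ptid,
  A.fename,
  B.fename AS XFENAME
FROM @approaches A
INNER JOIN @approaches B ON A.ptid = B.ptid;
```

With this sample data the query returns the four street/cross-street combinations listed above, all with the same ptid.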

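I am trying to set this up as a view, rather than having to supply the cross streets first to get a manageable record set and THEN filter on the "LOCATION" field. The current stored procedure also does not help me with paging or sorting. I just want to get it all into a view so that I can work with it more naturally.

Hope this helps...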

Answer 1

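I count six DISTINCTs in your code. That is too many, way too many.

Without even seeing your data, the last DISTINCT, the one inside the IN clause, is totally unneeded. IN does not care whether there are duplicates in the subquery, because it short-circuits.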
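A minimal sketch of that point, using a hypothetical table variable with a duplicated ptid (the data is invented for illustration):

```sql
-- Hypothetical sample data: ptid 1 appears twice.
DECLARE @approaches TABLE (ptid int, fename varchar(50));

INSERT INTO @approaches (ptid, fename) VALUES (1, 'Main');
INSERT INTO @approaches (ptid, fename) VALUES (1, '1st');
INSERT INTO @approaches (ptid, fename) VALUES (2, 'Oak');

-- With DISTINCT inside the IN subquery.
SELECT A.ptid, A.fename
FROM @approaches A
WHERE A.ptid IN (SELECT DISTINCT C.ptid FROM @approaches C);

-- Without DISTINCT: IN only tests membership, so duplicate values in the
-- subquery do not multiply the outer rows.
SELECT A.ptid, A.fename
FROM @approaches A
WHERE A.ptid IN (SELECT C.ptid FROM @approaches C);
```

Both statements return the same three rows, so the DISTINCT only adds work.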

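Your subqueries also look like they could all be replaced by JOINs.

Right now you are querying fipsmdot five times for every row from the main table.

You also have what looks like a totally unnecessary INNER JOIN on the main table (the self-join).

To get more detail, you would need to post some sample data, the table structures, and what you want to get out of the query.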

Answer 2

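Three or four things jump out:

~~Odd use of IN. This can be simplified because A.ptid = B.ptid = C.ptid~~
Per-row subqueries: change them to JOINs
Why join intersectionApproaches to itself on the key? This forces the unnecessary outer DISTINCT
(Thanks to Bill) Pointless WHERE

Try something like this to address some of the issues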

select distinct
  A.ptid,
  A.ptmgrx,
  A.ptmgry,
  A.fename,
  B.fename XFENAME,
  A.rt1 route,
  A.pr,
  A.mp,
  A.rdbranch,
  A.gs grade,
  A.cs,
  A.csmp,
  A.cspath,
  a1.twpname + '; ' +
  a2.county + ' County, ' +
  a3.twpname + '; ' +
  a4.county + ' County' LOCATION
from
  intersectionApproaches A
  INNER JOIN
  intersectionApproaches B ON A.ptid = B.ptid
  -- each derived table must expose the column it is joined on
  JOIN
  (select distinct twpname, fipscode from tamc.dbo.fipsmdot) a1 ON a1.fipscode = A.fmcdl
  JOIN
  (select distinct county, fipsco from tamc.dbo.fipsmdot) a2 ON a2.fipsco = A.countyl
  JOIN
  (select distinct twpname, fipscode from tamc.dbo.fipsmdot) a3 ON a3.fipscode = A.fmcdr
  JOIN
  (select distinct county, fipsco from tamc.dbo.fipsmdot) a4 ON a4.fipsco = A.countyr
/*
 not needed as Bill said. Always true
where
  EXISTS (SELECT *
      FROM intersectionApproaches C
      WHERE C.ptid = A.ptid)
*/
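One thing worth checking when moving from per-row subqueries to joins: a correlated scalar subquery yields NULL when nothing matches and keeps the outer row, while an inner JOIN drops that row. The sketch below uses hypothetical table variables (@approaches, @fips) and invented values to show the difference. Whether unmatched codes actually occur in the real data is an assumption I cannot verify from here.

```sql
-- Hypothetical sample data: ptid 2 has a code with no match in the lookup.
DECLARE @approaches TABLE (ptid int, fmcdl int);
DECLARE @fips TABLE (fipscode int, twpname varchar(50));

INSERT INTO @approaches (ptid, fmcdl) VALUES (1, 100);
INSERT INTO @approaches (ptid, fmcdl) VALUES (2, 999);
INSERT INTO @fips (fipscode, twpname) VALUES (100, 'Sample Township');

-- Inner join: only ptid 1 is returned, because ptid 2 has no matching row.
SELECT A.ptid, f.twpname
FROM @approaches A
JOIN @fips f ON f.fipscode = A.fmcdl;

-- Left join: both rows are returned and twpname is NULL for ptid 2,
-- which matches what the correlated scalar subquery would produce.
SELECT A.ptid, f.twpname
FROM @approaches A
LEFT JOIN @fips f ON f.fipscode = A.fmcdl;
```

If every code is guaranteed to have a match, the plain JOIN is fine. If not, LEFT JOIN keeps the row count the same as the original query.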

Answer 3

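I am posting this here as an answer so that others who come looking in the future will see it if they need it.

I was able to use the information from @gbn to create a query that gives me the information I need without the nested selects.

Here is the new query: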

select distinct
  A.ptid,
  A.ptmgrx,
  A.ptmgry,
  A.fename,
  B.fename XFENAME,
  A.rt1 route,
  A.pr,
  A.mp,
  A.rdbranch,
  A.gs grade,
  A.cs,
  A.csmp,
  A.cspath,
  c1.twpname + '; ' +
  c2.county + ' County, ' +
  c3.twpname + '; ' +
  c4.county + ' County' LOCATION,
  A.rt1name,
  A.rt2name,
  A.rt3name,
  A.fetype,
  B.rt1name AS Xrt1name,
  B.rt2name AS Xrt2name,
  B.rt3name AS Xrt3name,
  B.fetype AS Xfetype
from
  dbo.intersectionApproaches A
  INNER JOIN
  intersectionApproaches B ON A.ptid = B.ptid
  JOIN
  (select distinct twpname, fipscode from tamc.dbo.fipsmdot) c1 ON c1.fipscode = A.fmcdl
  JOIN
  (select distinct county, fipsco from tamc.dbo.fipsmdot) c2 ON c2.fipsco = A.countyl
  JOIN
  (select distinct twpname, fipscode from tamc.dbo.fipsmdot) c3 ON c3.fipscode = A.fmcdr
  JOIN
  (select distinct county, fipsco from tamc.dbo.fipsmdot) c4 ON c4.fipsco = A.countyr

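It still takes a while to run, but I guess this is the best I can do. As long as I filter the result set on something like ptid or fename, performance is acceptable (2-3 seconds). Running the whole thing without any filter still takes over 2.5 minutes, but I do not think we will be using it that way very often (if at all).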
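For anyone wanting the filtered, paged usage, here is a rough sketch. It assumes the query above has been wrapped in a view; the view name dbo.vIntersectionApproaches, the filter values, and the page size are all hypothetical. SQL Server 2008 has no OFFSET/FETCH, so the paging uses ROW_NUMBER().

```sql
-- Assumes a hypothetical view, dbo.vIntersectionApproaches, created as
-- CREATE VIEW dbo.vIntersectionApproaches AS <the query above>.
DECLARE @page int, @pageSize int;
SET @page = 1;
SET @pageSize = 25;

WITH numbered AS (
    SELECT v.*,
           -- Number the filtered rows in a stable order for paging.
           ROW_NUMBER() OVER (ORDER BY v.ptid, v.fename, v.XFENAME) AS rn
    FROM dbo.vIntersectionApproaches v
    WHERE v.fename = 'Main'     -- street filter (sample value)
      AND v.XFENAME = '1st'     -- cross-street filter (sample value)
)
SELECT *
FROM numbered
WHERE rn BETWEEN (@page - 1) * @pageSize + 1 AND @page * @pageSize
ORDER BY rn;
```

Filtering inside the CTE keeps the numbered set small, which is the case where the view performed acceptably for me.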

Thanks to everyone for the input and time. I think all of the answers so far helped me come up with this improvement.

Looking for some help improving the performance of this SQL query
