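Adding a simple AND after a JOIN causes a performance hit

Time: 2013-12-30 04:48:27

Tags: sql sql-server geospatial spatial-query

I have a table with about 500 points, and I am looking for duplicates within a tolerance. The query below takes less than a second and gives me 500 rows. Most of them have a distance of zero, because each point is matched with itself (PointA = PointB).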

DECLARE @TOL AS REAL
SET @TOL = 0.05

SELECT 
    PointA.ObjectId as ObjectIDa,
    PointA.Name as PTNameA,
    PointA.[Description] as PTdescA,
    PointB.ObjectId as ObjectIDb,
    PointB.Name as PTNameB,
    PointB.[Description] as PTdescB,
    ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST
FROM CadData.Survey.SurveyPoint PointA
  JOIN [CadData].Survey.SurveyPoint PointB
    ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
   -- AND
   -- PointA.ObjectId <> PointB.ObjectID
ORDER BY ObjectIDa

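If I enable the commented-out lines near the bottom, I get 14 rows, but the execution time goes up to 14 seconds. That is not a big deal until my points table grows to many thousands of rows.

I apologize in advance if the answer is already out there. I did look, but being new, I get lost reading posts that are way over my head.

ADDENDUM: ObjectID is a bigint and the table's PK, so I realized I could change the statement to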

AND PointA.ObjectID > PointB.ObjectID

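This takes half the time and gives me half the results (7 rows in 7 seconds). I no longer get mirrored duplicates (Point 4 is close to Point 8, and then Point 8 is close to Point 4). However, performance still concerns me: the table will be very large, so any performance issue will become a real problem.

ADDENDUM 2: Changing the order of the JOIN and the AND (or the WHERE, as suggested) as shown below makes no difference either.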

DECLARE @TOL AS REAL
SET @TOL = 0.05

SELECT 
    PointA.ObjectId as ObjectIDa,
    PointA.Name as PTNameA,
    PointA.[Description] as PTdescA,
    PointB.ObjectId as ObjectIDb,
    PointB.Name as PTNameB,
    PointB.[Description] as PTdescB,
    ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST
FROM CadData.Survey.SurveyPoint PointA
  JOIN [CadData].Survey.SurveyPoint PointB
    ON PointA.ObjectId < PointB.ObjectID
    WHERE
    PointA.Geometry.STDistance(PointB.Geometry) < @TOL
ORDER BY ObjectIDa

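I find it interesting that I can change the @TOL value to something large, returning over 100 rows, with no change in performance, even though that requires a lot of computation. But then adding a simple A.

4 Answers:

Answer 0: (score: 2)

The execution plan is probably doing something different behind the scenes when you add the ObjectID comparison. Check the execution plans to see whether the two versions of the query differ, for example by using an index seek versus a table scan. If they do, consider experimenting with query hints.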
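One way to make that comparison concrete is to run both variants in the same session with timing and I/O statistics switched on. This is only a sketch, assuming SQL Server Management Studio (or any client that shows the Messages output) and the table and column names from the question; it trims the select list to the columns needed, which could itself influence the plan, so the full column list may be worth keeping when measuring for real.

```sql
-- Sketch: compare CPU/elapsed time and logical reads for the two variants.
-- The statistics appear in the Messages output, not in the result grid.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

DECLARE @TOL AS REAL
SET @TOL = 0.05

-- Variant 1: distance predicate only (fast in the question)
SELECT PointA.ObjectId AS ObjectIDa,
       PointB.ObjectId AS ObjectIDb,
       ROUND(PointA.Geometry.STDistance(PointB.Geometry), 3) AS DIST
FROM CadData.Survey.SurveyPoint PointA
  JOIN CadData.Survey.SurveyPoint PointB
    ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL;

-- Variant 2: same join plus the ObjectId comparison (slow in the question)
SELECT PointA.ObjectId AS ObjectIDa,
       PointB.ObjectId AS ObjectIDb,
       ROUND(PointA.Geometry.STDistance(PointB.Geometry), 3) AS DIST
FROM CadData.Survey.SurveyPoint PointA
  JOIN CadData.Survey.SurveyPoint PointB
    ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
   AND PointA.ObjectId <> PointB.ObjectId;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Enabling the actual execution plan for the same batch shows where the two plans diverge, for instance in which predicate drives the join.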

As a workaround, you can always use a subquery:

DECLARE @TOL AS REAL
SET @TOL = 0.05

SELECT 
    ObjectIDa,
    PTNameA,
    PTdescA,
    ObjectIDb,
    PTNameB,
    PTdescB,
    DIST
FROM
(
SELECT 
    PointA.ObjectId as ObjectIDa,
    PointA.Name as PTNameA,
    PointA.[Description] as PTdescA,
    PointB.ObjectId as ObjectIDb,
    PointB.Name as PTNameB,
    PointB.[Description] as PTdescB,
    ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST
FROM CadData.Survey.SurveyPoint PointA
  JOIN [CadData].Survey.SurveyPoint PointB
    ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
   -- AND
   -- PointA.ObjectId <> PointB.ObjectID
) Subquery
WHERE ObjectIDa <> ObjectIDb
ORDER BY ObjectIDa

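Answer 1: (score: 2)

This is an interesting question.

It is not unrealistic to get a large performance gain by changing "<>" to ">".

As others have mentioned, the trick is to get the most out of your indexes. By using ">", you should easily get the server to restrict itself to a specific range on your PK, which avoids looking "backward" when you have already checked "forward".

This improvement will scale: it keeps helping as you add rows. But you are right to worry that it does not stop the amount of work from growing. As you correctly suspect, the more rows you have to scan, the longer it takes. That is the case here, because we always want to compare everything with everything.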
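Since the expensive predicate is the distance test rather than the PK comparison, a spatial index on the Geometry column may be worth trying alongside the PK range trick. The following is a minimal sketch under several assumptions that the question does not confirm: the column is of type geometry (not geography), no spatial index exists yet, the PK on ObjectID is clustered, and the index name and bounding box values are placeholders that would have to be replaced with the real extents of the survey data.

```sql
-- Sketch only: the index name and BOUNDING_BOX values are hypothetical.
-- For a geometry column the bounding box is required and should
-- enclose the coordinate range of the stored points.
CREATE SPATIAL INDEX SIdx_SurveyPoint_Geometry
  ON CadData.Survey.SurveyPoint (Geometry)
  USING GEOMETRY_GRID
  WITH (BOUNDING_BOX = (xmin = 0, ymin = 0, xmax = 10000, ymax = 10000));
```

Whether the optimizer actually picks the index for the STDistance(...) < @TOL join is something to verify in the execution plan; it is not guaranteed.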

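If the first part looks good with only the TOL check, have you considered splitting off the second part entirely?

Change the first part to dump its results into a temp table: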

SELECT 
    PointA.ObjectId as ObjectIDa,
    PointA.Name as PTNameA,
    PointA.[Description] as PTdescA,
    PointB.ObjectId as ObjectIDb,
    PointB.Name as PTNameB,
    PointB.[Description] as PTdescB,
    ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST

into #AllDuplicatesWithRepeats

FROM CadData.Survey.SurveyPoint PointA
  JOIN [CadData].Survey.SurveyPoint PointB
    ON 
    PointA.Geometry.STDistance(PointB.Geometry) < @TOL
ORDER BY ObjectIDa

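Then you can write a straightforward query like the one below that skips the repeats. It is nothing special, but against the small set in the temp table it should be very fast.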

Select
    *
from    
    #AllDuplicatesWithRepeats d1
        left join #AllDuplicatesWithRepeats d2 on (
                        d1.objectIDa = d2.objectIDb
                        and
                        d1.objectIDb = d2.objectIDa
                        )
where
    d2.objectIDb is null
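One thing worth checking before relying on this filter: because STDistance is symmetric, the temp table should contain both directions of every pair (4 to 8 and 8 to 4) as well as each point matched with itself, so every row has a mirror row and the anti-join may return nothing at all. The sketch below illustrates this with a table variable holding hypothetical sample IDs in place of #AllDuplicatesWithRepeats, and contrasts it with a plain one-direction filter.

```sql
-- Hypothetical sample data standing in for #AllDuplicatesWithRepeats:
-- two self-matches plus one close pair stored in both directions.
DECLARE @Pairs TABLE (ObjectIDa BIGINT, ObjectIDb BIGINT);

INSERT INTO @Pairs (ObjectIDa, ObjectIDb)
VALUES (4, 4), (8, 8), (4, 8), (8, 4);

-- Anti-join on the mirrored pair: every row finds its mirror
-- (self-matches mirror themselves), so no row survives the filter.
SELECT d1.ObjectIDa, d1.ObjectIDb
FROM @Pairs d1
  LEFT JOIN @Pairs d2
    ON d1.ObjectIDa = d2.ObjectIDb
   AND d1.ObjectIDb = d2.ObjectIDa
WHERE d2.ObjectIDb IS NULL;

-- Keeping only one direction drops self-matches and mirrored repeats,
-- leaving the single row (4, 8).
SELECT d1.ObjectIDa, d1.ObjectIDb
FROM @Pairs d1
WHERE d1.ObjectIDa < d1.ObjectIDb;
```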

Answer 2: (score: 1)

Try putting PointA.ObjectId <> PointB.ObjectID in a WHERE clause between the JOIN and the ORDER BY clause.

Like this:

DECLARE @TOL AS REAL
SET @TOL = 0.05

SELECT 
    PointA.ObjectId as ObjectIDa,
    PointA.Name as PTNameA,
    PointA.[Description] as PTdescA,
    PointB.ObjectId as ObjectIDb,
    PointB.Name as PTNameB,
    PointB.[Description] as PTdescB,
    ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST
FROM CadData.Survey.SurveyPoint PointA
  JOIN [CadData].Survey.SurveyPoint PointB
    ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
WHERE PointA.ObjectId <> PointB.ObjectID
ORDER BY ObjectIDa

Answer 3: (score: 1)

With kudos to @Mike_M, here is the edited select, which runs in 2 seconds.

DECLARE @TOL AS REAL
SET @TOL = 0.05

SELECT 
    PointA.ObjectId as ObjectIDa,
    PointA.Name as PTNameA,
    PointA.[Description] as PTdescA,
    PointB.ObjectId as ObjectIDb,
    PointB.Name as PTNameB,
    PointB.[Description] as PTdescB,
    ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST

into #AllDuplicatesWithRepeats

FROM CadData.Survey.SurveyPoint PointA
  JOIN [CadData].Survey.SurveyPoint PointB
    ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL  
ORDER BY ObjectIDa

Select
    *
from    
    #AllDuplicatesWithRepeats d1
Where
    d1.ObjectIDa < d1.ObjectIDb
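When a script like this is rerun in the same session, SELECT ... INTO fails because the temp table already exists. A small guard placed before the SELECT ... INTO handles that; this is a sketch that assumes the temp table name used above.

```sql
-- Drop the temp table left over from a previous run in this session, if any.
IF OBJECT_ID('tempdb..#AllDuplicatesWithRepeats') IS NOT NULL
    DROP TABLE #AllDuplicatesWithRepeats;
```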