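I have a table of about 500 points and I'm looking for duplicates within a tolerance. The query below takes less than a second and gives me 500 rows. Most of them have a distance of zero, because each point is being matched with itself (PointA = PointB).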
DECLARE @TOL AS REAL
SET @TOL = 0.05
SELECT
PointA.ObjectId as ObjectIDa,
PointA.Name as PTNameA,
PointA.[Description] as PTdescA,
PointB.ObjectId as ObjectIDb,
PointB.Name as PTNameB,
PointB.[Description] as PTdescB,
ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST
FROM CadData.Survey.SurveyPoint PointA
JOIN [CadData].Survey.SurveyPoint PointB
ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
-- AND
-- PointA.ObjectId <> PointB.ObjectID
ORDER BY ObjectIDa
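If I use the commented-out lines near the bottom, I get 14 rows, but the execution time goes up to 14 seconds. That's not a big deal until my points table grows to tens of thousands of rows.
I apologize in advance if the answer is already out there. I did look, but being new, I get lost reading posts that are way over my head. ADDENDUM: ObjectID is a bigint and the PK of the table, so I realized I can change the statement to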
AND PointA.ObjectID > PointB.ObjectID
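This takes half the time and gives me half the results (7 rows in 7 seconds). I no longer get mirrored pairs (point 4 is close to point 8, and then point 8 is close to point 4). However, performance still concerns me: the table will become very large, so any performance issue now will turn into a real problem later.
ADDENDUM 2: Changing the order of the JOIN and the AND (or the WHERE, as suggested) as shown below makes no difference either.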
DECLARE @TOL AS REAL
SET @TOL = 0.05
SELECT
PointA.ObjectId as ObjectIDa,
PointA.Name as PTNameA,
PointA.[Description] as PTdescA,
PointB.ObjectId as ObjectIDb,
PointB.Name as PTNameB,
PointB.[Description] as PTdescB,
ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST
FROM CadData.Survey.SurveyPoint PointA
JOIN [CadData].Survey.SurveyPoint PointB
ON PointA.ObjectId < PointB.ObjectID
WHERE
PointA.Geometry.STDistance(PointB.Geometry) < @TOL
ORDER BY ObjectIDa
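I find it interesting that I can change the @TOL value to something large, returning over 100 rows, with no change in performance, even though that requires many calculations. But then adding a simple A.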
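Answer 0 (score: 2)
The execution plan is probably doing something different behind the scenes when you add the ObjectID comparison. Check the execution plan to see whether the two versions of the query differ, for example by using an index seek in one and a table scan in the other. If they do, consider experimenting with query hints.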
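One way to make that comparison, as a minimal sketch: turn on SQL Server's timing and I/O statistics, run both versions, and read the figures in the Messages tab. In SSMS, "Include Actual Execution Plan" (Ctrl+M) shows the two plans as well. The COUNT(*) wrapper is my assumption to keep the output short; it is not part of the original query, and the optimizer may pick a slightly different plan for it than for the full select list.

```sql
DECLARE @TOL AS REAL
SET @TOL = 0.05

SET STATISTICS TIME ON
SET STATISTICS IO ON

-- Version 1: tolerance check only (self-matches included)
SELECT COUNT(*)
FROM CadData.Survey.SurveyPoint PointA
JOIN CadData.Survey.SurveyPoint PointB
ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL

-- Version 2: same tolerance check plus the ObjectID comparison
SELECT COUNT(*)
FROM CadData.Survey.SurveyPoint PointA
JOIN CadData.Survey.SurveyPoint PointB
ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
AND PointA.ObjectId <> PointB.ObjectID

SET STATISTICS TIME OFF
SET STATISTICS IO OFF
```

If the CPU time or logical reads differ sharply between the two versions, the plans are where to look next.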
As a workaround, you could always use a subquery:
DECLARE @TOL AS REAL
SET @TOL = 0.05
SELECT
ObjectIDa,
PTNameA,
PTdescA,
ObjectIDb,
PTNameB,
PTdescB,
DIST
FROM
(
SELECT
PointA.ObjectId as ObjectIDa,
PointA.Name as PTNameA,
PointA.[Description] as PTdescA,
PointB.ObjectId as ObjectIDb,
PointB.Name as PTNameB,
PointB.[Description] as PTdescB,
ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST
FROM CadData.Survey.SurveyPoint PointA
JOIN [CadData].Survey.SurveyPoint PointB
ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
-- AND
-- PointA.ObjectId <> PointB.ObjectID
) Subquery
WHERE ObjectIDa <> ObjectIDb
ORDER BY ObjectIDa
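Answer 1 (score: 2)
This is an interesting question.
It's not unrealistic to get a large performance improvement by changing "<>" to ">".
As others have mentioned, the trick is to get the most out of your indexes. With ">", it should be easy for the server to restrict the search to a specific range on your PK, so it avoids looking "backward" once it has already checked "forward".
This improvement will scale: it keeps helping as you add rows. But you are right to worry that it won't stop the work from growing. As you correctly suspect, the more rows you have to scan, the longer it takes, and that is the case here because we always want to compare everything against everything.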
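Since the expensive predicate is the STDistance call rather than the ObjectID comparison, one index that may be worth trying is a spatial index on the Geometry column. This is only a sketch under assumptions the question does not confirm: that the column is of type geometry (not geography), that the table's primary key is clustered, and that the server is SQL Server 2012 or later (needed for GEOMETRY_AUTO_GRID). The index name and the bounding box values are hypothetical; the box should cover the real extent of the survey coordinates.

```sql
-- Hypothetical index name and bounding box: adjust both to the actual data.
CREATE SPATIAL INDEX SIdx_SurveyPoint_Geometry
ON CadData.Survey.SurveyPoint ([Geometry])
USING GEOMETRY_AUTO_GRID
WITH (BOUNDING_BOX = (xmin = 0, ymin = 0, xmax = 100000, ymax = 100000))
```

Whether the optimizer actually uses this index for the STDistance(...) < @TOL join is something to confirm in the execution plan. For testing, a table hint such as WITH (INDEX(SIdx_SurveyPoint_Geometry)) on one side of the join can force it.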
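If the first part, with just the TOL check, performs well, have you considered splitting the second part out entirely?
Change the first part to dump its results into a temp table: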
DECLARE @TOL AS REAL
SET @TOL = 0.05
SELECT
PointA.ObjectId as ObjectIDa,
PointA.Name as PTNameA,
PointA.[Description] as PTdescA,
PointB.ObjectId as ObjectIDb,
PointB.Name as PTNameB,
PointB.[Description] as PTdescB,
ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST
into #AllDuplicatesWithRepeats
FROM CadData.Survey.SurveyPoint PointA
JOIN [CadData].Survey.SurveyPoint PointB
ON
PointA.Geometry.STDistance(PointB.Geometry) < @TOL
ORDER BY ObjectIDa
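Then you can write a straightforward query like the one below that skips the duplicates. It's nothing special, but against that small set in the temp table it should be very fast.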
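-- Every (a, b) row in the temp table has a mirrored (b, a) row, so an
-- anti-join of the table against itself would discard every row.
-- Keeping only rows with objectIDa < objectIDb drops the self-matches
-- and keeps exactly one row per pair.
Select
*
from
#AllDuplicatesWithRepeats d1
where
d1.objectIDa < d1.objectIDb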
Answer 2 (score: 1)
Try putting PointA.ObjectId <> PointB.ObjectID in a WHERE clause between the JOIN and the ORDER BY clauses.
Like this:
DECLARE @TOL AS REAL
SET @TOL = 0.05
SELECT
PointA.ObjectId as ObjectIDa,
PointA.Name as PTNameA,
PointA.[Description] as PTdescA,
PointB.ObjectId as ObjectIDb,
PointB.Name as PTNameB,
PointB.[Description] as PTdescB,
ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST
FROM CadData.Survey.SurveyPoint PointA
JOIN [CadData].Survey.SurveyPoint PointB
ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
WHERE PointA.ObjectId <> PointB.ObjectID
ORDER BY ObjectIDa
Answer 3 (score: 1)
Kudos to @Mike_M. Here is the edited select, which runs in 2 seconds.
DECLARE @TOL AS REAL
SET @TOL = 0.05
SELECT
PointA.ObjectId as ObjectIDa,
PointA.Name as PTNameA,
PointA.[Description] as PTdescA,
PointB.ObjectId as ObjectIDb,
PointB.Name as PTNameB,
PointB.[Description] as PTdescB,
ROUND(PointA.Geometry.STDistance(PointB.Geometry),3) DIST
into #AllDuplicatesWithRepeats
FROM CadData.Survey.SurveyPoint PointA
JOIN [CadData].Survey.SurveyPoint PointB
ON PointA.Geometry.STDistance(PointB.Geometry) < @TOL
ORDER BY ObjectIDa
Select
*
from
#AllDuplicatesWithRepeats d1
Where
d1.ObjectIDa < d1.ObjectIDb