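I have two datasets containing spatial data.
Dataset1 has about 15,000,000 records. Dataset2 has about 16,000,000 records.
Both use the geography data type (GPS coordinates), and every record is a point.
Both tables have a spatial index with CELLS_PER_OBJECT = 1 and grid levels (HIGH, HIGH, HIGH, HIGH).
All of the points lie in a small part of the globe (one US state). They are spread out enough that geography is justified over a projected geometry.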
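To make the geography-versus-geometry trade-off concrete, here is a small Python sketch (an illustration only, not SQL Server code) of the haversine great-circle distance in meters. It assumes a spherical Earth with mean radius 6,371,008.8 m; SQL Server's geography type instead measures on the ellipsoid of the column's SRID (WGS 84 for SRID 4326, the usual choice for GPS data), so STDistance() results will differ slightly. It shows that one degree of longitude spans roughly 111 km at the equator but only about 85 km at 40° N, which is why distances computed directly in raw degrees are distorted.

```python
import math

# Assumption: spherical Earth with the mean radius. SQL Server's geography type
# computes on the ellipsoid of the column's SRID (WGS 84 for SRID 4326), so
# STDistance() results will differ slightly from this approximation.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against h creeping above 1.0 through rounding error.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))


if __name__ == "__main__":
    # A degree of longitude shrinks with latitude, so degrees are not a uniform
    # distance unit; geography accounts for this, raw lat/lon geometry does not.
    print(round(haversine_m(0.0, 0.0, 0.0, 1.0)))    # about 111 km at the equator
    print(round(haversine_m(40.0, 0.0, 40.0, 1.0)))  # about 85 km at 40 degrees north
```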
DECLARE @g GEOGRAPHY
SET @g = (SELECT TOP 1 GPSPoint FROM Dataset1)
EXEC sp_help_spatial_geography_index 'Dataset1', 'Dataset1_SpatialIndex', 0, @g
returns:
propvalue   propname
1           Total_Number_Of_ObjectCells_In_Level0_For_QuerySample
28178       Total_Number_Of_ObjectCells_In_Level1_In_Index
1           Total_Number_Of_ObjectCells_In_Level4_For_QuerySample
14923330    Total_Number_Of_ObjectCells_In_Level4_In_Index
1           Total_Number_Of_Intersecting_ObjectCells_In_Level1_In_Index
1           Total_Number_Of_Intersecting_ObjectCells_In_Level4_For_QuerySample
14923330    Total_Number_Of_Intersecting_ObjectCells_In_Level4_In_Index
1           Total_Number_Of_Border_ObjectCells_In_Level0_For_QuerySample
28177       Total_Number_Of_Border_ObjectCells_In_Level1_In_Index
740         Number_Of_Rows_Selected_By_Primary_Filter
0           Number_Of_Rows_Selected_By_Internal_Filter
740         Number_Of_Times_Secondary_Filter_Is_Called
1           Number_Of_Rows_Output
99.99504    Percentage_Of_Rows_NotSelected_By_Primary_Filter
0           Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter
0           Internal_Filter_Efficiency
0.135135    Primary_Filter_Efficiency
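As a sanity check on how these figures relate, the Python arithmetic below reproduces two of the reported percentages. It assumes (this is my reading, not something the procedure documents here) that Primary_Filter_Efficiency is rows output divided by rows selected by the primary filter, and that the "not selected" percentage is taken over the index's row count. With CELLS_PER_OBJECT = 1 and point data, each row contributes one level-4 cell, so 14,923,330 is used as that row count.

```python
# Figures copied from the sp_help_spatial_geography_index output above.
index_rows = 14_923_330      # assumption: one level-4 cell per point row
primary_filter_rows = 740    # Number_Of_Rows_Selected_By_Primary_Filter
rows_output = 1              # Number_Of_Rows_Output

# Share of primary-filter candidates that survived the exact (secondary) test.
primary_filter_efficiency = 100 * rows_output / primary_filter_rows

# Share of all indexed rows that the primary filter ruled out.
not_selected_pct = 100 * (1 - primary_filter_rows / index_rows)

print(f"{primary_filter_efficiency:.6f}")  # 0.135135
print(f"{not_selected_pct:.5f}")           # 99.99504
```

Read this way, the index narrows roughly 15 million rows to 740 candidates, and the exact secondary test then runs 740 times to keep a single row.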
Even so, the following query
DECLARE @g GEOGRAPHY
SET @g = (SELECT TOP 1 GPSPoint FROM Dataset1)
SELECT TOP 1
*
FROM
Dataset2 D
WHERE
@g.Filter(D.GPSPoint.STBuffer(1)) = 1
takes almost an hour to complete.
I also tried:
WITH TABLE1 AS (
SELECT
A.RecordID AS RecordID1,
B.RecordID AS RecordID2,
RANK() OVER (PARTITION BY A.RecordID ORDER BY A.GPSPoint.STDistance(B.GPSPoint) ASC) AS 'Ranking'
FROM
Dataset1 A
INNER JOIN
Dataset2 B
ON
B.GPSPoint.Filter(A.GPSPoint.STBuffer(1)) = 1
AND A.GPSPoint.STDistance(B.GPSPoint) <= 50
)
SELECT
*
FROM
TABLE1
WHERE
Ranking = 1
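That ended up roughly 1,000 times faster, but at that rate the full query would still take about six months to finish. Honestly, I don't know what to do. The end goal is a nearest-neighbor search for every record in Dataset1 to find the closest point in Dataset2, but at this rate that looks impossible.
Does anyone have ideas on how to make this process more efficient?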
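The end goal (for every Dataset1 point, the closest Dataset2 point within 50 m) is a fixed-radius all-nearest-neighbors problem. One common way to attack it outside the database is a uniform grid (spatial hashing): bucket Dataset2 by cell, then compare each Dataset1 point only against the 3 × 3 block of cells around it. The Python below is a minimal sketch of that swapped-in technique, not a SQL Server feature. It assumes points arrive as (id, lat, lon) tuples in degrees, uses a spherical-Earth haversine distance, and assumes the data neither crosses the antimeridian nor reaches the poles (true for one US state). The name nearest_within is hypothetical.

```python
import math
from collections import defaultdict

# Assumption: spherical Earth; SQL Server's geography uses the WGS 84 ellipsoid.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))


def nearest_within(points_a, points_b, radius_m=50.0):
    """Hypothetical helper: for each (id, lat, lon) in points_a, find the nearest
    point in points_b within radius_m. Returns {id_a: (id_b, distance_m)};
    points with no neighbor inside the radius are left out."""
    if not points_a or not points_b:
        return {}
    max_abs_lat = max(abs(lat) for _, lat, _ in [*points_a, *points_b])
    # Cell size in degrees, at least radius_m in both directions (1% margin).
    # Longitude cells are widened for the highest latitude in the data,
    # where a degree of longitude is shortest.
    cell_lat = 1.01 * math.degrees(radius_m / EARTH_RADIUS_M)
    cell_lon = cell_lat / math.cos(math.radians(max_abs_lat))

    def cell(lat, lon):
        return math.floor(lat / cell_lat), math.floor(lon / cell_lon)

    # Build phase: bucket every B point by its grid cell.
    grid = defaultdict(list)
    for id_b, lat, lon in points_b:
        grid[cell(lat, lon)].append((id_b, lat, lon))

    # Probe phase: check only the 3 x 3 block of cells around each A point.
    result = {}
    for id_a, lat, lon in points_a:
        ci, cj = cell(lat, lon)
        best = None
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for id_b, lat_b, lon_b in grid.get((ci + di, cj + dj), ()):
                    d = haversine_m(lat, lon, lat_b, lon_b)
                    # Strict "<" keeps the first candidate found on exact ties.
                    if d <= radius_m and (best is None or d < best[1]):
                        best = (id_b, d)
        if best is not None:
            result[id_a] = best
    return result
```

Because each cell is at least radius_m wide in both directions, any point within the radius must fall in the 3 × 3 neighborhood, so exact distances are computed only for nearby candidates instead of against every row. Ties are broken by keeping the first candidate found, whereas RANK() in the query above would return every tied row.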
Answer 0 (score: 0)
Try this; it is based on the recommendations on MSDN.
SELECT TOP(1)
A.RecordID,
B.RecordID,
A.GPSPoint.STDistance(B.GPSPoint) AS Distance
FROM
Dataset1 A
INNER JOIN
Dataset2 B
ON
A.GPSPoint.STDistance(B.GPSPoint) <= 50
AND B.GPSPoint IS NOT NULL
ORDER BY A.GPSPoint.STDistance(B.GPSPoint) ASC
Note that I removed the following predicates. Try the query above first, then add them back and see how they affect index usage:
AND B.GPSPoint.Filter(A.GPSPoint.STBuffer(1)) = 1
-- or try: AND B.GPSPoint.STIntersects(A.GPSPoint.STBuffer(1)) = 1
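The difference between those two predicates can be pictured as a two-stage filter. In SQL Server, Filter() answers from the spatial index alone when an index is used, so it can return false positives, while STIntersects() also performs the exact test; this mirrors the primary filter (740 candidate rows) and secondary filter (1 output row) in the index statistics in the question. The Python below is only an analogy under stated assumptions: a lat/lon bounding box stands in for the index's grid cells, a spherical haversine distance stands in for the exact test, and the function names primary_filter and secondary_filter are hypothetical.

```python
import math

# Assumption: spherical Earth; SQL Server's geography uses the WGS 84 ellipsoid.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))


def primary_filter(center, points, radius_m):
    """Hypothetical cheap test, like Filter(): keep points inside a lat/lon box
    around center. The box corners lie outside the circle, so false positives
    are possible; a 1% margin keeps true hits from being dropped."""
    lat0, lon0 = center
    dlat = 1.01 * math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = dlat / math.cos(math.radians(lat0))
    return [p for p in points if abs(p[1] - lat0) <= dlat and abs(p[2] - lon0) <= dlon]


def secondary_filter(center, candidates, radius_m):
    """Hypothetical exact test, like STIntersects()/STDistance(): keep true hits only."""
    lat0, lon0 = center
    return [p for p in candidates if haversine_m(lat0, lon0, p[1], p[2]) <= radius_m]


if __name__ == "__main__":
    center = (40.0, -105.0)
    points = [
        (1, 40.0001, -105.0),      # about 11 m north: a true hit
        (2, 40.0004, -104.99948),  # inside the box, but about 63 m away
        (3, 40.01, -105.0),        # about 1.1 km away: rejected by the box
    ]
    candidates = primary_filter(center, points, 50.0)
    hits = secondary_filter(center, candidates, 50.0)
    print([p[0] for p in candidates], [p[0] for p in hits])  # [1, 2] [1]
```

The cheap stage only has to be fast and never miss a true hit; the exact stage then pays the full distance cost on the few survivors, which is the same division of labor the 740-versus-1 statistics describe.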
For a nearest-neighbor query to use a spatial index, the following requirements must be met: