Question

I have two tables.

USER contains a UserID and two zip codes (about 1 million records):

+--------+----------+----------+
| UserID | Zipcode1 | Zipcode2 |
+--------+----------+----------+
|      1 |    08003 |    10016 |
|      2 |    11780 |    48073 |
|      3 |    57106 |    33487 |
+--------+----------+----------+

LOCATION contains a LocationID and a zip code (about 1,000 records):

+------------+---------+
| LocationID | Zipcode |
+------------+---------+
|          1 |   33004 |
|          2 |   96818 |
|          3 |   08816 |
+------------+---------+

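I have a function that takes zip codes, joins to a table that has latitude/longitude, calculates the distance between the user's zip codes and the location's zip code, and returns the shorter of the two distances.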

dbo.fnMinZipDistance(Location.Zipcode, User.Zipcode1, User.Zipcode2)

Example for user 1 and location 1:
dbo.fnMinZipDistance('33004', '08003', '10016') returns 995.383 
because the distance from 33004 to 10016 is 995.383 
and the distance from 33004 to 08003 is 1067.802
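The body of `dbo.fnMinZipDistance` isn't shown, so this is only a minimal Python sketch of what it presumably does, assuming a haversine great-circle distance with a mean Earth radius of 3,958.8 miles. The `coords` lookup (zip code to latitude/longitude) and the name `min_zip_distance` are hypothetical stand-ins for the latitude/longitude table and the SQL function.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def min_zip_distance(coords, location_zip, user_zip1, user_zip2):
    """Hypothetical analogue of dbo.fnMinZipDistance: the distance from the
    location's zip to whichever of the user's two zips is closer."""
    loc = coords[location_zip]
    return min(haversine_miles(*loc, *coords[user_zip1]),
               haversine_miles(*loc, *coords[user_zip2]))
```

With toy coordinates on the equator (not real zip centroids), one degree of longitude comes out at roughly 69.09 miles, and the function returns the distance to the nearer of the two user zips.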

For each UserID, I need the 3 LocationIDs with the shortest distance to the user's zip codes.

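My initial line of attack was to get the distance to every location, order the locations by distance for each user, and select the ones where the row number is < 4.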

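-- Score every (user, location) pair, then keep the 3 nearest locations per user
WITH UserLocations AS
(
    SELECT
        U.UserID,
        L.LocationID,
        rowNum = ROW_NUMBER() OVER (
            PARTITION BY U.UserID
            ORDER BY dbo.fnMinZipDistance(L.Zipcode, U.Zipcode1, U.Zipcode2)
        )
    FROM USERS U
    CROSS JOIN LOCATIONS L  -- equivalent to JOIN ... ON 1 = 1: every user paired with every location
)
SELECT *
FROM UserLocations
WHERE rowNum < 4;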
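The query above is the usual "top N per group" pattern. This Python sketch mirrors its logic, assuming a hypothetical `distance(location_zip, zip1, zip2)` callable in place of `dbo.fnMinZipDistance`. It makes the cost visible: one distance evaluation per user-location pair (the cross join), even though only 3 rows per user survive.

```python
import heapq


def three_nearest_per_user(users, locations, distance, n=3):
    """users: iterable of (user_id, zip1, zip2); locations: list of (location_id, zip).
    Returns {user_id: [location_id, ...]} ordered by distance, like rowNum < 4."""
    result = {}
    for user_id, zip1, zip2 in users:
        # One distance call per (user, location) pair: this is the cross join.
        scored = ((distance(loc_zip, zip1, zip2), loc_id) for loc_id, loc_zip in locations)
        # nsmallest keeps only n candidates instead of sorting every location.
        result[user_id] = [loc_id for _, loc_id in heapq.nsmallest(n, scored)]
    return result
```

Using `heapq.nsmallest` avoids a full sort per user, but it does not reduce the number of distance calls, which is where the time goes.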

This takes days to run, because before I can pick the closest 3, I have to compute every distance: roughly 1 billion rows in total.

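My next idea was to precompute every possible zip-to-zip distance into a table to join against, but that could be around 1.8 billion combinations (there are about 43,000 active US zip codes), and I'm not sure how much it would help. I'm running it now for comparison; it has been running for 6 hours so far.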
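The row counts above are simple products of the approximate figures in the question (1 million users, 1,000 locations, 43,000 zip codes). Note that 43,000² counts ordered pairs, including each zip paired with itself; the number of distinct unordered pairs is about half of that.

```python
USERS = 1_000_000
LOCATIONS = 1_000
ACTIVE_ZIPS = 43_000

user_location_pairs = USERS * LOCATIONS                     # rows the cross join must score
ordered_zip_pairs = ACTIVE_ZIPS ** 2                        # every zip to every zip, both directions
unordered_zip_pairs = ACTIVE_ZIPS * (ACTIVE_ZIPS - 1) // 2  # distinct pairs, no self-pairs

print(f"{user_location_pairs:,}")   # 1,000,000,000
print(f"{ordered_zip_pairs:,}")     # 1,849,000,000
print(f"{unordered_zip_pairs:,}")   # 924,478,500
```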

I need to drastically reduce the run time of this query.

Any suggestions would be greatly appreciated.

Answer 1

Precalculating the distance from every zip code to every other zip code is probably the best way to go.

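Working with just the 44,000 zip codes is already a big win over processing the 2 million zip code values in the raw data (two per user). For example, you could keep the five closest zip codes for each zip code in a summary table.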
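A minimal Python sketch of the summary-table idea, reading "the five closest zip codes" as the five closest *location* zip codes for each distinct user zip code (an interpretation, not something the answer spells out). Distances are computed once per distinct zip rather than once per user, and each user's top 3 comes from merging the short lists for Zipcode1 and Zipcode2. Because a user's distance to a location is the minimum over the two zips, the user's 3 nearest locations are guaranteed to appear in those two lists as long as each list keeps at least 3 entries. `distance(zip, location_zip)` is a hypothetical single-zip distance callable.

```python
import heapq


def nearest_locations_by_zip(zips, locations, distance, k=5):
    """Summary table: for each distinct zip, its k nearest (distance, location_id) pairs."""
    return {
        z: heapq.nsmallest(k, ((distance(z, loc_zip), loc_id) for loc_id, loc_zip in locations))
        for z in set(zips)
    }


def three_nearest_from_summary(users, summary, n=3):
    """Merge each user's two precomputed lists; a location's distance to the
    user is the smaller of its distances to Zipcode1 and Zipcode2."""
    result = {}
    for user_id, zip1, zip2 in users:
        best = {}
        for dist, loc_id in summary[zip1] + summary[zip2]:
            best[loc_id] = min(dist, best.get(loc_id, dist))
        ranked = sorted(best.items(), key=lambda kv: (kv[1], kv[0]))
        result[user_id] = [loc_id for loc_id, _ in ranked[:n]]
    return result
```

The expensive step now scales with distinct zips times locations, and the per-user step only touches at most 2k rows.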

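You can also simplify the search. I think you can assume that the closest zip codes are within some distance (say, 100 miles) of a given zip code. That lets you put bounds on the latitude and longitude in the search.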
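A sketch of the latitude/longitude bounds, assuming a spherical Earth with a mean radius of 3,958.8 miles; the function names are hypothetical, and 100 miles is just the answer's example radius. The latitude half-width is the angular radius itself, and the longitude half-width uses asin(sin δ / cos φ), which widens the box away from the equator. The box contains every point within the radius, so it is only a cheap prefilter: candidates inside it still need the exact distance check. Wraparound at the ±180° meridian is ignored. In SQL Server, the same bounds would become `BETWEEN` predicates on the latitude/longitude table the question mentions.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def bounding_box(lat, lon, radius_miles):
    """(min_lat, max_lat, min_lon, max_lon) enclosing every point within
    radius_miles of (lat, lon) on a sphere. Ignores the ±180° meridian."""
    delta = radius_miles / EARTH_RADIUS_MILES  # angular radius in radians
    dlat = math.degrees(delta)
    min_lat, max_lat = max(lat - dlat, -90.0), min(lat + dlat, 90.0)
    cos_lat = math.cos(math.radians(lat))
    if math.sin(delta) >= cos_lat:  # circle reaches a pole: every longitude qualifies
        return min_lat, max_lat, -180.0, 180.0
    dlon = math.degrees(math.asin(math.sin(delta) / cos_lat))
    return min_lat, max_lat, lon - dlon, lon + dlon


def candidates_within(center, points, radius_miles):
    """Cheap prefilter: keep (lat, lon) points that fall inside the bounding box."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(*center, radius_miles)
    return [p for p in points if min_lat <= p[0] <= max_lat and min_lon <= p[1] <= max_lon]
```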
