Duplicate records in MySQL geo search using cross join and Haversine formula

Time: 2014-02-22 19:50:29

Tags: mysql sql gis cross-join haversine

I am trying to complete a modification of this Google tutorial.

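I have written this SQL to query a table of locations by location "name". Given the name of a location, the query returns nearby pizza places. To do this, I cross joined my table of restaurant locations, titled "markers", with itself and used the Haversine formula to calculate the distance.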

    SELECT m.address,
           m.name,
           m.lat,
           m.lng,
           (3959 * ACOS(COS(RADIANS(poi.lat)) *
                        COS(RADIANS(m.lat)) *
                        COS(RADIANS(m.lng) - RADIANS(poi.lng)) +
                        SIN(RADIANS(poi.lat)) *
                        SIN(RADIANS(m.lat)))) AS distance
    FROM markers poi
         CROSS JOIN markers m
    WHERE poi.address LIKE "%myrtle beach%"
          AND poi.id <> m.id
    HAVING distance < 200
    ORDER BY distance
    LIMIT 0,20
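
For reference, the distance expression in this query can be written out as a formula. With $\varphi$ for latitude and $\lambda$ for longitude, both in radians, and $R = 3959$ (the Earth's radius in miles, so the result is in miles), it is the spherical law of cosines form of the great-circle distance:

```latex
d = R \cdot \arccos\bigl(
      \cos\varphi_{\mathrm{poi}} \, \cos\varphi_{m} \, \cos(\lambda_{m} - \lambda_{\mathrm{poi}})
      + \sin\varphi_{\mathrm{poi}} \, \sin\varphi_{m}
    \bigr)
```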

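The query returns the expected results, but it gives duplicate records for every match if the point of interest is outside the specified area, in this case "myrtle beach". This is because of the CROSS JOIN, which would be easy to fix with a DISTINCT select. But the "lng" and "lat" fields are of type FLOAT, so the distance calculation is never the same, even for duplicate records.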

Here is a subset of what is returned:

3901 North Kings Highway Suite 1, Myrtle Beach, SC | East Chicago Pizza Company | 33.716099 | -78.855583 | 4.0285562196955125

1706 S Kings Hwy #A, Myrtle Beach, SC | Domino's Pizza: Myrtle Beach | 33.674881 | -78.905144 | 4.0285562196955125

82 Wentworth St, Charleston, SC | Andolinis Pizza | 32.782330 | -79.934235 | 85.68177495224947

82 Wentworth St, Charleston, SC | Andolinis Pizza | 32.782330 | -79.934235 | 89.71000040441085

114 Jungle Rd, Edisto Island, SC | Bucks Pizza Edisto Beach Inc | 32.503971 | -80.297951 | 114.22243529200529

114 Jungle Rd, Edisto Island, SC | Bucks Pizza Edisto Beach Inc | 32.503971 | -80.297951 | 118.2509427998286

Any suggestions on where to go from here?

3 answers:

Answer 0: (score: 1)

Try:

select distinct x.address, x.name, y.lat, y.lng, x.distance
  from (SELECT m.address,
               m.name,
               m.lat,
               m.lng,
               (3959 *
               ACOS(COS(RADIANS(poi.lat)) * COS(RADIANS(m.lat)) *
                     COS(RADIANS(m.lng) - RADIANS(poi.lng)) +
                     SIN(RADIANS(poi.lat)) * SIN(RADIANS(m.lat)))) AS distance
          FROM markers poi
         cross JOIN markers m
         WHERE poi.address LIKE "%myrtle beach%"
           and poi.id <> m.id HAVING distance < 200) x
  join markers y
    on x.address = y.address
   and x.name = y.name
   and x.lat = y.lat
   and x.lng = y.lng
 order by x.distance limit 0, 20
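
Note that DISTINCT compares every column in the select list, so two rows for the same restaurant stay distinct as long as their distance values differ. A minimal sketch, assuming the distance column can be left out of the output and the radius test moved into WHERE (this variant is an illustration, not part of the query above):

```sql
-- One row per restaurant within 200 miles of ANY matching point of interest.
-- distance is not selected, so DISTINCT can collapse the repeated rows.
SELECT DISTINCT m.address, m.name, m.lat, m.lng
FROM markers poi
CROSS JOIN markers m
WHERE poi.address LIKE '%myrtle beach%'
  AND poi.id <> m.id
  AND (3959 * ACOS(COS(RADIANS(poi.lat)) * COS(RADIANS(m.lat)) *
                   COS(RADIANS(m.lng) - RADIANS(poi.lng)) +
                   SIN(RADIANS(poi.lat)) * SIN(RADIANS(m.lat)))) < 200
LIMIT 0, 20;
```

The trade-off is that the result no longer carries a distance to sort by.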

Answer 1: (score: 1)

You are getting duplicate results because both points match "myrtle beach". Use a condition such as poi.id < m.id to make sure you only get one match.

Example:

poi id    m id    distance
1         2       100
2         1       100
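
To see this pairing on real data, it can help to show which point of interest each result row was measured from. A sketch, assuming the markers table from the question; the poi_id, poi_name and m_id aliases are added here purely for illustration:

```sql
-- Diagnostic only: expose both sides of the cross join.
SELECT poi.id   AS poi_id,    -- row the distance is measured from
       poi.name AS poi_name,
       m.id     AS m_id,      -- restaurant being returned
       m.name,
       (3959 * ACOS(COS(RADIANS(poi.lat)) * COS(RADIANS(m.lat)) *
                    COS(RADIANS(m.lng) - RADIANS(poi.lng)) +
                    SIN(RADIANS(poi.lat)) * SIN(RADIANS(m.lat)))) AS distance
FROM markers poi
CROSS JOIN markers m
WHERE poi.address LIKE '%myrtle beach%'
  AND poi.id <> m.id
HAVING distance < 200
ORDER BY m.id, poi.id;
```

If more than one row matches the pattern, a restaurant that appears twice should show two different poi_id values.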

Query:

SELECT 
    m.address,
    m.name,
    m.lat,
    m.lng,
    (3959 * ACOS(COS(RADIANS(poi.lat)) * 
    COS(RADIANS(m.lat)) * 
    COS(RADIANS(m.lng) - RADIANS(poi.lng)) + SIN(RADIANS(poi.lat))*
    SIN(RADIANS(m.lat)))) AS distance
FROM markers poi
CROSS JOIN markers m
WHERE 
    (poi.address LIKE "%myrtle beach%" OR m.address LIKE "%myrtle beach%")
    AND poi.id < m.id 
HAVING distance < 200
ORDER BY distance LIMIT 0,20

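Or, if you really do have a single row in markers that is the point of interest, specify that row rather than any match on address. Then your poi.id <> m.id condition will ensure there are no duplicates.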

SELECT 
    m.address,
    m.name,
    m.lat,
    m.lng,
    (3959 * ACOS(COS(RADIANS(poi.lat)) * 
    COS(RADIANS(m.lat)) * 
    COS(RADIANS(m.lng) - RADIANS(poi.lng)) + SIN(RADIANS(poi.lat))*
    SIN(RADIANS(m.lat)))) AS distance
FROM markers poi
CROSS JOIN markers m
WHERE 
    poi.id = (SELECT id FROM markers WHERE address LIKE "%myrtle beach%" LIMIT 1)
    AND poi.id <> m.id 
HAVING distance < 200
ORDER BY distance LIMIT 0,20
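
If several rows can match the pattern and none of them should be singled out, another option (not one of the queries above, offered only as a sketch) is to group by restaurant and keep the smallest distance to any matching point of interest:

```sql
-- Collapse the cross join to one row per restaurant,
-- keeping the distance to the nearest matching point of interest.
SELECT m.address, m.name, m.lat, m.lng,
       MIN(3959 * ACOS(COS(RADIANS(poi.lat)) * COS(RADIANS(m.lat)) *
                       COS(RADIANS(m.lng) - RADIANS(poi.lng)) +
                       SIN(RADIANS(poi.lat)) * SIN(RADIANS(m.lat)))) AS distance
FROM markers poi
CROSS JOIN markers m
WHERE poi.address LIKE '%myrtle beach%'
  AND poi.id <> m.id
GROUP BY m.id, m.address, m.name, m.lat, m.lng
HAVING distance < 200
ORDER BY distance
LIMIT 0, 20;
```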

Answer 2: (score: 0)

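Reviewing everyone's answers got me thinking. Instead of asking why I was getting duplicate results, I started to wonder which of the two Myrtle Beach locations the query was calculating the distance from. The answer is both. That explains why I was getting two records for every match in the first place.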

Here is my solution:

SELECT  m.address, m.name, m.lat, m.lng, (3959 
   * ACOS(COS(RADIANS(poi.lat)) * COS(RADIANS(m.lat)) 
   * COS(RADIANS(m.lng) - RADIANS(poi.lng)) + SIN(RADIANS(poi.lat))
   * SIN(RADIANS(m.lat))))     AS distance
FROM markers m
cross JOIN (
   select  name, lat, lng from markers
   where address like '%myrtle beach%'
   limit 1
) poi
HAVING distance < 200
ORDER BY name
LIMIT 0, 20

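This does not give me the most precise distance calculation, because it arbitrarily uses the first restaurant it finds as the epicenter. But for my immediate purposes it is good enough. I think that before this application is ready for production, I will need a second table of cities that contains the coordinates of each city center.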
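
A rough sketch of what that second table could look like. The cities table, its column names, and the name/state filter are all hypothetical, and the table would need one row per city, holding its center coordinates, before the query returns anything:

```sql
-- Hypothetical lookup table: one row per city with its center coordinates.
CREATE TABLE cities (
    id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name  VARCHAR(60) NOT NULL,
    state CHAR(2) NOT NULL,
    lat   DOUBLE NOT NULL,
    lng   DOUBLE NOT NULL
);

-- Measure every restaurant from the single selected city row.
SELECT m.address, m.name, m.lat, m.lng,
       (3959 * ACOS(COS(RADIANS(c.lat)) * COS(RADIANS(m.lat)) *
                    COS(RADIANS(m.lng) - RADIANS(c.lng)) +
                    SIN(RADIANS(c.lat)) * SIN(RADIANS(m.lat)))) AS distance
FROM cities c
CROSS JOIN markers m
WHERE c.name = 'Myrtle Beach'
  AND c.state = 'SC'
HAVING distance < 200
ORDER BY distance
LIMIT 0, 20;
```

As long as the name and state together identify exactly one city row, each restaurant is paired with a single reference point, so no duplicates arise.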