Fast nearest-location finder for SQL (MySQL, PostgreSQL, SQL Server)

Time: 2016-01-20 19:17:25

Tags: php mysql haversine

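Can someone help me rewrite the following query as a join instead of a subselect? It comes from this tutorial: http://www.plumislandmedia.net/mysql/haversine-mysql-nearest-loc/

As implemented, it is extremely slow on a large table (4 million rows). I think the subselect is the root cause, but I can't figure out how to express it as a join.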

SELECT 
    zip,
    primary_city,
    latitude,
    longitude,
    distance
FROM 
    (
        SELECT 
            z.zip,
            z.primary_city,
            z.latitude,
            z.longitude,
            p.radius,
            p.distance_unit * DEGREES(ACOS(COS(RADIANS(p.latpoint)) * COS(RADIANS(z.latitude)) * COS(RADIANS(p.longpoint - z.longitude)) + SIN(RADIANS(p.latpoint)) * SIN(RADIANS(z.latitude)))) AS distance
        FROM zip AS z
            JOIN 
                (
                    /* these are the query parameters */
                    SELECT 
                        42.81 AS latpoint,
                        -70.81 AS longpoint,
                        50.0 AS radius,
                        111.045 AS distance_unit
                ) AS p ON 1 = 1
        WHERE 
            z.latitude BETWEEN p.latpoint - (p.radius / p.distance_unit)
                AND p.latpoint + (p.radius / p.distance_unit)
            AND z.longitude BETWEEN p.longpoint - (p.radius / (p.distance_unit * COS(RADIANS(p.latpoint))))
                AND p.longpoint + (p.radius / (p.distance_unit * COS(RADIANS(p.latpoint))))
    ) AS d
WHERE distance <= radius
ORDER BY distance 
LIMIT 15

1 answer:

Answer 0 (score: 0)

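I don't believe there is much you can do here. I suspect the long run time comes from the sheer amount of CPU needed to push 4 million records through all of these calculations.

Your innermost subquery is just four constants cross-joined to your main subquery, so there is nothing you can do there to speed it up. It's a wash.

Your main subquery (the big SELECT statement) does all the work here. It is wrapped in the outer query to save processing: without the wrapper, the distance would have to be computed three times (in the SELECT list, in WHERE, and in ORDER BY), unless MySQL's optimizer performs some kind of miracle and recognizes that the same calculation is used three times.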
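To make the cost argument concrete, here is a minimal Python sketch of the two steps the main subquery performs per row: a cheap bounding-box comparison, then the trigonometric distance. The function names `bounding_box` and `distance_km` are hypothetical, chosen only for this illustration, and the sketch assumes `distance_unit` 111.045 means kilometers per degree. The expression is the spherical law of cosines, mirroring the SQL term by term.

```python
import math

KM_PER_DEGREE = 111.045  # same value as distance_unit in the query (assumed km per degree)

def bounding_box(lat, lon, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon), i.e. the BETWEEN limits."""
    dlat = radius_km / KM_PER_DEGREE
    # A degree of longitude shrinks by cos(latitude) away from the equator.
    dlon = radius_km / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon

def distance_km(lat1, lon1, lat2, lon2):
    """Spherical law of cosines, following the SQL distance expression."""
    cos_angle = (math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
                 * math.cos(math.radians(lon1 - lon2))
                 + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2)))
    # Rounding can push the value slightly outside [-1, 1]; clamp so acos stays defined.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return KM_PER_DEGREE * math.degrees(math.acos(cos_angle))

# The query parameters from the question: center point and a 50 km radius.
box = bounding_box(42.81, -70.81, 50.0)
```

Only rows that fall inside `box` need the expensive `distance_km` call, which is the work the WHERE clause with the two BETWEEN tests is meant to limit.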

Anyway, here is a version, probably with worse performance, that removes the outermost query:

SELECT 
    z.zip,
    z.primary_city,
    z.latitude,
    z.longitude,    
    p.distance_unit * DEGREES(ACOS(COS(RADIANS(p.latpoint)) * COS(RADIANS(z.latitude)) * COS(RADIANS(p.longpoint - z.longitude)) + SIN(RADIANS(p.latpoint)) * SIN(RADIANS(z.latitude)))) AS distance
FROM zip AS z
    JOIN 
        (
            /* these are the query parameters */
            SELECT 
                42.81 AS latpoint,
                -70.81 AS longpoint,
               50.0 AS radius,
               111.045 AS distance_unit
        ) AS p ON 1 = 1
WHERE 
    z.latitude BETWEEN p.latpoint - (p.radius / p.distance_unit)
        AND p.latpoint + (p.radius / p.distance_unit)
    AND z.longitude BETWEEN p.longpoint - (p.radius / (p.distance_unit * COS(RADIANS(p.latpoint))))
        AND p.longpoint + (p.radius / (p.distance_unit * COS(RADIANS(p.latpoint))))
    /* a query may have only one WHERE clause, so the distance test is chained with AND */
    AND p.distance_unit * DEGREES(ACOS(COS(RADIANS(p.latpoint)) * COS(RADIANS(z.latitude)) * COS(RADIANS(p.longpoint - z.longitude)) + SIN(RADIANS(p.latpoint)) * SIN(RADIANS(z.latitude)))) <= p.radius
ORDER BY p.distance_unit * DEGREES(ACOS(COS(RADIANS(p.latpoint)) * COS(RADIANS(z.latitude)) * COS(RADIANS(p.longpoint - z.longitude)) + SIN(RADIANS(p.latpoint)) * SIN(RADIANS(z.latitude))))
LIMIT 15

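You can see that every reference to distance in the WHERE and ORDER BY parts of the query has been replaced with the formula used to compute it.

If you want to get rid of the cross-joined subquery that holds the constants... you can do that too, but now you have lost performance with the previous change and you lose readability by dropping the cross join: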

SELECT 
    z.zip,
    z.primary_city,
    z.latitude,
    z.longitude,    
    111.045 * DEGREES(ACOS(COS(RADIANS(42.81)) * COS(RADIANS(z.latitude)) * COS(RADIANS(-70.81 - z.longitude)) + SIN(RADIANS(42.81)) * SIN(RADIANS(z.latitude)))) AS distance
FROM zip AS z  
WHERE 
    z.latitude BETWEEN 42.81 - (50.0 / 111.045)
        AND 42.81 + (50.0 / 111.045)
    AND z.longitude BETWEEN -70.81 - (50.0 / (111.045 * COS(RADIANS(42.81))))
        AND -70.81 + (50.0 / (111.045 * COS(RADIANS(42.81))))
    /* a query may have only one WHERE clause, so the distance test is chained with AND */
    AND 111.045 * DEGREES(ACOS(COS(RADIANS(42.81)) * COS(RADIANS(z.latitude)) * COS(RADIANS(-70.81 - z.longitude)) + SIN(RADIANS(42.81)) * SIN(RADIANS(z.latitude)))) <= 50.0
ORDER BY 111.045 * DEGREES(ACOS(COS(RADIANS(42.81)) * COS(RADIANS(z.latitude)) * COS(RADIANS(-70.81 - z.longitude)) + SIN(RADIANS(42.81)) * SIN(RADIANS(z.latitude))))
LIMIT 15
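As a sanity check that flattening the query does not change its result, the sketch below runs a nested form and a flattened form against an in-memory SQLite table from Python. Several things here are assumptions for illustration only: the four sample rows are made up, SQLite stands in for MySQL, and the trigonometric functions are registered from Python because not every SQLite build ships them. It says nothing about MySQL performance; it only compares the returned rows.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

def clamped_acos(x):
    # Guard against rounding pushing the argument just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, x)))

# Register the math functions under the names the queries use.
for name, fn in [("RADIANS", math.radians), ("DEGREES", math.degrees),
                 ("COS", math.cos), ("SIN", math.sin), ("ACOS", clamped_acos)]:
    conn.create_function(name, 1, fn)

conn.execute("CREATE TABLE zip (zip TEXT, primary_city TEXT, latitude REAL, longitude REAL)")
conn.executemany("INSERT INTO zip VALUES (?, ?, ?, ?)", [
    ("00001", "Town A", 42.81, -70.81),  # hypothetical: the search center itself
    ("00002", "Town B", 43.00, -70.90),  # hypothetical: inside the radius
    ("00003", "Town C", 45.00, -70.81),  # hypothetical: outside the bounding box
    ("00004", "Town D", 42.50, -70.30),  # hypothetical: inside the box, beyond 50 km
])

DIST = ("111.045 * DEGREES(ACOS(COS(RADIANS(42.81)) * COS(RADIANS(z.latitude))"
        " * COS(RADIANS(-70.81 - z.longitude))"
        " + SIN(RADIANS(42.81)) * SIN(RADIANS(z.latitude))))")
BOX = ("z.latitude BETWEEN 42.81 - (50.0 / 111.045) AND 42.81 + (50.0 / 111.045)"
       " AND z.longitude BETWEEN -70.81 - (50.0 / (111.045 * COS(RADIANS(42.81))))"
       " AND -70.81 + (50.0 / (111.045 * COS(RADIANS(42.81))))")

# Nested form: distance is computed once and reused through its alias.
nested = f"""
SELECT zip, distance FROM (
    SELECT z.zip, {DIST} AS distance FROM zip AS z WHERE {BOX}
) AS d
WHERE distance <= 50.0 ORDER BY distance LIMIT 15"""

# Flattened form: the formula is repeated in SELECT, WHERE, and ORDER BY.
flat = f"""
SELECT z.zip, {DIST} AS distance FROM zip AS z
WHERE {BOX} AND {DIST} <= 50.0
ORDER BY {DIST} LIMIT 15"""

nested_zips = [row[0] for row in conn.execute(nested)]
flat_zips = [row[0] for row in conn.execute(flat)]
```

With these sample rows both lists contain the same two zip codes in the same order, which supports the point that the rewrites are equivalent in result and differ only in readability and, possibly, speed.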

So... in the end, this was a fun exercise, but I don't see any pros, and these changes definitely have some cons.