Unable to optimize a MySQL query

Date: 2014-06-13 23:03:53

Tags: mysql sql indexing query-optimization

I am working on the following query, but I am not sure how to optimize it further:

SELECT u.id AS userId, firstName, profilePhotoId, preferredActivityId, preferredSubActivityId, availabilityType,
       3959 * ACOS(COS(radians(requestingUserLat)) * COS(radians(u.latitude)) * COS(radians(u.longitude) - radians(requestingUserLon)) + SIN(radians(requestingUserLat)) * SIN(radians(u.latitude))) AS distanceInMiles
  FROM users u
 WHERE u.id IN (
        SELECT uu.id
          FROM users uu
         WHERE uu.latitude      between lat1    and lat2 -- MySQL 5.7 supports Point data type, but it is not indexed in innoDB. We store latitude and longitude as DOUBLE for now
           AND uu.longitude     between lon1    and lon2
           AND uu.dateOfBirth   between maxAge  and minAge -- dates are in millis, therefore maxAge will have a smaller value than minAge and so it needs to go first
     )
   AND IF(gender       is null, TRUE, u.gender = gender)
   AND IF(activityType is null, TRUE, u.preferredActivityType = activityType)
   AND u.accountState = 'A'
   AND u.id != userId
HAVING distanceInMiles < searchRadius ORDER BY distanceInMiles LIMIT pagingStart, pagingLength;


CREATE INDEX `findMatches` ON `users` (`latitude` ASC, `longitude` ASC, `dateOfBirth` ASC) USING BTREE;


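The idea here is to have an inner query that identifies qualifying rows by user location and age, using the covering index specified above. In a table with a few million rows, that narrows them down to a few thousand without a full table scan. The resulting rows are then tested against more fine-grained conditions (gender, availability, etc.); at this stage a full scan of the reduced result set is unavoidable.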

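This works almost as expected. EXPLAIN shows that the inner query does use the full key length of the covering index (3 columns), and that the outer query then looks up the returned rows by PK.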


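The problem:
Performance is satisfactory when the search range is within a few hundred miles, but it starts to degrade once I reach a thousand miles, because the number of users within the specified boundaries increases. The problem would also become evident if the search range stayed the same but the number of users grew by a few orders of magnitude. Here are the issues I have identified so far: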

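  1. MySQL currently does not support LIMIT in the inner query, so the inner query returns all qualifying userIDs (i.e. thousands), even though the outer query will limit them to only a dozen or so.
  2. Enabling optimizer_trace and looking at what happens behind the scenes suggests that only the latitude column of my covering index is used as a range. I am not sure why that is, especially since EXPLAIN suggests that the full index key length is used.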
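
A minimal sketch of capturing such a trace for the inner query alone. The numeric bounds are placeholders standing in for lat1, lat2, lon1, lon2, maxAge and minAge, not real data; the part of the trace that lists the ranges considered for each index is the one relevant to issue (2).

```sql
-- Turn tracing on for the current session only.
SET optimizer_trace = 'enabled=on';

-- Run the statement to be analysed (placeholder bounds, not real data;
-- dateOfBirth is in milliseconds, as in the original query).
SELECT uu.id
  FROM users uu
 WHERE uu.latitude    BETWEEN 40.0 AND 41.0
   AND uu.longitude   BETWEEN -75.0 AND -74.0
   AND uu.dateOfBirth BETWEEN 315532800000 AND 946684800000;

-- Read the trace recorded for the statement that just ran.
SELECT TRACE FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

-- Turn tracing off again.
SET optimizer_trace = 'enabled=off';
```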

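The question:
How can I address (1) and (2) above? Before anyone suggests using spatial data types for lat and long, please note that, as of this writing, the latest InnoDB engine (MySQL v5.7) does not support spatial indexes, just spatial data types.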

2 answers:

Answer 0 (score: 0)

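I think the other answer has already covered this. There is a difference between using the data stored in an index to answer a query and using the index to seek to the right rows. The latter is the most efficient use of an index. The former helps, but the gain comes simply from not having to read the data pages.

I think you can improve the query by using exists instead of in. This should allow the filtering at the outer level to improve the performance of the query: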

SELECT u.id AS userId, firstName, profilePhotoId, preferredActivityId, preferredSubActivityId, availabilityType,
       3959 * ACOS(COS(radians(requestingUserLat)) * COS(radians(u.latitude)) * COS(radians(u.longitude) - radians(requestingUserLon)) + SIN(radians(requestingUserLat)) * SIN(radians(u.latitude))) AS distanceInMiles
FROM users u
WHERE EXISTS (SELECT 1
              FROM users uu
              WHERE uu.latitude      between lat1    and lat2  AND
                    uu.longitude     between lon1    and lon2 AND
                    uu.dateOfBirth   between maxAge  and minAge  AND
                    uu.id = u.id
             ) AND
     IF(gender       is null, TRUE, u.gender = gender) AND
     IF(activityType is null, TRUE, u.preferredActivityType = activityType) AND
     u.accountState = 'A' AND
     u.id <> userId
HAVING distanceInMiles < searchRadius
ORDER BY distanceInMiles LIMIT pagingStart, pagingLength;

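As a note, the expression IF(gender is null, TRUE, u.gender = gender) is suspect: if the unqualified gender resolves to the column in the table rather than to your variable, the expression always evaluates to true and the variable is never used. How gender is interpreted depends on MySQL's scoping rules. You should always use a prefix such as var_ or p_ to distinguish parameters from columns in the table.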
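
A minimal sketch of the prefix convention, assuming the query lives in a stored procedure. The procedure name, the parameter names and their types are hypothetical, since the question does not show them. Whichever way MySQL would resolve a bare gender, the p_ prefix removes the ambiguity:

```sql
-- Hypothetical procedure: name, parameters and types are assumptions.
DELIMITER //

CREATE PROCEDURE findMatchesByFilter(
    IN p_gender       CHAR(1),
    IN p_activityType INT,
    IN p_userId       BIGINT
)
BEGIN
    SELECT u.id AS userId, u.firstName
      FROM users u
           -- A NULL parameter means "do not filter on this column".
     WHERE (p_gender IS NULL OR u.gender = p_gender)
       AND (p_activityType IS NULL OR u.preferredActivityType = p_activityType)
       AND u.accountState = 'A'
       AND u.id <> p_userId;
END //

DELIMITER ;
```

Every unprefixed name in the statement is now unmistakably a column, and every p_ name a parameter.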

EDIT:

I should mention that the index needs to include id as the first column for exists to make use of it.
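
As a sketch, such an index might look like the following. The index name is hypothetical, and the remaining columns are simply carried over from the findMatches index in the question:

```sql
-- Hypothetical index name. id leads so that the correlated
-- condition uu.id = u.id can seek directly into the index; the
-- remaining columns let the range checks be answered from the index.
CREATE INDEX findMatchesById
    ON users (id, latitude, longitude, dateOfBirth);
```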

Answer 1 (score: 0)

You can simplify the query to:

SELECT u.id AS userId, firstName, profilePhotoId, preferredActivityId, preferredSubActivityId, availabilityType,
       3959 * ACOS(COS(radians(requestingUserLat)) * COS(radians(u.latitude)) * COS(radians(u.longitude) - radians(requestingUserLon)) + SIN(radians(requestingUserLat)) * SIN(radians(u.latitude))) AS distanceInMiles
  FROM users u
   WHERE u.latitude between lat1 and lat2
    AND u.longitude between lon1 and lon2
    AND u.dateOfBirth between maxAge and minAge
    AND IF(gender is null, TRUE, u.gender = gender)
    AND IF(activityType is null, TRUE, u.preferredActivityType = activityType)
    AND u.accountState = 'A'
    AND u.id != userId
HAVING distanceInMiles < searchRadius
ORDER BY distanceInMiles
LIMIT pagingStart, pagingLength;

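Then create an index on all the columns in the where clause. You can experiment with the order of the columns in the index, starting with the columns that have fewer distinct values (such as gender and status).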
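
A minimal sketch of one such ordering, to be checked with EXPLAIN rather than taken as given. The index name is hypothetical, the literal values are placeholders, and the gender literal assumes a single-character column. A B-tree index can generally seek on leading equality columns plus the first range column only, which is why the equality columns come first here. The IF(... is null ...) form from the question is replaced by a plain equality in this test query, since the optimizer is unlikely to use an index for a condition wrapped in IF.

```sql
-- Hypothetical index name; low-cardinality equality columns first,
-- range columns last.
CREATE INDEX findMatchesFiltered
    ON users (accountState, gender, preferredActivityType,
              latitude, longitude, dateOfBirth);

-- Check which index is chosen and how much of its key is used.
-- The literal values are placeholders, not real data.
EXPLAIN
SELECT u.id
  FROM users u
 WHERE u.accountState = 'A'
   AND u.gender = 'F'
   AND u.latitude  BETWEEN 40.0 AND 41.0
   AND u.longitude BETWEEN -75.0 AND -74.0;
```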