Question

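I have a very slow query. I want to show the 12 newest members near me (near the logged-in user). My dev database has 150k rows.

It takes more than 1 second, and EXPLAIN tells me that 30k rows are filtered. So that is 30k out of the 150k rows in my dev database, and my live server is much bigger than that.

Here is my query: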

SELECT  profils.*,
        Users.username,
        ( SELECT  count(*)
                from  profilsphotos pp
            where  pp.iduser=Profils.iduser
        ) as nbpics,
        ATAN2(SQRT(POW(COS(RADIANS(50.78961000)) * SIN(RADIANS(Y(gm_coor) - 4.64956000)),
                        2) + POW(COS(RADIANS(X(gm_coor))) * SIN(RADIANS(50.78961000)) - SIN(RADIANS(X(gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(gm_coor) - 4.64956000)),
                                                2)), (SIN(RADIANS(X(gm_coor))) * SIN(RADIANS(50.78961000)) + COS(RADIANS(X(gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(gm_coor) - 4.64956000)))
             ) * 6372.795 AS distance
    from  Users
    inner join  Profils  ON Users.id=Profils.iduser
    where  Profils.Actif=1
      and  profils.idsexe=2
      and  profils.idlookingfor=1
      and  Profils.iduser<>1
    HAVING  distance<400
    order by  Users.id desc, distance asc
    limit  12

Note that I have added an index on these four fields: actif, idsexe, idlookingfor and iduser.

What is wrong with my query?

Thanks a lot!

Pascal

Answer 1

Profils needs

INDEX(Actif, idsexe, idlookingfor) -- in any order
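
As a rough sketch of how that index could be added and checked: the index name below is made up for illustration, and the EXPLAIN only covers the filtering part of the original query.

```sql
-- Hypothetical index name; the three columns may appear in any order.
ALTER TABLE Profils
  ADD INDEX idx_actif_sexe_lookingfor (Actif, idsexe, idlookingfor);

-- Check whether the optimizer picks the index
-- (look at the `key` and `rows` columns of the output).
EXPLAIN
SELECT  iduser
    FROM  Profils
    WHERE  Actif = 1
      AND  idsexe = 2
      AND  idlookingfor = 1
      AND  iduser <> 1;
```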

Perhaps distance should come first here?

order by  Users.id desc, distance asc

What is Y(gm_coor)? If it is a stored function, we need to know more. Which table holds gm_coor? After that, perhaps we can discuss a "bounding box" as a partial speedup.
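
For reference, a bounding-box prefilter might look roughly like the sketch below. It assumes gm_coor is a POINT column on Profils (the question does not say which table holds it) and, judging from the distance formula, that X() is the latitude and Y() the longitude. The user variables are introduced only for this illustration.

```sql
-- Assumptions: gm_coor lives on Profils; X() = latitude, Y() = longitude.
SET @lat  = 50.78961000;
SET @lon  = 4.64956000;
SET @dist = 400;          -- search radius in km
SET @r    = 6372.795;     -- Earth radius in km, as used in the original query

-- Half-height and half-width of the box, in degrees.
-- Valid as long as the circle does not reach a pole or cross the 180° meridian.
SET @dlat = DEGREES(@dist / @r);
SET @dlon = DEGREES(ASIN(SIN(@dist / @r) / COS(RADIANS(@lat))));

SELECT  iduser
    FROM  Profils
    WHERE  Actif = 1
      AND  idsexe = 2
      AND  idlookingfor = 1
      AND  iduser <> 1
      AND  X(gm_coor) BETWEEN @lat - @dlat AND @lat + @dlat
      AND  Y(gm_coor) BETWEEN @lon - @dlon AND @lon + @dlon;
```

Rows outside the box cannot be within 400 km, so the expensive ATAN2 expression only has to run on what is left. Because the column is wrapped in X() and Y(), an ordinary index cannot be used for these two conditions. Storing latitude and longitude in their own indexed columns would be one way around that.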

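Also, nest the SELECTs and move the computation of nbpics to the outer query. Currently, COUNT(*) is being executed 30K times. After the change, it will run only 12 times.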

Reformulated:

SELECT  p2.*,
        u.username, 
        ( SELECT  COUNT(*)
            FROM  profilsphotos pp
            where  pp.iduser = p2.iduser 
        ) as nbpics,
        x.distance
    FROM  
        ( SELECT  p1.id,    -- assuming this is the PK of Profils
                  (...) AS distance
            FROM  Profils AS p1
            WHERE  p1.Actif=1
              and  p1.idsexe=2
              and  p1.idlookingfor=1
              and  p1.iduser<>1
            HAVING  distance < 400
            ORDER BY  distance
            LIMIT  12 
        ) AS x
    JOIN  profils AS p2 USING(id)
    JOIN  Users AS u  ON u.id = p2.iduser;

Answer 2

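I would extract the subquery in the SELECT clause into a temporary table, index it, and join to it, rather than executing it for every record in the SELECT clause (30K times).

So the steps are: create the temporary table, index it, and run the optimized query.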

First, create the relevant indexes for the query:

ALTER TABLE
  `Profils`
ADD
  INDEX `profils_idx_actif_iduser` (`Actif`, `iduser`);

ALTER TABLE
  `Users`
ADD
  INDEX `users_idx_id_username` (`id`, `username`);

ALTER TABLE
  `profils`
ADD
  INDEX `profils_idx_idsexe_idlookingfor` (`idsexe`, `idlookingfor`);

ALTER TABLE
  `profilsphotos`
ADD
  INDEX `profilsphotos_idx_iduser` (`iduser`);

Now create the temporary table and index it. The rewritten query can then join to it instead of running the subquery for every row; try that in place of the original query and see whether it runs faster:

-- Transformed subquery to a temp table to improve performance
CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS SELECT
        count(*) AS nbpics,
        iduser 
    FROM
        profilsphotos pp 
    WHERE
        1 = 1 
    GROUP BY
        iduser 
    ORDER BY
        NULL;

ALTER TABLE
  `temp1`
ADD
  INDEX `temp1_idx_iduser_nbpics` (`iduser`, `nbpics`);
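
The query that joins temp1 is not included above. A minimal sketch of what it might look like, assuming the rest of the original query stays unchanged: the LEFT JOIN and COALESCE are choices made here so that profiles without photos still appear with a count of 0, as they do with the correlated subquery.

```sql
SELECT  Profils.*,
        Users.username,
        COALESCE(temp1.nbpics, 0) AS nbpics,   -- 0 when the user has no photos
        ATAN2(
            SQRT(
                POW(COS(RADIANS(50.78961000)) * SIN(RADIANS(Y(gm_coor) - 4.64956000)), 2)
              + POW(COS(RADIANS(X(gm_coor))) * SIN(RADIANS(50.78961000))
                    - SIN(RADIANS(X(gm_coor))) * COS(RADIANS(50.78961000))
                      * COS(RADIANS(Y(gm_coor) - 4.64956000)), 2)
            ),
            SIN(RADIANS(X(gm_coor))) * SIN(RADIANS(50.78961000))
              + COS(RADIANS(X(gm_coor))) * COS(RADIANS(50.78961000))
                * COS(RADIANS(Y(gm_coor) - 4.64956000))
        ) * 6372.795 AS distance
    FROM  Users
    INNER JOIN  Profils  ON Users.id = Profils.iduser
    LEFT JOIN  temp1  ON temp1.iduser = Profils.iduser
    WHERE  Profils.Actif = 1
      AND  Profils.idsexe = 2
      AND  Profils.idlookingfor = 1
      AND  Profils.iduser <> 1
    HAVING  distance < 400
    ORDER BY  Users.id DESC, distance ASC
    LIMIT  12;
```

The photo count now comes from a single indexed lookup per row instead of a COUNT(*) per row. The distance expression is still evaluated for every row that passes the WHERE clause.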
