How can I combine these two MySQL queries and keep them fast?

Time: 2016-09-09 12:08:37

Tags: mysql

I have two MySQL queries which, run one after the other, are very fast:

QUERY 1

SELECT Ads.AdId FROM Ads, AdsGeometry WHERE 
      AdsGeometry.AdId = Ads.AdId AND
      (ST_CONTAINS(GeomFromText('Polygon((
         -4.9783515930176 36.627100703563, 
         -5.0075340270996 36.61222072018, 
         -4.9896812438965 36.57638676015, 
         -4.965991973877 36.579419508882, 
         -4.955005645752 36.617732160006, 
         -4.9783515930176 36.627100703563
      ))'), AdsGeometry.GeomPoint)) 
GROUP BY Ads.AdId

This query runs in 0.0013 seconds and returns 4 rows.

QUERY 2

 SELECT Ads.AdId FROM Ads, AdsHierarchy WHERE 
      Ads.AdId = AdsHierarchy.ads_AdId AND  
      AdsHierarchy.locations_LocationId = 148022797 
 GROUP BY Ads.AdId

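This query runs in 0.0094 seconds and returns 67 rows (3 of which are also returned by the query above).

I am trying to combine the two queries into one, because later the result sets of both queries should be sorted together, and I would like MySQL to do the sorting. This is what I tried, followed by its EXPLAIN output: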

SELECT Ads.AdId FROM Ads, AdsHierarchy, AdsGeometry WHERE 
      Ads.AdId = AdsHierarchy.ads_AdId AND 
      AdsGeometry.AdId = Ads.AdId AND ( 
          ST_CONTAINS(GeomFromText('Polygon((
             -4.9783515930176 36.627100703563, 
             -5.0075340270996 36.61222072018, 
             -4.9896812438965 36.57638676015, 
             -4.965991973877 36.579419508882, 
             -4.955005645752 36.617732160006, 
             -4.9783515930176 36.627100703563
          ))'), AdsGeometry.GeomPoint) OR 
          AdsHierarchy.locations_LocationId = 148022797
      ) 
GROUP BY Ads.AdId

id  select_type     table           type                  possible_keys                                 key                key_len  ref                      rows       Extra
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1   SIMPLE          AdsGeometry     ALL                   PRIMARY,GeomPoint,sx_adsgeometry_geompoint    NULL               NULL     NULL                     682848     Using temporary; Using filesort
1   SIMPLE          Ads             eq_ref                PRIMARY                                       PRIMARY            4        dbname.AdsGeometry.AdId  1          Using where; Using index
1   SIMPLE          AdsHierarchy    ref                   Ads_AdsHierarchy,locations_LocationId         Ads_AdsHierarchy   4        dbname.Ads.AdId          1          Using where

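Although this query returns the correct result set (68 rows), it takes 6.5937 seconds to run. If I read the EXPLAIN correctly, neither the spatial index on AdsGeometry nor the locations_LocationId index on AdsHierarchy is being used.

Is there a way to combine the two queries (or possibly more location-based or polygon-based queries) and keep a reasonable running speed?

Thanks!

EDIT: Some information about the indexes of the 3 tables

The AdsGeometry table is MyISAM; its primary key is AdId

The result of SHOW INDEXES FROM AdsGeometry is: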

Table           Non_unique  Key_name                    Seq_in_index    Column_name     Collation   Cardinality     Sub_part    Packed      Null    Index_type    Comment   Index_comment
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AdsGeometry     0           PRIMARY                     1               AdId            A           682848          NULL        NULL                BTREE
AdsGeometry     1           Latitude                    1               Latitude        A           NULL            NULL        NULL                BTREE
AdsGeometry     1           Longitude                   1               Longitude       A           NULL            NULL        NULL                BTREE
AdsGeometry     1           GeomPoint                   1               GeomPoint       A           NULL            32          NULL                SPATIAL
AdsGeometry     1           sx_adsgeometry_geompoint    1               GeomPoint       A           NULL            32          NULL                SPATIAL
AdsGeometry     1           Latitude_2                  1               Latitude        A           NULL            NULL        NULL                BTREE
AdsGeometry     1           Latitude_2                  2               Longitude       A           NULL            NULL        NULL                BTREE

The AdsHierarchy table is InnoDB; its primary key is AdsHierarchyId

The result of SHOW INDEXES FROM AdsHierarchy is:

Table           Non_unique  Key_name                    Seq_in_index    Column_name           Collation     Cardinality     Sub_part    Packed      Null    Index_type    Comment   Index_comment
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AdsHierarchy    0           PRIMARY                     1               AdsHierarchyId        A             2479044         NULL        NULL                BTREE       
AdsHierarchy    1           Ads_AdsHierarchy            1               ads_AdId              A             2479044         NULL        NULL                BTREE       
AdsHierarchy    1           locations_LocationId        1               locations_LocationId  A             123952          NULL        NULL                BTREE   

The Ads table is InnoDB; its primary key is AdId

The result of SHOW INDEXES FROM Ads is:

Table           Non_unique  Key_name                    Seq_in_index    Column_name           Collation     Cardinality     Sub_part    Packed      Null    Index_type    Comment   Index_comment
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Ads             0           PRIMARY                     1               AdId                   A            705411          NULL        NULL                BTREE       
Ads             1           Accounts_Ads                1               accounts_AccountId     A            2               NULL        NULL                BTREE       
Ads             1           Ads_Locations               1               locations_LocationId   A            88176           NULL        NULL                BTREE       
Ads             1           Categories_Ads              1               categories_CategoryId  A            16              NULL        NULL                BTREE       
Ads             1           Currencies_Ads              1               currencies_Currency    A            2               NULL        NULL                BTREE       
Ads             1           countries_CountryId         1               countries_CountryId    A            204             NULL        NULL                BTREE       
Ads             1           ExternalId                  1               ExternalId             A            705411          NULL        NULL                BTREE       
Ads             1           ExternalId                  2               accounts_AccountId     A            705411          NULL        NULL                BTREE       
Ads             1           xml_XMLId                   1               xml_XMLId              A            4               NULL        NULL                BTREE       
Ads             1           streets_StreetId            1               streets_StreetId       A            2               NULL        NULL        YES     BTREE   

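EDIT 2: Query rewritten with explicit joins, and its EXPLAIN

Here is the query rewritten to use explicit JOIN syntax, but it still runs very slowly (5.503 seconds)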

 SELECT a.AdId FROM Ads AS a 
   JOIN AdsHierarchy AS ah ON a.AdId = ah.ads_AdId   
   JOIN AdsGeometry AS ag ON a.AdId = ag.AdId 
   WHERE 
      ST_CONTAINS(GeomFromText('Polygon((
          -4.9783515930176 36.627100703563, 
          -5.0075340270996 36.61222072018, 
          -4.9896812438965 36.57638676015, 
          -4.965991973877 36.579419508882, 
          -4.955005645752 36.617732160006, 
          -4.9783515930176 36.627100703563
      ))'), ag.GeomPoint) 
      OR ah.locations_LocationId = 148022797
   GROUP BY a.AdId

id  select_type     table           type                  possible_keys                                 key                key_len  ref                      rows       Extra
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1   SIMPLE          a               index                 PRIMARY                                       PRIMARY            4        NULL                     627853     Using index
1   SIMPLE          ag              eq_ref                PRIMARY,GeomPoint,sx_adsgeometry_geompoint    PRIMARY            8        micasa_dev.a.AdId        1          Using index condition
1   SIMPLE          ah              ref                   Ads_AdsHierarchy,locations_LocationId         Ads_AdsHierarchy   4        micasa_dev.a.AdId        1          Using where

EDIT 3: Tried a UNION of the two queries

I also tried the UNION approach suggested by @RobertKoch.

The following UNION query runs very fast (0.06 seconds)

SELECT Ads.AdId FROM Ads, AdsGeometry 
WHERE 
    AdsGeometry.AdId = Ads.AdId AND 
    ST_CONTAINS(GeomFromText('Polygon(( 
         -4.9783515930176 36.627100703563, 
         -5.0075340270996 36.61222072018, 
         -4.9896812438965 36.57638676015, 
         -4.965991973877 36.579419508882, 
         -4.955005645752 36.617732160006, 
         -4.9783515930176 36.627100703563 
   ))'), AdsGeometry.GeomPoint) 
GROUP BY Ads.AdId 
UNION 
SELECT Ads.AdId FROM Ads, AdsHierarchy WHERE 
   Ads.AdId = AdsHierarchy.ads_AdId AND  
   AdsHierarchy.locations_LocationId = 148022797 
GROUP BY Ads.AdId

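I still cannot use this approach as it is, because later I need to sort the combined result set of the two queries by a column of the Ads table.

If I try the following, the query becomes very slow again (3.7 seconds):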

SELECT Ads.AdId FROM Ads WHERE Ads.AdId IN (
  SELECT Ads.AdId FROM Ads, AdsGeometry 
  WHERE 
      AdsGeometry.AdId = Ads.AdId AND 
      ST_CONTAINS(GeomFromText('Polygon(( 
         -4.9783515930176 36.627100703563, 
         -5.0075340270996 36.61222072018, 
         -4.9896812438965 36.57638676015, 
         -4.965991973877 36.579419508882, 
         -4.955005645752 36.617732160006, 
         -4.9783515930176 36.627100703563 
      ))'), AdsGeometry.GeomPoint) 
  GROUP BY Ads.AdId 
  UNION 
  SELECT Ads.AdId FROM Ads, AdsHierarchy WHERE 
     Ads.AdId = AdsHierarchy.ads_AdId AND  
     AdsHierarchy.locations_LocationId = 148022797 
  GROUP BY Ads.AdId
) AND Ads.AdId > 100000
ORDER BY Ads.ModifiedDate ASC

EDIT 4: Changing where the UNION is placed seems to solve the problem

If I change the UNION query above to

SELECT Ads.AdId
FROM Ads,
(SELECT Ads.AdId   
    FROM Ads,
    AdsGeometry
   WHERE AdsGeometry.AdId = Ads.AdId
     AND ST_CONTAINS(GeomFromText('Polygon(( 
         -4.9783515930176 36.627100703563, 
         -5.0075340270996 36.61222072018, 
         -4.9896812438965 36.57638676015, 
         -4.965991973877 36.579419508882, 
         -4.955005645752 36.617732160006, 
         -4.9783515930176 36.627100703563 
      ))'), AdsGeometry.GeomPoint)
   GROUP BY Ads.AdId
   UNION SELECT Ads.AdId
   FROM Ads,
        AdsHierarchy
   WHERE Ads.AdId = AdsHierarchy.ads_AdId
     AND AdsHierarchy.locations_LocationId = 148022797
   GROUP BY Ads.AdId) AS nt
WHERE Ads.AdId = nt.AdId
  AND Ads.AdId > 1000000
ORDER BY Ads.ModifiedDate ASC

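then the query runs fast again (~0.0007 seconds).

If there is no solution without UNION, I am willing to award the bounty to anyone who can explain the difference between the two UNION versions (this one and the one in Edit 3), and explain why the query runs fast when written as in Edit 4 and slowly when written as in Edit 3.

If any other information is needed, please ask in the comments and I will try to provide it. Thanks!

*Note:* I have added an ORDER BY to the two UNION queries to make it clearer that, although I only select AdId, I still need the Ads table for its other fields.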

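EDIT 5: EXPLAIN output requested by @bovko

id  select_type  table  type    possible_keys     key                  key_len  ref                  rows    Extra
----------------------------------------------------------------------------------------------------------------------------------------
1   SIMPLE       Ads    index   NULL              countries_CountryId  2        NULL                 627853  Using index; Using temporary
1   SIMPLE       ag     eq_ref  PRIMARY           PRIMARY              8        micasa_dev.Ads.AdId  1       Using where; Distinct
1   SIMPLE       ah     ref     Ads_AdsHierarchy  Ads_AdsHierarchy     4        micasa_dev.Ads.AdId  1       Using where; Distinct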

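4 Answers:

Answer 0 (score: 3)

IN ( SELECT ... ) is often inefficient. Avoid it.

All the answers so far work harder than they need to. It seems that the JOINs before the UNION are unnecessary. See the notes below for more.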

SELECT  Ads.AdId
    FROM  Ads
    JOIN (
        ( SELECT  AdId
            FROM  AdsGeometry
            WHERE  ST_CONTAINS(GeomFromText('Polygon(( -4.9783515930176 36.627100703563,
                      -5.0075340270996 36.61222072018, -4.9896812438965 36.57638676015,
                      -4.965991973877 36.579419508882, -4.955005645752 36.617732160006,
                      -4.9783515930176 36.627100703563 ))'),
                              AdsGeometry.GeomPoint)
              AND AdId > 1000000 )
         UNION DISTINCT
         ( SELECT  ads_AdId AS AdId
            FROM  AdsHierarchy
            WHERE  locations_LocationId = 148022797
              AND  ads_AdId > 1000000 )
          ) AS nt ON Ads.AdId = nt.AdId
    ORDER BY  Ads.ModifiedDate ASC

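Notes:

  • AdsGeometry and AdsHierarchy both contain the AdId (under different names); there is no need to JOIN to Ads in the inner queries, except possibly to verify that the id exists in Ads. Is that an issue? In any case, my query takes care of it in the outer SELECT's JOIN.
  • UNION DISTINCT is needed because the two SELECTs may fetch the same ids.
  • I moved the > 1000000 test inside to reduce the number of values the UNION collects.
  • UNION will always (in older versions of MySQL) or sometimes (in newer versions) create a temporary table. That cannot be avoided.
  • IN ( SELECT ... ) often optimizes very poorly; avoid it.
  • I added some parentheses; they make it possible (though it is not currently needed) to add ORDER BY etc. to the UNION, and they make explicit what each clause belongs to.
  • The only reason for the outer query is to fetch ModifiedDate for sorting. You could speed things up by dropping that requirement. (The UNION probably creates one tmp table; this ORDER BY probably creates another.)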
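
As an illustration of the note about parentheses, here is a minimal sketch (not part of the original answer) of how a sort and a limit could be attached to the whole UNION rather than to its last SELECT. The LIMIT 20 is a hypothetical page size chosen for the example. On MySQL 8.0 and later, where GeomFromText is no longer available, ST_GeomFromText would take its place.

```sql
-- ORDER BY and LIMIT written after the closing parenthesis of the last
-- SELECT apply to the combined result of the UNION, not to one branch.
( SELECT  AdId
    FROM  AdsGeometry
    WHERE  ST_CONTAINS(GeomFromText('Polygon(( -4.9783515930176 36.627100703563,
              -5.0075340270996 36.61222072018, -4.9896812438965 36.57638676015,
              -4.965991973877 36.579419508882, -4.955005645752 36.617732160006,
              -4.9783515930176 36.627100703563 ))'),
                      GeomPoint)
      AND  AdId > 1000000 )
UNION DISTINCT
( SELECT  ads_AdId AS AdId
    FROM  AdsHierarchy
    WHERE  locations_LocationId = 148022797
      AND  ads_AdId > 1000000 )
ORDER BY  AdId
LIMIT  20;   -- hypothetical page size, for illustration only
```

Sorting by ModifiedDate instead would still need the outer JOIN to Ads, since neither inner table is shown to carry that column.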

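Answer 1 (score: 2)

Both UNION queries do extra work by searching again for results that have already been found, by doing

SELECT Ads.AdId FROM Ads WHERE AdId IN ...

or

SELECT Ads.AdId FROM Ads, (SELECT Ads.AdId ...) AS nt WHERE Ads.AdId = nt.AdId

Likewise, SELECT Ads.AdId FROM Ads, ... GROUP BY Ads.AdId would probably be more efficient, and easier to understand, if written as SELECT DISTINCT Ads.AdId FROM Ads, ...

So this should give a better query: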

SELECT DISTINCT AdId FROM
  (SELECT Ads.AdId FROM Ads
   INNER JOIN AdsGeometry ON AdsGeometry.AdId = Ads.AdId 
   WHERE 
      ST_CONTAINS(GeomFromText('Polygon(( 
         -4.9783515930176 36.627100703563, 
         -5.0075340270996 36.61222072018, 
         -4.9896812438965 36.57638676015, 
         -4.965991973877 36.579419508882, 
         -4.955005645752 36.617732160006, 
         -4.9783515930176 36.627100703563 
      ))'), AdsGeometry.GeomPoint) 
  UNION ALL
  SELECT Ads.AdId FROM Ads
  INNER JOIN AdsHierarchy ON Ads.AdId = AdsHierarchy.ads_AdId
  WHERE AdsHierarchy.locations_LocationId = 148022797) AS sub 
WHERE AdId > 100000
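
The query above removes duplicates in the outer SELECT DISTINCT after collecting rows with UNION ALL, which is the job a plain UNION does on its own. A small self-contained sketch with literal values (the ids 1 and 2 are made up for illustration) shows the two forms side by side:

```sql
-- Form used above: collect everything with UNION ALL, deduplicate outside.
SELECT DISTINCT AdId
  FROM ( SELECT 1 AS AdId
         UNION ALL SELECT 2
         UNION ALL SELECT 2 ) AS sub;

-- Plain UNION (equivalent to UNION DISTINCT) deduplicates while combining.
SELECT 1 AS AdId
UNION SELECT 2
UNION SELECT 2;
```

Both statements return the ids 1 and 2 (row order is not guaranteed). Which form is faster on the real tables would have to be measured.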

Answer 2 (score: 1)

Without your data set I cannot be sure, but this may work for you:

SELECT AdId
FROM Ads
WHERE EXISTS (SELECT 1 FROM AdsHierarchy
               WHERE Ads.AdId = AdsHierarchy.ads_AdId
                 AND locations_LocationId = 148022797)
   OR EXISTS (SELECT 1 FROM AdsGeometry, AdsHierarchy
               WHERE Ads.AdId = AdsHierarchy.ads_AdId
                 AND Ads.AdId = AdsGeometry.AdId
                 AND ST_CONTAINS(GeomFromText('Polygon((
                   -4.9783515930176 36.627100703563, 
                   -5.0075340270996 36.61222072018, 
                   -4.9896812438965 36.57638676015, 
                   -4.965991973877 36.579419508882, 
                   -4.955005645752 36.617732160006, 
                   -4.9783515930176 36.627100703563
                 ))'), GeomPoint)
   )
ORDER BY AdId

Answer 3 (score: 1)

Some databases perform differently for INNER JOIN and LEFT OUTER JOIN. Just try the following query; if it is slow, add EXPLAIN before the SELECT and post the result.

SELECT DISTINCT Ads.AdId
FROM Ads
LEFT OUTER JOIN AdsGeometry ag ON ag.AdId = Ads.AdId
LEFT OUTER JOIN AdsHierarchy ah ON ah.ads_AdId = Ads.AdId
WHERE ah.locations_LocationId = 148022797
  OR (ST_CONTAINS(GeomFromText('Polygon((
         -4.9783515930176 36.627100703563, 
         -5.0075340270996 36.61222072018, 
         -4.9896812438965 36.57638676015, 
         -4.965991973877 36.579419508882, 
         -4.955005645752 36.617732160006, 
         -4.9783515930176 36.627100703563
      ))'), ag.GeomPoint))
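
Following the suggestion above, the execution plan can be obtained by prefixing the statement with EXPLAIN. A minimal sketch using the same query; the output depends on the server version and the data, so none is shown here:

```sql
-- EXPLAIN reports the chosen plan (join order, indexes used, estimated
-- rows) instead of returning the result rows of the SELECT.
EXPLAIN
SELECT DISTINCT Ads.AdId
FROM Ads
LEFT OUTER JOIN AdsGeometry ag ON ag.AdId = Ads.AdId
LEFT OUTER JOIN AdsHierarchy ah ON ah.ads_AdId = Ads.AdId
WHERE ah.locations_LocationId = 148022797
  OR (ST_CONTAINS(GeomFromText('Polygon((
         -4.9783515930176 36.627100703563,
         -5.0075340270996 36.61222072018,
         -4.9896812438965 36.57638676015,
         -4.965991973877 36.579419508882,
         -4.955005645752 36.617732160006,
         -4.9783515930176 36.627100703563
      ))'), ag.GeomPoint));
```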