I have two MySQL queries that each run very fast when executed one after the other:
QUERY 1
SELECT Ads.AdId FROM Ads, AdsGeometry WHERE
AdsGeometry.AdId = Ads.AdId AND
(ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), AdsGeometry.GeomPoint))
GROUP BY Ads.AdId
This query runs in 0.0013 seconds and returns 4 rows.
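Note that GeomFromText is deprecated in MySQL 5.7 and was removed in MySQL 8.0 in favor of ST_GeomFromText. A minimal sketch of the same query for newer servers, assuming the tables and columns are otherwise unchanged (the explicit JOIN is only a stylistic choice, not something the original query needs):

```sql
-- Query 1 rewritten with ST_GeomFromText, the replacement for GeomFromText.
-- Assumption: same schema as in the question; only the function name differs.
SELECT Ads.AdId
FROM Ads
JOIN AdsGeometry ON AdsGeometry.AdId = Ads.AdId
WHERE ST_Contains(
        ST_GeomFromText('POLYGON((
            -4.9783515930176 36.627100703563,
            -5.0075340270996 36.61222072018,
            -4.9896812438965 36.57638676015,
            -4.965991973877 36.579419508882,
            -4.955005645752 36.617732160006,
            -4.9783515930176 36.627100703563
        ))'),
        AdsGeometry.GeomPoint)
GROUP BY Ads.AdId;
```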
QUERY 2
SELECT Ads.AdId FROM Ads, AdsHierarchy WHERE
Ads.AdId = AdsHierarchy.ads_AdId AND
AdsHierarchy.locations_LocationId = 148022797
GROUP BY Ads.AdId
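This query runs in 0.0094 seconds and returns 67 rows (3 of which are also returned by the query above).
I am trying to merge these two queries into a single one because, later on, the combined result set has to be sorted, and I want MySQL to do the sorting. This is what I tried; its EXPLAIN output is shown right below it: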
SELECT Ads.AdId FROM Ads, AdsHierarchy, AdsGeometry WHERE
Ads.AdId = AdsHierarchy.ads_AdId AND
AdsGeometry.AdId = Ads.AdId AND (
ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), AdsGeometry.GeomPoint) OR
AdsHierarchy.locations_LocationId = 148022797
)
GROUP BY Ads.AdId
id  select_type  table         type    possible_keys                               key               key_len  ref                      rows    Extra
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1   SIMPLE       AdsGeometry   ALL     PRIMARY,GeomPoint,sx_adsgeometry_geompoint  NULL              NULL     NULL                     682848  Using temporary; Using filesort
1   SIMPLE       Ads           eq_ref  PRIMARY                                     PRIMARY           4        dbname.AdsGeometry.AdId  1       Using where; Using index
1   SIMPLE       AdsHierarchy  ref     Ads_AdsHierarchy,locations_LocationId       Ads_AdsHierarchy  4        dbname.Ads.AdId          1       Using where
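Although this query returns the correct result set (68 rows), it takes 6.5937 seconds to run. If I understand the EXPLAIN correctly, the AdsHierarchy table is not using its index, and neither is the AdsGeometry table.
Is there a way to combine the two queries (or possibly more location-based or polygon-based queries) into one and still keep a reasonable running time?
Thanks!
Edit: some information about the indexes on the 3 tables
The AdsGeometry table is MyISAM, and its primary key is AdId.
The result of SHOW INDEXES FROM AdsGeometry is: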
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AdsGeometry 0 PRIMARY 1 AdId A 682848 NULL NULL BTREE
AdsGeometry 1 Latitude 1 Latitude A NULL NULL NULL BTREE
AdsGeometry 1 Longitude 1 Longitude A NULL NULL NULL BTREE
AdsGeometry 1 GeomPoint 1 GeomPoint A NULL 32 NULL SPATIAL
AdsGeometry 1 sx_adsgeometry_geompoint 1 GeomPoint A NULL 32 NULL SPATIAL
AdsGeometry 1 Latitude_2 1 Latitude A NULL NULL NULL BTREE
AdsGeometry 1 Latitude_2 2 Longitude A NULL NULL NULL BTREE
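The listing above shows two SPATIAL indexes on the same column (GeomPoint and sx_adsgeometry_geompoint), and the single-column Latitude index is a leftmost prefix of Latitude_2. This is unlikely to be the cause of the slow combined query, but redundant indexes cost space and write time. A hedged sketch of how they could be dropped, assuming no query refers to them by name in an index hint:

```sql
-- Assumption: no statement uses USE INDEX / FORCE INDEX with these index names.
ALTER TABLE AdsGeometry
    DROP INDEX sx_adsgeometry_geompoint,  -- duplicate of the GeomPoint spatial index
    DROP INDEX Latitude;                  -- already covered by Latitude_2 (Latitude, Longitude)
```

Running SHOW INDEXES FROM AdsGeometry again afterwards would confirm which indexes remain.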
The AdsHierarchy table is InnoDB, and its primary key is AdsHierarchyId.
The result of SHOW INDEXES FROM AdsHierarchy is:
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AdsHierarchy 0 PRIMARY 1 AdsHierarchyId A 2479044 NULL NULL BTREE
AdsHierarchy 1 Ads_AdsHierarchy 1 ads_AdId A 2479044 NULL NULL BTREE
AdsHierarchy 1 locations_LocationId 1 locations_LocationId A 123952 NULL NULL BTREE
The Ads table is InnoDB, and its primary key is AdId.
The result of SHOW INDEXES FROM Ads is:
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Ads 0 PRIMARY 1 AdId A 705411 NULL NULL BTREE
Ads 1 Accounts_Ads 1 accounts_AccountId A 2 NULL NULL BTREE
Ads 1 Ads_Locations 1 locations_LocationId A 88176 NULL NULL BTREE
Ads 1 Categories_Ads 1 categories_CategoryId A 16 NULL NULL BTREE
Ads 1 Currencies_Ads 1 currencies_Currency A 2 NULL NULL BTREE
Ads 1 countries_CountryId 1 countries_CountryId A 204 NULL NULL BTREE
Ads 1 ExternalId 1 ExternalId A 705411 NULL NULL BTREE
Ads 1 ExternalId 2 accounts_AccountId A 705411 NULL NULL BTREE
Ads 1 xml_XMLId 1 xml_XMLId A 4 NULL NULL BTREE
Ads 1 streets_StreetId 1 streets_StreetId A 2 NULL NULL YES BTREE
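Edit 2: query rewritten with explicit joins, plus its EXPLAIN
Here is the query rewritten to use explicit JOIN syntax; it still runs very slowly (5.503 seconds):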
SELECT a.AdId FROM Ads AS a
JOIN AdsHierarchy AS ah ON a.AdId = ah.ads_AdId
JOIN AdsGeometry AS ag ON a.AdId = ag.AdId
WHERE
ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), ag.GeomPoint)
OR ah.locations_LocationId = 148022797
GROUP BY a.AdId
id  select_type  table  type    possible_keys                               key               key_len  ref                rows    Extra
-----------------------------------------------------------------------------------------------------------------------------------------------------
1   SIMPLE       a      index   PRIMARY                                     PRIMARY           4        NULL               627853  Using index
1   SIMPLE       ag     eq_ref  PRIMARY,GeomPoint,sx_adsgeometry_geompoint  PRIMARY           8        micasa_dev.a.AdId  1       Using index condition
1   SIMPLE       ah     ref     Ads_AdsHierarchy,locations_LocationId       Ads_AdsHierarchy  4        micasa_dev.a.AdId  1       Using where
Edit 3: trying a UNION of the two queries
I also tried the UNION approach suggested by @RobertKoch.
The following UNION query runs very fast (0.06 seconds):
SELECT Ads.AdId FROM Ads, AdsGeometry
WHERE
AdsGeometry.AdId = Ads.AdId AND
ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), AdsGeometry.GeomPoint)
GROUP BY Ads.AdId
UNION
SELECT Ads.AdId FROM Ads, AdsHierarchy WHERE
Ads.AdId = AdsHierarchy.ads_AdId AND
AdsHierarchy.locations_LocationId = 148022797
GROUP BY Ads.AdId
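I still cannot use this approach as it stands, because later I need to sort the result set obtained from the two queries by columns of the Ads table.
If I try the following, the query becomes very slow again (3.7 seconds):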
SELECT Ads.AdId FROM Ads WHERE Ads.AdId IN (
SELECT Ads.AdId FROM Ads, AdsGeometry
WHERE
AdsGeometry.AdId = Ads.AdId AND
ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), AdsGeometry.GeomPoint)
GROUP BY Ads.AdId
UNION
SELECT Ads.AdId FROM Ads, AdsHierarchy WHERE
Ads.AdId = AdsHierarchy.ads_AdId AND
AdsHierarchy.locations_LocationId = 148022797
GROUP BY Ads.AdId
) AND Ads.AdId > 100000
ORDER BY Ads.ModifiedDate ASC
Edit 4: changing where the UNION sits seems to solve the problem
If I change the UNION query above to
SELECT Ads.AdId
FROM Ads,
(SELECT Ads.AdId
FROM Ads,
AdsGeometry
WHERE AdsGeometry.AdId = Ads.AdId
AND ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), AdsGeometry.GeomPoint)
GROUP BY Ads.AdId
UNION SELECT Ads.AdId
FROM Ads,
AdsHierarchy
WHERE Ads.AdId = AdsHierarchy.ads_AdId
AND AdsHierarchy.locations_LocationId = 148022797
GROUP BY Ads.AdId) AS nt
WHERE Ads.AdId = nt.AdId
AND Ads.AdId > 1000000
ORDER BY Ads.ModifiedDate ASC
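then the query runs fast again (~0.0007 seconds).
If there is no solution without UNION, I am willing to award the bounty to anyone who can explain the difference between the two UNION versions (this one and the one in Edit 3), and why the query runs fast when written in the Edit 4 form and slowly when written in the Edit 3 form.
If any other information is needed, please ask in the comments and I will try to provide it. Thanks!
*Note:* I have added an ORDER BY to both UNION queries to make it clearer that, although I only select AdId, I still need the Ads table for its other fields.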
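One way to look into the difference between the Edit 3 and Edit 4 forms is to compare their plans. The sketch below only prefixes both queries with EXPLAIN and simplifies the inner SELECTs. As far as I know, MySQL cannot apply its semijoin strategies to an IN subquery that contains a UNION, so such a subquery typically shows up as DEPENDENT SUBQUERY / DEPENDENT UNION and is re-evaluated for candidate rows of Ads, whereas a derived table in FROM is materialized once and then joined. The exact output depends on the server version, so treat this as something to verify rather than a guaranteed result.

```sql
-- Edit 3 form: check the select_type column for DEPENDENT SUBQUERY / DEPENDENT UNION.
EXPLAIN
SELECT Ads.AdId FROM Ads
WHERE Ads.AdId IN (
    SELECT AdId FROM AdsGeometry
    WHERE ST_CONTAINS(GeomFromText('Polygon((
        -4.9783515930176 36.627100703563, -5.0075340270996 36.61222072018,
        -4.9896812438965 36.57638676015, -4.965991973877 36.579419508882,
        -4.955005645752 36.617732160006, -4.9783515930176 36.627100703563
    ))'), AdsGeometry.GeomPoint)
    UNION
    SELECT ads_AdId FROM AdsHierarchy
    WHERE locations_LocationId = 148022797
)
AND Ads.AdId > 100000
ORDER BY Ads.ModifiedDate ASC;

-- Edit 4 form: the UNION appears as a DERIVED table that is built once.
EXPLAIN
SELECT Ads.AdId
FROM Ads
JOIN (
    SELECT AdId FROM AdsGeometry
    WHERE ST_CONTAINS(GeomFromText('Polygon((
        -4.9783515930176 36.627100703563, -5.0075340270996 36.61222072018,
        -4.9896812438965 36.57638676015, -4.965991973877 36.579419508882,
        -4.955005645752 36.617732160006, -4.9783515930176 36.627100703563
    ))'), AdsGeometry.GeomPoint)
    UNION
    SELECT ads_AdId FROM AdsHierarchy
    WHERE locations_LocationId = 148022797
) AS nt ON Ads.AdId = nt.AdId
WHERE Ads.AdId > 100000
ORDER BY Ads.ModifiedDate ASC;
```

If the first plan reports a large rows estimate for Ads next to a dependent subquery, that would be consistent with the 3.7 second timing reported above.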
Edit 5: EXPLAIN output, at @bovko's request
1 SIMPLE Ads index NULL countries_CountryId 2 NULL 627853 Using index; Using temporary
1 SIMPLE ag eq_ref PRIMARY PRIMARY 8 micasa_dev.Ads.AdId 1 Using where; Distinct
1 SIMPLE ah ref Ads_AdsHierarchy Ads_AdsHierarchy 4 micasa_dev.Ads.AdId 1 Using where; Distinct
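Answer 0 (score: 3)
IN ( SELECT ... ) is usually inefficient. Avoid it.
All of the answers so far work harder than they need to. It seems the JOINs are needed only after the UNION, not before it. See the notes below for more.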
SELECT Ads.AdId
FROM Ads
JOIN (
( SELECT AdId
FROM AdsGeometry
WHERE ST_CONTAINS(GeomFromText('Polygon(( -4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018, -4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882, -4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563 ))'),
AdsGeometry.GeomPoint)
AND AdId > 1000000 )
UNION DISTINCT
( SELECT ads_AdId AS AdId
FROM AdsHierarchy
WHERE locations_LocationId = 148022797
AND ads_AdId > 1000000 )
) AS nt ON Ads.AdId = nt.AdId
ORDER BY Ads.ModifiedDate ASC
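Notes:
- Both AdsGeometry and AdsHierarchy have the ad id (under different names); there is no need to JOIN to Ads in the inner queries, except possibly to verify that the id exists in Ads. Is that an issue? Either way, my query takes care of it in the outer SELECT's JOIN.
- UNION DISTINCT is needed because the two SELECTs may fetch the same ids.
- The > 1000000 test is applied inside, to cut down the number of values collected by the UNION.
- The UNION will always (in older versions of MySQL) or sometimes (in newer versions) create a temporary table. You are stuck with that.
- IN ( SELECT ... ) usually optimizes very poorly; avoid it.
- If you need to add ORDER BY, etc. to the UNION, the parentheses make it clear what it belongs to.
- The outer query sorts by ModifiedDate. You could speed things up by dropping that requirement. (The UNION may create a tmp table; this ORDER BY may create another.)
Answer 1 (score: 2)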
Both UNION queries do extra work by looking up results that have already been found, by doing
SELECT Ads.AdId FROM Ads WHERE AdId IN ...
or
SELECT Ads.AdId FROM Ads, (SELECT Ads.AdId ...) AS nt WHERE Ads.AdId = nt.AdId
Also, SELECT Ads.AdId FROM Ads, ... GROUP BY Ads.AdId may be more efficient if written as SELECT DISTINCT Ads.AdId FROM Ads, ...
So this should give a better query:
SELECT DISTINCT AdId FROM
(SELECT Ads.AdId FROM Ads
INNER JOIN AdsGeometry ON AdsGeometry.AdId = Ads.AdId
WHERE
ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), AdsGeometry.GeomPoint)
UNION ALL
SELECT Ads.AdId FROM Ads
INNER JOIN AdsHierarchy ON Ads.AdId = AdsHierarchy.ads_AdId
WHERE AdsHierarchy.locations_LocationId = 148022797) AS sub
WHERE AdId > 100000
Answer 2 (score: 1)
Without your data set I cannot be sure, but this might work for you:
SELECT AdId
FROM Ads
WHERE EXISTS (SELECT 1 FROM AdsHierarchy
WHERE Ads.AdId = AdsHierarchy.ads_AdId
AND locations_LocationId = 148022797)
OR EXISTS (SELECT 1 FROM AdsGeometry, AdsHierarchy
WHERE Ads.AdId = AdsHierarchy.ads_AdId
AND Ads.AdId = AdsGeometry.AdId
AND ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), GeomPoint)
)
ORDER BY AdId
Answer 3 (score: 1)
Some databases perform differently for INNER JOIN and LEFT OUTER JOIN. Just try the following query; if it is slow, add EXPLAIN before SELECT and post the result.
SELECT DISTINCT Ads.AdId
FROM Ads
LEFT OUTER JOIN AdsGeometry ag ON ag.AdId = Ads.AdId
LEFT OUTER JOIN AdsHierarchy ah ON ah.ads_AdId = Ads.AdId
WHERE ah.locations_LocationId = 148022797
OR (ST_CONTAINS(GeomFromText('Polygon((
-4.9783515930176 36.627100703563,
-5.0075340270996 36.61222072018,
-4.9896812438965 36.57638676015,
-4.965991973877 36.579419508882,
-4.955005645752 36.617732160006,
-4.9783515930176 36.627100703563
))'), ag.GeomPoint))
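For reference, "adding EXPLAIN before SELECT" for the query above would look like the sketch below. It does not run the query; it only asks MySQL for the execution plan, whose key and rows columns show whether the indexes are used.

```sql
-- Same LEFT OUTER JOIN query, prefixed with EXPLAIN to obtain the plan only.
EXPLAIN
SELECT DISTINCT Ads.AdId
FROM Ads
LEFT OUTER JOIN AdsGeometry ag ON ag.AdId = Ads.AdId
LEFT OUTER JOIN AdsHierarchy ah ON ah.ads_AdId = Ads.AdId
WHERE ah.locations_LocationId = 148022797
OR (ST_CONTAINS(GeomFromText('Polygon((
    -4.9783515930176 36.627100703563,
    -5.0075340270996 36.61222072018,
    -4.9896812438965 36.57638676015,
    -4.965991973877 36.579419508882,
    -4.955005645752 36.617732160006,
    -4.9783515930176 36.627100703563
))'), ag.GeomPoint));
```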