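In our application we try to find the best match for a given set of parameters. We have divided the rows into quality groups, each of which matches a subset of the full parameter set. To match these groups we have several SELECT queries, each run only if the previous one found no result. We have now decided to join them together using UNION ALL and LIMIT 1. This results in something like the following: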
SET @size = 4, @price = 18, @category = 'NEW', @weight = 20, @origin = 'France';
(SELECT * FROM product_catalog WHERE quality = 'A1' AND size = @size AND price = @price AND category = @category AND weight = @weight AND origin = @origin LIMIT 1)
UNION ALL
(SELECT * FROM product_catalog WHERE quality = 'A2' AND size = @size AND price = @price AND category = @category AND origin = @origin LIMIT 1)
UNION ALL
(SELECT * FROM product_catalog WHERE quality = 'A3' AND price = @price AND category = @category AND weight = @weight AND origin = @origin LIMIT 1)
UNION ALL
(SELECT * FROM product_catalog WHERE quality = 'A4' AND price = @price AND category = @category AND origin = @origin LIMIT 1)
UNION ALL
... SOME MORE SELECTS ...
LIMIT 1
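The query does work as expected, but it performs worse than our current solution. I think this is because MySQL probably executes all the UNION branches first and only then realizes it needs to return just the first row?
Do you have any suggestions for speeding up the query? Do you think the query could be rewritten as a stored procedure that checks each query's result and returns as soon as one is found? Would that speed it up?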
Answer 0 (score: 1)
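First, some issues...
- UNION always builds a tmp table. (In MySQL 5.7.3 and MariaDB 10.1, this inefficiency is actually eliminated for some UNION ALL queries.)
- No ORDER BY: this could lead to wrong answers, because the order of the rows is not guaranteed without an ORDER BY.
Now for some suggested improvements. Without knowing more about the data, I have to say that these may or may not be faster.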
Avoid *:
Instead of SELECT *, do just SELECT id, then JOIN back to the table to get the rest of the columns:
SELECT b.*
FROM ( SELECT id ... UNION ALL ... LIMIT 1 ) AS a
JOIN product_catalog AS b USING(id);
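Spelled out for two of the branches from the question, this could look roughly like the sketch below. It assumes the table has a unique id column (the question never shows one), and it adds quality to the inner SELECTs so that an ORDER BY can make the choice of row deterministic. The remaining branches would follow the same pattern.

```sql
-- Sketch only: `id` is assumed to be the primary key of product_catalog.
SELECT b.*
FROM (
    (SELECT id, quality
       FROM product_catalog
      WHERE quality = 'A1' AND size = @size AND price = @price
        AND category = @category AND weight = @weight AND origin = @origin
      LIMIT 1)
    UNION ALL
    (SELECT id, quality
       FROM product_catalog
      WHERE quality = 'A2' AND size = @size AND price = @price
        AND category = @category AND origin = @origin
      LIMIT 1)
    -- Each branch contributes at most one narrow row, so sorting is cheap.
    ORDER BY quality
    LIMIT 1
) AS a
JOIN product_catalog AS b USING (id);
```

The inner query only carries id and quality, so whatever temporary table the UNION needs stays small; the wide row is fetched once at the end.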
More indexes:
INDEX(quality, size, price)
INDEX(quality, price, category)
...
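As a rough sketch, the two indexes listed above could be added as follows. The index names are made up for illustration, and whether the optimizer actually picks them depends on your data, so it is worth checking with EXPLAIN.

```sql
-- Index names are hypothetical; pick whatever fits your naming scheme.
ALTER TABLE product_catalog
    ADD INDEX idx_quality_size_price (quality, size, price),
    ADD INDEX idx_quality_price_category (quality, price, category);

-- Check which index one branch of the UNION uses.
EXPLAIN
SELECT id
  FROM product_catalog
 WHERE quality = 'A4' AND price = @price
   AND category = @category AND origin = @origin
 LIMIT 1;
```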
Do a single table scan; no index is needed. (This relies on the quality values sorting in the desired order.):
SELECT * FROM ...
WHERE ( quality = 'A1' AND size = @size AND price = @price ... )
OR ( quality = 'A3' AND price = @price AND category = @category ... )
ORDER BY quality
LIMIT 1
(Usually I recommend replacing OR with UNION for performance, but I think your use case may go the other way.)
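For the four branches shown in the question, the single-scan version might be written out like this. It is only a sketch: the elided branches are not covered, and the predicates shared by all four branches (price, category, origin) are factored out of the OR.

```sql
SELECT *
  FROM product_catalog
 WHERE price = @price
   AND category = @category
   AND origin = @origin
   AND (   (quality = 'A1' AND size = @size AND weight = @weight)
        OR (quality = 'A2' AND size = @size)
        OR (quality = 'A3' AND weight = @weight)
        OR  quality = 'A4' )
 -- Works because 'A1' < 'A2' < 'A3' < 'A4' as strings.
 ORDER BY quality
 LIMIT 1;
```

Note that the ORDER BY compares strings, so a label such as 'A10' would sort before 'A2' and break the intended ranking.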
CASE:
You can combine the first two SELECTs (A1 and A2 differ only in the weight test):
SELECT MIN(IF(weight = @weight, 'A1', 'A2')) AS quality
FROM product_catalog
WHERE size = @size
AND price = @price
AND category = @category
AND origin = @origin;