First existing row from multiple queries

Time: 2017-08-13 10:49:24

Tags: mysql performance union-all

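In our application we try to find the best match for a given set of parameters. We divide the rows into quality groups, each of which matches a subset of the full parameter set. To match these groups we have multiple SELECT queries, each of which is run only if the previous one found no result. We have now decided to join them together using UNION ALL and LIMIT 1. This results in something like the following: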

SET @size = 4, @price = 18, @category = 'NEW', @weight = 20, @origin = 'France';
(SELECT * FROM product_catalog WHERE quality = 'A1' AND size = @size AND price = @price AND category = @category AND weight = @weight AND origin = @origin LIMIT 1)
UNION ALL
(SELECT * FROM product_catalog WHERE quality = 'A2' AND size = @size AND price = @price AND category = @category AND origin = @origin LIMIT 1)
UNION ALL
(SELECT * FROM product_catalog WHERE quality = 'A3' AND price = @price AND category = @category AND weight = @weight AND origin = @origin LIMIT 1)
UNION ALL
(SELECT * FROM product_catalog WHERE quality = 'A4' AND price = @price AND category = @category AND origin = @origin LIMIT 1)
UNION ALL
... SOME MORE SELECTS ...
LIMIT 1

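The query does return what we expect, but it performs worse than our current solution. I assume this is because MySQL first executes the entire UNION and only then realizes that it needs to return just the first row?

Do you have any suggestions for speeding up the query? Do you think it could be rewritten as a stored procedure that checks each query's result and returns as soon as one is found? Would that be faster?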

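1 answer:

Answer 0 (score: 1)

First, some issues...

  • Before MySQL 5.7.3 and MariaDB 10.1, UNION always builds a tmp table. (Later versions can avoid this inefficiency for UNION ALL in some cases.)
  • The query is missing an ORDER BY at the end. Without one, the order of the rows coming out of the UNION ALL is not guaranteed, so the final LIMIT 1 could return the wrong answer.
  • A second tmp table would be needed to perform the outer ORDER BY.

Now for some suggested improvements. Without knowing more about the data, I have to say that these may or may not be faster.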

Avoid *

Instead of SELECT *, select only id, then JOIN back to the table to get the rest of the columns:

SELECT b.*
    FROM ( SELECT id ... UNION ALL ... LIMIT 1 ) AS a
    JOIN product_catalog AS b  USING(id);
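
To make the outline above concrete, here is a sketch with only the first two branches from the question written out. It assumes that product_catalog has a primary key column named id (the question does not show the schema) and it adds a hypothetical seq column so that the inner ORDER BY can make the branch priority explicit.

```sql
-- Sketch only. Assumptions: product_catalog has a primary key named id;
-- seq is a hypothetical helper column that encodes branch priority.
SET @size = 4, @price = 18, @category = 'NEW', @weight = 20, @origin = 'France';

SELECT b.*
    FROM (
        ( SELECT id, 1 AS seq FROM product_catalog
            WHERE quality = 'A1' AND size = @size AND price = @price
              AND category = @category AND weight = @weight AND origin = @origin
            LIMIT 1 )
        UNION ALL
        ( SELECT id, 2 AS seq FROM product_catalog
            WHERE quality = 'A2' AND size = @size AND price = @price
              AND category = @category AND origin = @origin
            LIMIT 1 )
        ORDER BY seq      -- lowest seq = best quality group
        LIMIT 1
    ) AS a
    JOIN product_catalog AS b USING(id);
```

The inner queries now carry only two small columns through the UNION, and the wide row is read once, for the single id that survives the LIMIT 1. Whether this beats the original depends on the row width and the available indexes.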

More indexes:

INDEX(quality, size, price)
INDEX(quality, price, category)
...
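
As a sketch, the two indexes listed above could be created as follows. The index names are made up for illustration, and the right column order depends on the data, so it is worth checking each branch with EXPLAIN afterwards.

```sql
-- Sketch only: the index names are hypothetical.
ALTER TABLE product_catalog
    ADD INDEX idx_quality_size_price (quality, size, price),
    ADD INDEX idx_quality_price_category (quality, price, category);

-- Check which index the optimizer picks for one branch of the UNION.
SET @size = 4, @price = 18, @category = 'NEW', @weight = 20, @origin = 'France';

EXPLAIN
SELECT * FROM product_catalog
    WHERE quality = 'A1' AND size = @size AND price = @price
      AND category = @category AND weight = @weight AND origin = @origin
    LIMIT 1;
```

The key column of the EXPLAIN output shows which index, if any, was chosen for that branch.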

Do a single table scan; no indexes needed. (This requires that the quality values sort in order of preference.):

SELECT * FROM ...
    WHERE ( quality = 'A1' AND size = @size AND price = @price ... )
       OR ( quality = 'A3' AND price = @price AND category = @category ... )
    ORDER BY quality
    LIMIT 1

(Normally I recommend replacing OR with UNION for performance, but I think your use case may be better served the other way around.)
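
Written out in full for the four branches shown in the question, the single-scan query might look like the sketch below. The branches hidden behind "SOME MORE SELECTS" are not known, so they are left out.

```sql
-- Sketch only: covers just the A1..A4 branches visible in the question.
SET @size = 4, @price = 18, @category = 'NEW', @weight = 20, @origin = 'France';

SELECT * FROM product_catalog
    WHERE ( quality = 'A1' AND size = @size AND price = @price
            AND category = @category AND weight = @weight AND origin = @origin )
       OR ( quality = 'A2' AND size = @size AND price = @price
            AND category = @category AND origin = @origin )
       OR ( quality = 'A3' AND price = @price
            AND category = @category AND weight = @weight AND origin = @origin )
       OR ( quality = 'A4' AND price = @price
            AND category = @category AND origin = @origin )
    ORDER BY quality   -- relies on 'A1' < 'A2' < 'A3' < 'A4' as strings
    LIMIT 1;
```

The ORDER BY compares quality as a string, so a label such as 'A10' would sort before 'A2'. If such labels exist, a separate numeric rank would be needed for the ordering.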

CASE

You can combine the first two SELECTs (A1 and A2), whose parameter tests differ only on weight:

SELECT MIN(IF(weight = @weight, 'A1', 'A2')) AS quality
    FROM product_catalog
    WHERE size = @size
      AND price = @price
      AND category = @category
      AND origin = @origin;
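
The combined query above returns only a label, not the matching row. If the row itself is wanted, one possible variation (not part of the original answer) is to sort on whether weight matches. Like the query above, this sketch ignores the stored quality column and assumes that a weight match is all that separates the two groups.

```sql
-- Sketch only: a variation on the combined query, not the answer's own code.
SET @size = 4, @price = 18, @category = 'NEW', @weight = 20, @origin = 'France';

SELECT *
    FROM product_catalog
    WHERE size = @size
      AND price = @price
      AND category = @category
      AND origin = @origin
    ORDER BY (weight = @weight) DESC   -- 1 (match) sorts before 0 (no match)
    LIMIT 1;
```

In MySQL the comparison evaluates to 1 or 0, so rows whose weight matches come first and the LIMIT 1 picks one of them when any exist.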