How can I optimize a correlated subquery?

Time: 2019-02-21 18:38:58

Tags: mysql

I have a query that runs against MySQL:

SELECT DISTINCT tp.parts_group as PartsGroup, tpf.code as FeatureCode, CONVERT(tpf.market_id, char) as MarketID
FROM jpt_product_feature tpf
INNER JOIN jpt_product tp
ON tpf.product_id = tp.id
INNER JOIN jpt_product_model tpm
ON tp.model_id = tpm.id
JOIN ModelImport mi
ON tpm.Code = mi.ModelCode
WHERE NOT EXISTS (
      SELECT 1 
      FROM FeatureSequence fs
      WHERE tp.parts_group = fs.PartsGroup
      AND tpf.code = fs.FeatureCode
      AND (tpf.market_id = fs.MarketID or tpf.market_id is null)
) 
ORDER BY PartsGroup, FeatureCode, MarketID

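It runs in 38 seconds on my PC, which is fine given the large number of rows across several tables. However, on a less powerful VM the same query runs for about 2 hours and then ends with a FATAL ERROR.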

Here are my indexes:

CREATE INDEX idxFeatureSequencePartsGroup ON FeatureSequence (PartsGroup); 
CREATE INDEX idxToyProductPartsGroup ON jpt_product (parts_group); 
CREATE INDEX idxToyProductFeature ON jpt_product_feature (code);
CREATE INDEX idxFeatureSequenceFeatureCode ON FeatureSequence (FeatureCode); 
CREATE INDEX idxToyProductFeatureMarketID ON jpt_product_feature (market_id);
CREATE INDEX idxFeatureSequenceMarketID ON FeatureSequence (MarketID); 

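We are working on beefing up the VM, but in the meantime, what can I do to speed up this query and make it more efficient? I am even open to exotic or obscure approaches if they would speed it up dramatically. Or, if I am missing an index that you think I should have, that might be a simple fix.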

1 Answer:

Answer 0: (score: 1)

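Correlated subqueries tend to be less efficient than uncorrelated ones, when an uncorrelated form is possible. In this case, I would try the following alternative, which replaces NOT EXISTS with a LEFT JOIN against a derived table and keeps only the rows that found no match: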

SELECT DISTINCT tp.parts_group as PartsGroup, tpf.code as FeatureCode, CONVERT(tpf.market_id, char) as MarketID
FROM jpt_product_feature tpf
INNER JOIN jpt_product tp ON tpf.product_id = tp.id
INNER JOIN jpt_product_model tpm ON tp.model_id = tpm.id
INNER JOIN ModelImport mi ON tpm.Code = mi.ModelCode
LEFT JOIN (
      SELECT DISTINCT 1 AS matchCheck
         , fs.PartsGroup AS fsPartsGroup
         , fs.FeatureCode AS fsFeatureCode
         , fs.MarketID AS fsMarketID
      FROM FeatureSequence fs
) AS fs ON tp.parts_group = fs.fsPartsGroup
      AND tpf.code = fs.fsFeatureCode
      AND (tpf.market_id = fs.fsMarketID OR tpf.market_id is null)
WHERE fs.matchCheck IS NULL
ORDER BY PartsGroup, FeatureCode, MarketID
;

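Without knowing the details of the data distribution, it is hard to say whether this will be faster (in some cases a correlated subquery is the best option), but it is the first thing I would try. If FeatureSequence is relatively large compared to the other tables involved, the correlated query may still be better: it is a trade-off between many small hits against a big table and one big hit.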
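
On the index question: both versions of the query probe FeatureSequence on PartsGroup, FeatureCode and MarketID together, while the indexes listed in the question each cover a single column. Below is a minimal sketch, assuming a composite index is acceptable on that table. The index name idxFeatureSequenceLookup is made up for illustration and does not come from the original post. EXPLAIN is then used to check whether the plan changes:

```sql
-- Hypothetical composite index (name is illustrative only).
-- Column order matches the predicates of the NOT EXISTS probe; when
-- tpf.market_id IS NULL, the (PartsGroup, FeatureCode) prefix can still be used.
CREATE INDEX idxFeatureSequenceLookup
    ON FeatureSequence (PartsGroup, FeatureCode, MarketID);

-- Show the execution plan of the original query without running it.
EXPLAIN
SELECT DISTINCT tp.parts_group AS PartsGroup,
       tpf.code AS FeatureCode,
       CONVERT(tpf.market_id, char) AS MarketID
FROM jpt_product_feature tpf
INNER JOIN jpt_product tp ON tpf.product_id = tp.id
INNER JOIN jpt_product_model tpm ON tp.model_id = tpm.id
INNER JOIN ModelImport mi ON tpm.Code = mi.ModelCode
WHERE NOT EXISTS (
      SELECT 1
      FROM FeatureSequence fs
      WHERE tp.parts_group = fs.PartsGroup
      AND tpf.code = fs.FeatureCode
      AND (tpf.market_id = fs.MarketID OR tpf.market_id IS NULL)
)
ORDER BY PartsGroup, FeatureCode, MarketID;
```

The same EXPLAIN prefix can be put in front of the LEFT JOIN version to compare the two plans. Whether MySQL actually picks the new index depends on the data and the server version, so it is worth comparing the EXPLAIN output and the timings on the VM itself rather than assuming a win.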