What should I use instead of IN?

Posted: 2014-04-15 13:25:40

Tags: mysql sql performance join query-optimization

I have a query like this:

SELECT DISTINCT devices1_.id AS id27_, devices1_.createdTime AS createdT2_27_, devices1_.deletedOn AS deletedOn27_, 
devices1_.deviceAlias AS deviceAl4_27_, devices1_.deviceName AS deviceName27_, devices1_.deviceTypeId AS deviceT21_27_, 
devices1_.equipmentVendor AS equipmen6_27_, devices1_.exceptionDetail AS exceptio7_27_, devices1_.hardwareVersion AS hardware8_27_, 
devices1_.ipAddress AS ipAddress27_, devices1_.isDeleted AS isDeleted27_, devices1_.loopBack AS loopBack27_, 
devices1_.modifiedTime AS modifie12_27_, devices1_.osVersion AS osVersion27_, devices1_.productModel AS product14_27_, 
devices1_.productName AS product15_27_, devices1_.routerType AS routerType27_, devices1_.rundate AS rundate27_, 
devices1_.serialNumber AS serialN18_27_, devices1_.serviceName AS service19_27_, devices1_.siteId AS siteId27_, 
devices1_.siteIdA AS siteIdA27_, devices1_.status AS status27_, devices1_.creator AS creator27_, devices1_.lastModifier AS lastMod25_27_ 
FROM goldenvariation goldenconf0_ 
INNER JOIN devices devices1_ ON goldenconf0_.deviceId=devices1_.id 
CROSS JOIN devices devices2_ 
WHERE goldenconf0_.deviceId=devices2_.id 
AND (goldenconf0_.classType = 'policy-options') 
AND DATE(goldenconf0_.rundate)=DATE('2014-04-14 00:00:00') 
AND devices2_.isDeleted=0 
AND EXISTS (SELECT DISTINCT(deviceId) FROM goldenvariation goldenconf3_ 
        WHERE (goldenconf3_.goldenVariationType = 'MISMATCH') 
        AND (goldenconf3_.classType = 'policy-options') 
        AND DATE(goldenconf3_.rundate)=DATE('2014-04-14 00:00:00')) 
AND EXISTS (SELECT DISTINCT (deviceId) FROM goldenvariation goldenconf4_ 
        WHERE (goldenconf4_.goldenVariationType = 'MISSING') 
        AND (goldenconf4_.classType = 'policy-options') 
        AND DATE(goldenconf4_.rundate)=DATE('2014-04-14 00:00:00'));

It takes far too long. How can I rewrite the query so that it runs fast?

The table structure of goldenvariation is:

CREATE TABLE `goldenvariation` (
  `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `classType` VARCHAR(255) DEFAULT NULL,
  `createdTime` DATETIME DEFAULT NULL,
  `goldenValue` LONGTEXT,
  `goldenXpath` VARCHAR(255) DEFAULT NULL,
  `isMatched` TINYINT(1) DEFAULT NULL,
  `modifiedTime` DATETIME DEFAULT NULL,
  `pathValue` LONGTEXT,
  `rundate` DATETIME DEFAULT NULL,
  `value` LONGTEXT,
  `xpath` VARCHAR(255) DEFAULT NULL,
  `deviceId` BIGINT(20) DEFAULT NULL,
  `goldenXpathId` BIGINT(20) DEFAULT NULL,
  `creator` INT(10) UNSIGNED DEFAULT NULL,
  `lastModifier` INT(10) UNSIGNED DEFAULT NULL,
  `goldenVariationType` VARCHAR(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FK6804472AD99F2D15` (`deviceId`),
  KEY `FK6804472A98002838` (`goldenXpathId`),
  KEY `FK6804472A27C863B` (`creator`),
  KEY `FK6804472A3617A57C` (`lastModifier`),
  KEY `rundateindex` (`rundate`),
  KEY `varitionidindex` (`id`),
  KEY `classTypeindex` (`classType`),
  CONSTRAINT `FK6804472A27C863B` FOREIGN KEY (`creator`) REFERENCES `users` (`userid`),
  CONSTRAINT `FK6804472A3617A57C` FOREIGN KEY (`lastModifier`) REFERENCES `users` (`userid`),
  CONSTRAINT `FK6804472A98002838` FOREIGN KEY (`goldenXpathId`) REFERENCES `goldenconfigurationxpath` (`id`),
  CONSTRAINT `FK6804472AD99F2D15` FOREIGN KEY (`deviceId`) REFERENCES `devices` (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1868865 DEFAULT CHARSET=latin1;

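The EXPLAIN output is (columns: id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra):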

"1" "PRIMARY"   "goldenconf0_"  "ref"   "FK6804472AD99F2D15,classTypeindex" "classTypeindex"    "258"   "const" "179223"    "Using where; Using temporary"
"1" "PRIMARY"   "devices2_" "eq_ref"    "PRIMARY,deviceindex"   "PRIMARY"   "8" "cmdb.goldenconf0_.deviceId"    "1" "Using where"
"1" "PRIMARY"   "devices1_" "eq_ref"    "PRIMARY,deviceindex"   "PRIMARY"   "8" "cmdb.goldenconf0_.deviceId"    "1" ""
"3" "DEPENDENT SUBQUERY"    "goldenconf4_"  "index_subquery"    "FK6804472AD99F2D15,classTypeindex" "FK6804472AD99F2D15"    "9" "func"  "19795" "Using where"
"2" "DEPENDENT SUBQUERY"    "goldenconf3_"  "index_subquery"    "FK6804472AD99F2D15,classTypeindex" "FK6804472AD99F2D15"    "9" "func"  "19795" "Using where"

3 answers:

Answer 0 (score: 1)

INNER JOIN goldenvariation goldenconf4_ 
ON goldenconf4_.deviceId = goldenconf0_.deviceId 
AND (goldenconf4_.goldenVariationType = 'MISSING') 
AND (goldenconf4_.classType = 'policy-options')  
AND DATE(goldenconf4_.rundate)=DATE('2014-04-14 00:00:00')

Change the other EXISTS the same way. I think this should be faster. A small tip as well: try to use shorter aliases. Your query is really hard to read.
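One side effect of swapping EXISTS for a join is worth keeping in mind. EXISTS returns each outer row at most once, while a join returns one row per matching goldenvariation row, so it is the DISTINCT already in the query that removes the duplicates. The sketch below illustrates this with Python's built-in sqlite3 module standing in for MySQL; the trimmed-down tables and sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices (id INTEGER PRIMARY KEY, deviceName TEXT, isDeleted INTEGER);
    CREATE TABLE goldenvariation (
        id INTEGER PRIMARY KEY, deviceId INTEGER, classType TEXT,
        rundate TEXT, goldenVariationType TEXT);
    INSERT INTO devices VALUES (1, 'r1', 0), (2, 'r2', 0);
    -- Hypothetical data: device 1 has two MISSING rows, device 2 has none.
    INSERT INTO goldenvariation (deviceId, classType, rundate, goldenVariationType) VALUES
        (1, 'policy-options', '2014-04-14 08:00:00', 'MISSING'),
        (1, 'policy-options', '2014-04-14 09:00:00', 'MISSING'),
        (2, 'policy-options', '2014-04-14 10:00:00', 'MISMATCH');
""")

# Semi-join via EXISTS: each device appears at most once.
with_exists = conn.execute("""
    SELECT d.id FROM devices d
    WHERE EXISTS (SELECT 1 FROM goldenvariation g
                  WHERE g.deviceId = d.id AND g.goldenVariationType = 'MISSING')
    ORDER BY d.id""").fetchall()

# Plain join: one output row per matching goldenvariation row.
with_join = conn.execute("""
    SELECT d.id FROM devices d
    INNER JOIN goldenvariation g
        ON g.deviceId = d.id AND g.goldenVariationType = 'MISSING'
    ORDER BY d.id""").fetchall()

# DISTINCT collapses the duplicates produced by the join.
with_join_distinct = conn.execute("""
    SELECT DISTINCT d.id FROM devices d
    INNER JOIN goldenvariation g
        ON g.deviceId = d.id AND g.goldenVariationType = 'MISSING'
    ORDER BY d.id""").fetchall()

print(with_exists)         # [(1,)]
print(with_join)           # [(1,), (1,)]
print(with_join_distinct)  # [(1,)]
```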


SELECT DISTINCT 
devices1_.id AS id27_, 
devices1_.createdTime AS createdT2_27_, 
devices1_.deletedOn AS deletedOn27_, 
devices1_.deviceAlias AS deviceAl4_27_,
devices1_.deviceName AS deviceName27_, 
devices1_.deviceTypeId AS deviceT21_27_, 
devices1_.equipmentVendor AS equipmen6_27_,
devices1_.exceptionDetail AS exceptio7_27_,
devices1_.hardwareVersion AS hardware8_27_, 
devices1_.ipAddress AS ipAddress27_, 
devices1_.isDeleted AS isDeleted27_, 
devices1_.loopBack AS loopBack27_, 
devices1_.modifiedTime AS modifie12_27_, 
devices1_.osVersion AS osVersion27_, 
devices1_.productModel AS product14_27_, 
devices1_.productName AS product15_27_, 
devices1_.routerType AS routerType27_, 
devices1_.rundate AS rundate27_, 
devices1_.serialNumber AS serialN18_27_, 
devices1_.serviceName AS service19_27_, 
devices1_.siteId AS siteId27_, 
devices1_.siteIdA AS siteIdA27_, 
devices1_.status AS status27_, 
devices1_.creator AS creator27_, 
devices1_.lastModifier AS lastMod25_27_ 
FROM goldenvariation goldenconf0_ 
INNER JOIN devices devices1_ ON goldenconf0_.deviceId=devices1_.id 
INNER JOIN goldenvariation a on a.deviceId = goldenconf0_.deviceId and a.goldenVariationType = 'MISMATCH'
INNER JOIN goldenvariation b on b.deviceId = goldenconf0_.deviceId and b.goldenVariationType = 'MISSING'
WHERE (goldenconf0_.classType = 'policy-options') 
AND CONVERT(goldenconf0_.rundate, DATE) = '2014-04-14'
AND devices1_.isDeleted=0 

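Try this. It should run faster than your query. You joined the devices table a second time with CROSS JOIN, but none of that table's columns are used in the SELECT.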
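Whether the second devices join can be dropped is easy to check on toy data. This is a minimal sketch, again assuming Python's sqlite3 as a stand-in for MySQL and hypothetical, trimmed-down tables. Because both copies of devices match the same id, moving the isDeleted filter onto the single remaining join gives the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices (id INTEGER PRIMARY KEY, deviceName TEXT, isDeleted INTEGER);
    CREATE TABLE goldenvariation (id INTEGER PRIMARY KEY, deviceId INTEGER, classType TEXT);
    INSERT INTO devices VALUES (1, 'r1', 0), (2, 'r2', 1), (3, 'r3', 0);
    INSERT INTO goldenvariation (deviceId, classType) VALUES
        (1, 'policy-options'), (2, 'policy-options'), (3, 'interfaces');
""")

# Shape of the original: devices joined twice on the same id,
# with the second copy used only to filter on isDeleted.
twice = conn.execute("""
    SELECT DISTINCT d1.id, d1.deviceName
    FROM goldenvariation g
    INNER JOIN devices d1 ON g.deviceId = d1.id
    CROSS JOIN devices d2
    WHERE g.deviceId = d2.id
      AND g.classType = 'policy-options'
      AND d2.isDeleted = 0
    ORDER BY d1.id""").fetchall()

# One join is enough: d1 and d2 always refer to the same devices row.
once = conn.execute("""
    SELECT DISTINCT d1.id, d1.deviceName
    FROM goldenvariation g
    INNER JOIN devices d1 ON g.deviceId = d1.id
    WHERE g.classType = 'policy-options'
      AND d1.isDeleted = 0
    ORDER BY d1.id""").fetchall()

print(twice == once, once)  # True [(1, 'r1')]
```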

Answer 1 (score: 1)

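You are looking for elements that relate to the goldenvariation table through EXISTS. I would start from that table to get the distinct IDs, then join to your devices table. Also, when you convert the date you cannot take advantage of an INDEX (if the column is part of one). Consider an index along these lines: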

INDEX ...(classType,rundate,goldenVariationType,deviceID)

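Change the date clause to >= ? and < ? + 1 day. That way you get the whole day, from 12:00:00 AM through 11:59:59 PM, and the index can use the date component without having to convert every record.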
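Here is a sketch showing that the two date predicates are equivalent but differ in index use. It assumes SQLite via Python's sqlite3, with rundate stored as ISO-8601 text so that string comparison orders it correctly (MySQL compares DATETIME values natively). The rows are made up, and SQLite's plan text varies between versions and looks nothing like MySQL's EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE goldenvariation (id INTEGER PRIMARY KEY, deviceId INTEGER, rundate TEXT);
    CREATE INDEX rundateindex ON goldenvariation (rundate);
    INSERT INTO goldenvariation (deviceId, rundate) VALUES
        (1, '2014-04-13 23:59:59'),
        (2, '2014-04-14 00:00:00'),
        (3, '2014-04-14 23:59:59'),
        (4, '2014-04-15 00:00:00');
""")

# Function wrapped around the column: every row's rundate must be converted.
func_sql = """SELECT deviceId FROM goldenvariation
              WHERE DATE(rundate) = DATE('2014-04-14 00:00:00') ORDER BY deviceId"""
# Half-open range on the bare column: [start of day, start of next day).
range_sql = """SELECT deviceId FROM goldenvariation
               WHERE rundate >= '2014-04-14' AND rundate < '2014-04-15' ORDER BY deviceId"""

func_rows = conn.execute(func_sql).fetchall()
range_rows = conn.execute(range_sql).fetchall()

def plan(sql):
    # The 4th column of EXPLAIN QUERY PLAN output is the human-readable detail.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

print(func_rows == range_rows, range_rows)  # True [(2,), (3,)]
print(plan(func_sql))   # full scan of the table
print(plan(range_sql))  # search using rundateindex
```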

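Also, you are joining the devices table TWICE on the matching "ID" from the goldenvariation table (devices1_ and devices2_ on the same ID), which is wasteful and accomplishes nothing.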

Your devices table should have an index ON (id, isDeleted).

SELECT 
      d1.id AS id27_, 
      d1.createdTime AS createdT2_27_, 
      d1.deletedOn AS deletedOn27_,
      d1.deviceAlias AS deviceAl4_27_, 
      d1.deviceName AS deviceName27_,
      d1.deviceTypeId AS deviceT21_27_,
      d1.equipmentVendor AS equipmen6_27_,
      d1.exceptionDetail AS exceptio7_27_,
      d1.hardwareVersion AS hardware8_27_,
      d1.ipAddress AS ipAddress27_,
      d1.isDeleted AS isDeleted27_,
      d1.loopBack AS loopBack27_,
      d1.modifiedTime AS modifie12_27_,
      d1.osVersion AS osVersion27_,
      d1.productModel AS product14_27_,
      d1.productName AS product15_27_,
      d1.routerType AS routerType27_,
      d1.rundate AS rundate27_,
      d1.serialNumber AS serialN18_27_,
      d1.serviceName AS service19_27_,
      d1.siteId AS siteId27_,
      d1.siteIdA AS siteIdA27_,
      d1.status AS status27_,
      d1.creator AS creator27_,
      d1.lastModifier AS lastMod25_27_ 
   from 
      ( SELECT distinct 
              gv.deviceID
           from
              goldenvariation gv
           where
                  gv.classType =  'policy-options'
              AND gv.runDate >= '2014-04-14' 
              AND gv.runDate < '2014-04-15'
              AND gv.goldenVariationType IN ( 'MISSING', 'MISMATCH' )) PQ
         JOIN devices d1
            ON PQ.deviceId = d1.id 
           AND d1.isDeleted = 0 
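Here is a rough way to see what the suggested composite index (classType, rundate, goldenVariationType, deviceID) buys the derived table PQ. It assumes SQLite through Python's sqlite3 and uses a hypothetical subset of the goldenvariation columns with made-up rows. Every column PQ touches is in the index, so SQLite can answer it from the index alone. MySQL's optimizer makes its own choices, so treat this only as an illustration of the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE goldenvariation (id INTEGER PRIMARY KEY, deviceId INTEGER,
        classType TEXT, rundate TEXT, goldenVariationType TEXT);
    -- Column order as suggested: classType, rundate, goldenVariationType, deviceId
    CREATE INDEX ix_gv_lookup
        ON goldenvariation (classType, rundate, goldenVariationType, deviceId);
    INSERT INTO goldenvariation (deviceId, classType, rundate, goldenVariationType) VALUES
        (1, 'policy-options', '2014-04-14 01:00:00', 'MISMATCH'),
        (1, 'policy-options', '2014-04-14 02:00:00', 'MISSING'),
        (2, 'policy-options', '2014-04-14 03:00:00', 'MATCH'),
        (3, 'interfaces',     '2014-04-14 04:00:00', 'MISSING');
""")

# The body of the derived table PQ, with the same predicates.
pq = """SELECT DISTINCT deviceId FROM goldenvariation
        WHERE classType = 'policy-options'
          AND rundate >= '2014-04-14' AND rundate < '2014-04-15'
          AND goldenVariationType IN ('MISSING', 'MISMATCH')"""

device_ids = sorted(r[0] for r in conn.execute(pq))
plan = " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + pq))
print(device_ids)  # [1]
print(plan)        # look for "COVERING INDEX ix_gv_lookup" in the detail text
```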

Answer 2 (score: 1)

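Yes, the query can be rewritten to improve performance (although it looks like a query generated by Hibernate, and getting Hibernate to use a different query may be a challenge).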

How sure are you that this query returns the result set you expect? Because the query is strange.

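Performance-wise, dollars to donuts, it's the repeated execution of the dependent subqueries that's really eating your lunch. It looks like MySQL is using the index on the deviceId column to satisfy those subqueries, and that doesn't look like the most appropriate index.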

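We notice there are two JOIN operations to the devices table; there's no reason this table needs to be joined twice. Both JOIN operations require a match on the deviceId column of goldenvariation, and the second join to devices adds the filter isDeleted=0. The INNER and CROSS keywords have no effect at all on the statement, and the second join to devices isn't really a "cross" join; it's effectively an inner join. (We prefer to see join predicates in the ON clause rather than the WHERE clause.)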

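Wrapping the rundate column in the DATE() function disables index range scan operations. These predicates can be rewritten to make use of an appropriate index.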

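The DISTINCT(deviceId) in the SELECT list of the EXISTS subqueries is very strange. First, DISTINCT is a keyword, not a function, so the parens around deviceId aren't needed. Beyond that, what the SELECT list of an EXISTS subquery returns doesn't matter at all; it could just be SELECT 1.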
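The claim that the SELECT list of an EXISTS subquery is irrelevant can be checked directly. This is a small sketch using Python's sqlite3 in place of MySQL, with a hypothetical minimal table and made-up rows. Every select-list variant, including one that returns NULL, gives the same EXISTS result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE goldenvariation (id INTEGER PRIMARY KEY, deviceId INTEGER,
                                  goldenVariationType TEXT);
    INSERT INTO goldenvariation (deviceId, goldenVariationType) VALUES
        (1, 'MISMATCH'), (1, 'MISMATCH'), (2, 'MATCH');
""")

# Select-list variants; the list is fixed here, so the f-string below is safe.
select_lists = ["DISTINCT(deviceId)", "DISTINCT deviceId", "1", "NULL"]

results = {}
for vtype in ("MISMATCH", "MISSING"):
    # Collect the EXISTS outcome (1 or 0) for every variant into a set.
    results[vtype] = {
        conn.execute(
            f"SELECT EXISTS (SELECT {select_list} FROM goldenvariation "
            "WHERE goldenVariationType = ?)",
            (vtype,),
        ).fetchone()[0]
        for select_list in select_lists
    }

print(results)  # {'MISMATCH': {1}, 'MISSING': {0}}
```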

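It's strange to see EXISTS predicates whose subqueries don't reference any expression from the outer query (that is, they aren't correlated subqueries). It is valid syntax. With a correlated subquery, MySQL executes that subquery for every row returned by the outer query, and the EXPLAIN output (DEPENDENT SUBQUERY) looks like MySQL is doing the same thing here; it isn't recognizing any optimization.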

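The way these EXISTS predicates are written, if there is no 'policy-options' row with 'MISMATCH', or no 'policy-options' row with 'MISSING' (for the specified date), the query returns no rows. If a row of each type is found (for the specified date), all 'policy-options' rows for that date are returned. (It's syntactically valid, but it's rather strange.)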
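The all-or-nothing behaviour can be reproduced with a scaled-down copy of the original WHERE clause. This sketch assumes Python's sqlite3 in place of MySQL, hypothetical minimal tables, and a made-up 'MATCH' variation type. As in the original, neither EXISTS mentions the outer row.

```python
import sqlite3

def run(rows):
    """Load (deviceId, goldenVariationType) rows for one day; return matched device ids."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE devices (id INTEGER PRIMARY KEY, isDeleted INTEGER);
        CREATE TABLE goldenvariation (id INTEGER PRIMARY KEY, deviceId INTEGER,
            classType TEXT, rundate TEXT, goldenVariationType TEXT);
        INSERT INTO devices VALUES (1, 0), (2, 0), (3, 0);
    """)
    conn.executemany(
        "INSERT INTO goldenvariation (deviceId, classType, rundate, goldenVariationType) "
        "VALUES (?, 'policy-options', '2014-04-14 10:00:00', ?)", rows)
    # Same shape as the original: the EXISTS subqueries are uncorrelated.
    return [r[0] for r in conn.execute("""
        SELECT DISTINCT d.id
        FROM goldenvariation g JOIN devices d ON g.deviceId = d.id
        WHERE g.classType = 'policy-options'
          AND DATE(g.rundate) = '2014-04-14'
          AND d.isDeleted = 0
          AND EXISTS (SELECT 1 FROM goldenvariation x
                      WHERE x.goldenVariationType = 'MISMATCH'
                        AND x.classType = 'policy-options'
                        AND DATE(x.rundate) = '2014-04-14')
          AND EXISTS (SELECT 1 FROM goldenvariation x
                      WHERE x.goldenVariationType = 'MISSING'
                        AND x.classType = 'policy-options'
                        AND DATE(x.rundate) = '2014-04-14')
        ORDER BY d.id""")]

# Only MISMATCH rows that day: the MISSING check fails, so nothing comes back.
print(run([(1, 'MISMATCH'), (2, 'MISMATCH')]))               # []
# One MISSING row anywhere that day: every policy-options device comes back,
# including device 3, whose only row is 'MATCH'.
print(run([(1, 'MISMATCH'), (2, 'MISSING'), (3, 'MATCH')]))  # [1, 2, 3]
```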

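Assuming the id column of the devices table is UNIQUE (that is, it's the PRIMARY KEY or there's a UNIQUE index on that column), the DISTINCT keyword isn't needed on the outermost query once the deviceId values from goldenvariation are made distinct before the join, as the rewrite below does with GROUP BY. (From the EXPLAIN output, it looks like MySQL has already optimized the usual operation; that is, MySQL recognized that the DISTINCT keyword is unnecessary.)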


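But the biggest problem is the dependent subqueries, which are killing performance: there's no suitable index, and the predicates on the date column are wrapped in a function.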

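To answer your question: yes, this query can be rewritten to return an equivalent result set more efficiently. (It isn't entirely clear that the query returns the result set you expect.)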

SELECT d1.id AS id27_
     , d1.createdTime AS createdT2_27_
     , d1.deletedOn AS deletedOn27_
     , d1.deviceAlias AS deviceAl4_27_
     , d1.deviceName AS deviceName27_
     , d1.deviceTypeId AS deviceT21_27_
     , d1.equipmentVendor AS equipmen6_27_
     , d1.exceptionDetail AS exceptio7_27_
     , d1.hardwareVersion AS hardware8_27_
     , d1.ipAddress AS ipAddress27_
     , d1.isDeleted AS isDeleted27_
     , d1.loopBack AS loopBack27_
     , d1.modifiedTime AS modifie12_27_
     , d1.osVersion AS osVersion27_
     , d1.productModel AS product14_27_
     , d1.productName AS product15_27_
     , d1.routerType AS routerType27_
     , d1.rundate AS rundate27_
     , d1.serialNumber AS serialN18_27_
     , d1.serviceName AS service19_27_
     , d1.siteId AS siteId27_
     , d1.siteIdA AS siteIdA27_
     , d1.status AS status27_
     , d1.creator AS creator27_
     , d1.lastModifier AS lastMod25_27_ 
  FROM devices d1
  JOIN (SELECT g.deviceId
          FROM goldenvariation g
         CROSS 
          JOIN (SELECT 1
                  FROM goldenvariation x3
                 WHERE x3.goldenVariationType = 'MISMATCH'
                   AND x3.classType = 'policy-options' 
                   AND x3.rundate >= '2014-04-14'
                   AND x3.rundate <  '2014-04-14' + INTERVAL 1 DAY
                 LIMIT 1
               ) t3
         CROSS      
          JOIN (SELECT 1
                  FROM goldenvariation x4 
                 WHERE x4.goldenVariationType = 'MISSING'
                   AND x4.classType = 'policy-options'
                   AND x4.rundate >= '2014-04-14'
                   AND x4.rundate <  '2014-04-14' + INTERVAL 1 DAY
                 LIMIT 1
               ) t4
         WHERE g.classType = 'policy-options'
           AND g.rundate >= '2014-04-14'
           AND g.rundate <  '2014-04-14' + INTERVAL 1 DAY
         GROUP BY g.deviceId
       ) t2
    ON t2.deviceId = d1.id
 WHERE d1.isDeleted=0
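To gain some confidence that the rewrite returns the same rows as the original, both can be run against the same sample data. This is a minimal sketch, assuming SQLite via Python's sqlite3, trimmed-down tables, and made-up rows. SQLite has no `+ INTERVAL 1 DAY` syntax, so the upper bound is written as the literal '2014-04-15'. Agreement on one small data set is not a proof of equivalence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices (id INTEGER PRIMARY KEY, deviceName TEXT, isDeleted INTEGER);
    CREATE TABLE goldenvariation (id INTEGER PRIMARY KEY, deviceId INTEGER,
        classType TEXT, rundate TEXT, goldenVariationType TEXT);
    INSERT INTO devices VALUES (1, 'r1', 0), (2, 'r2', 0), (3, 'r3', 1), (4, 'r4', 0);
    INSERT INTO goldenvariation (deviceId, classType, rundate, goldenVariationType) VALUES
        (1, 'policy-options', '2014-04-14 01:00:00', 'MISMATCH'),
        (1, 'policy-options', '2014-04-14 02:00:00', 'MISMATCH'),
        (2, 'policy-options', '2014-04-14 03:00:00', 'MISSING'),
        (3, 'policy-options', '2014-04-14 04:00:00', 'MISSING'),
        (4, 'policy-options', '2014-04-15 00:00:00', 'MISSING');
""")

# Original shape: double devices join, DATE() predicates, uncorrelated EXISTS.
original = """
    SELECT DISTINCT d1.id, d1.deviceName
    FROM goldenvariation g0
    INNER JOIN devices d1 ON g0.deviceId = d1.id
    CROSS JOIN devices d2
    WHERE g0.deviceId = d2.id
      AND g0.classType = 'policy-options'
      AND DATE(g0.rundate) = DATE('2014-04-14 00:00:00')
      AND d2.isDeleted = 0
      AND EXISTS (SELECT DISTINCT(deviceId) FROM goldenvariation g3
                  WHERE g3.goldenVariationType = 'MISMATCH'
                    AND g3.classType = 'policy-options'
                    AND DATE(g3.rundate) = DATE('2014-04-14 00:00:00'))
      AND EXISTS (SELECT DISTINCT(deviceId) FROM goldenvariation g4
                  WHERE g4.goldenVariationType = 'MISSING'
                    AND g4.classType = 'policy-options'
                    AND DATE(g4.rundate) = DATE('2014-04-14 00:00:00'))
    ORDER BY d1.id"""

# Rewrite: one-row existence checks cross joined in, range predicates, GROUP BY.
rewrite = """
    SELECT d1.id, d1.deviceName
    FROM devices d1
    JOIN (SELECT g.deviceId
            FROM goldenvariation g
           CROSS JOIN (SELECT 1 FROM goldenvariation x3
                        WHERE x3.goldenVariationType = 'MISMATCH'
                          AND x3.classType = 'policy-options'
                          AND x3.rundate >= '2014-04-14'
                          AND x3.rundate <  '2014-04-15'
                        LIMIT 1) t3
           CROSS JOIN (SELECT 1 FROM goldenvariation x4
                        WHERE x4.goldenVariationType = 'MISSING'
                          AND x4.classType = 'policy-options'
                          AND x4.rundate >= '2014-04-14'
                          AND x4.rundate <  '2014-04-15'
                        LIMIT 1) t4
           WHERE g.classType = 'policy-options'
             AND g.rundate >= '2014-04-14'
             AND g.rundate <  '2014-04-15'
           GROUP BY g.deviceId) t2
      ON t2.deviceId = d1.id
    WHERE d1.isDeleted = 0
    ORDER BY d1.id"""

orig_rows = conn.execute(original).fetchall()
rewrite_rows = conn.execute(rewrite).fetchall()
print(orig_rows)     # [(1, 'r1'), (2, 'r2')]
print(rewrite_rows)  # [(1, 'r1'), (2, 'r2')]
```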