Question

I have two (similar) queries:

-- Query #1 - get all new products not currently in the Product table
-- Should match any products in the temp table that do not exist in the Product table
INSERT
  INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT t.`ProductId`, t.`ProcessedOn`, 'Activated'
  FROM `tmpImport` t
  LEFT JOIN `Product` p USING (`ProductId`)
 WHERE p.`ProductId` IS NULL
    ON DUPLICATE KEY UPDATE
       `State` = VALUES(`State`)

-- Query #2 - get all Products that are removed from the Product table
-- Should match any products in the Product table that do not exist in the temp table
INSERT
  INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT p.`ProductId`, p.`LastSeenDate`, 'Deactivated'
  FROM `Product` p
  LEFT JOIN `tmpImport` t USING (`ProductId`)
 WHERE t.`ProductId` IS NULL
    ON DUPLICATE KEY UPDATE
       `State` = VALUES(`State`)

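On the first run (day 1), when the Product table is empty, both queries run in under 1 second. On the second run (day 2), however, when the Product table has 14,000 records, the first query runs in under 2 seconds and the second takes 244 seconds. Every subsequent data import behaves the same way (240-250 seconds for query #2). When I inspect the database, all the data appears to be correct; I just can't figure out why the second query takes so long.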

---> Edit: slow query log:

# Query_time: 245.328784  Lock_time: 0.000000 Rows_sent: 0  Rows_examined: 187711973
SET timestamp=1305151558;

INSERT
  INTO `tmpProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT p.`ProductId`, p.`LastSeenDate`, 'Deactivated'
  FROM `Product` p
  LEFT JOIN `tmpImport` t USING (`ProductId`)
 WHERE t.`ProductId` IS NULL
    ON DUPLICATE KEY UPDATE
       `State` = VALUES(`State`);

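What concerns me most at this point is Rows_examined: 187711973 (how on earth can that many rows be examined?). The Product table holds ~14,000 records, the import table holds ~28,000 records, and tmpProductState holds at most 60 records.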

---> Another edit:

EXPLAIN SELECT p.`ProductId` , p.`LastSeenDate` , 'Deactivated'
FROM `Product` p
LEFT JOIN `tmpImport` t
USING ( `ProductId` )
WHERE t.`ProductId` IS NULL 

id  select_type  table  type   possible_keys  key      key_len  ref   rows   Extra
1   SIMPLE       p      ALL    NULL           NULL     NULL     NULL  14151
1   SIMPLE       t      index  NULL           PRIMARY  100      NULL  28166  Using where; Using index; Not exists

The tables involved:

CREATE TABLE IF NOT EXISTS `tmpImport` (
  `CategoryId`             smallint(5) unsigned NOT NULL,
  `ProcessedOn`            date DEFAULT NULL,
  `ProductId`              varchar(32) NOT NULL,
  `Title`                  varchar(255) DEFAULT NULL,
  `Description`            text,
  `ActivateDate`           date DEFAULT NULL,
  PRIMARY KEY (`CategoryId`,`ProductId`)
) ENGINE=MyISAM DEFAULT CHARSET = UTF8

CREATE TABLE IF NOT EXISTS `tmpProductState` (
  `ProductId` VARCHAR(32) NOT NULL,
  `ChangedOn` DATE NOT NULL,
  `State` ENUM('Activated','Deactivated'),
  PRIMARY KEY(`ProductId`,`ChangedOn`)
) ENGINE = Memory

CREATE TABLE `Product` (
  `ProductId` varchar(32) NOT NULL,
  `Title` varchar(255) DEFAULT NULL,
  `Description` text,
  `ActivateDate` date DEFAULT NULL,
  `LastSeenDate` date DEFAULT NULL,
  PRIMARY KEY (`ProductId`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

Answer 1

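Your tables are not normalized, you have no effective indexes, and your joins are... unusual.

I'm assuming you can't do anything about the data being duplicated across tables, so let's ignore that.

It looks like you are duplicating data between the columns of the tables you are joining, so you should use all of those columns in the join. So it should be: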

LEFT JOIN `tmpImport` t USING (`ProductId`, `Title`, `Description`, `ActivateDate`)

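Add indexes to the tables that correspond to the fields you join or filter on. Don't use a composite key as the primary key. Instead, add an auto-increment field as the PK and use unique keys where you need to enforce uniqueness. Both the Product table and tmpImport should have keys for each of the columns being joined.

Hope some of these ideas help.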
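
One possible reading of this suggestion for tmpImport is sketched below. It is only a sketch: the column name `Id` and the key names `uq_Category_Product` and `idx_ProductId` are hypothetical and do not appear in the question.

```sql
-- Sketch only: replace the composite primary key with a surrogate key,
-- keep uniqueness via a unique key, and index the join column.
-- `Id`, `uq_Category_Product` and `idx_ProductId` are made-up names.
ALTER TABLE `tmpImport`
  DROP PRIMARY KEY,
  ADD COLUMN `Id` INT UNSIGNED NOT NULL AUTO_INCREMENT FIRST,
  ADD PRIMARY KEY (`Id`),
  ADD UNIQUE KEY `uq_Category_Product` (`CategoryId`, `ProductId`),
  ADD KEY `idx_ProductId` (`ProductId`);
```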

Answer 2

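Very late reply to this, but your first query fetches all records from tmpImport and then looks up the matching records in Product using the primary key on the Product table. This is quite efficient. The second query fetches all records from Product and then looks up the matching records in tmpImport, but without the benefit of any index on ProductId in tmpImport. Hence it performs badly.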
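
The Rows_examined figure in the slow query log is roughly what this plan would predict. As a back-of-the-envelope check, using the row estimates from the EXPLAIN output (14151 and 28166) and assuming each Product row triggers a scan of the tmpImport PRIMARY index:

```sql
-- Upper bound if every Product row scanned the whole tmpImport index:
-- 14151 * 28166 = 398,577,066 rows.
SELECT 14151 * 28166 AS worst_case_rows;

-- Observed in the slow log: 187,711,973 rows examined, which works out to
-- about 13,265 index entries per Product row (a little under half of tmpImport).
SELECT 187711973 / 14151 AS avg_rows_per_product;
```

That would be consistent with the "Not exists" note in the EXPLAIN output: the scan for a given Product row can stop at the first match, so on average only part of the index is read. This is an interpretation of the numbers, not something the log states directly.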

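Add an index on ProductId to the tmpImport table. The ProductId in the primary key is ignored for this join because it is not the first column in the key, and you are not using CategoryId, which is the first column.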
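
A minimal sketch of that change, assuming the table definitions shown in the question. The index name `idx_tmpImport_ProductId` is hypothetical; any name will do.

```sql
-- Add a standalone index on the join column.
-- The index name is an assumption made for this example.
ALTER TABLE `tmpImport`
  ADD INDEX `idx_tmpImport_ProductId` (`ProductId`);

-- Re-check the plan for the slow SELECT from the question.
EXPLAIN SELECT p.`ProductId`, p.`LastSeenDate`, 'Deactivated'
  FROM `Product` p
  LEFT JOIN `tmpImport` t USING (`ProductId`)
 WHERE t.`ProductId` IS NULL;
```

With the new index in place, the row for `t` in the EXPLAIN output would be expected to list it under `key`, with a much smaller `rows` estimate than 28166. It is worth confirming this on your own data before relying on it.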

Slow MySQL query / possible query problem