Why is my MySQL GROUP BY so slow?

Date: 2012-10-27 20:20:43

Tags: mysql sql performance optimization mariadb

I'm trying to query a partitioned table (by month) approaching 20M rows. I need to group by both DATE(transaction_utc) and country_id. The number of rows returned if I turn off the GROUP BY and the aggregates is just over 40k, which isn't too many, but adding the GROUP BY makes the query significantly slower unless that GROUP BY is on the transaction_utc column alone, in which case it is fast.

I've been trying to optimize the first query below by adjusting the query and/or the indexes, and got to the point below (about 2x as fast as initially), but I'm still stuck with a 5s query for summarizing 45k rows, which seems like way too much.

For reference, this box is a brand-new 24-logical-core, 64GB RAM server running MariaDB 5.5.x, with an InnoDB buffer pool larger than all the index space on the server, so there shouldn't be any RAM or CPU pressure.

So, I'm looking for ideas on what is causing this slowdown and suggestions on how to speed it up. Any feedback would be greatly appreciated! :)

OK, on to the details...

The following query (the one I actually need) takes approximately 5 seconds (+/-) and returns fewer than 100 rows.

SELECT lss.`country_id` AS CountryId
, Date(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName,  lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD  
FROM `sales` lss  
JOIN `countries` c ON lss.`country_id` = c.`country_id`  
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )
GROUP BY lss.`country_id`, DATE(lss.`transaction_utc`)

The EXPLAIN SELECT for the same query is below. Note that it is not using the transaction_utc key. Shouldn't it be using my covering index?

id  select_type  table  type    possible_keys                          key         key_len  ref                     rows     Extra
1   SIMPLE       lss    ref     idx_unique,transaction_utc,country_id  idx_unique  50       const                   1208802  Using where; Using temporary; Using filesort
1   SIMPLE       c      eq_ref  PRIMARY                                PRIMARY     4        georiot.lss.country_id  1

Now on to a few other variations I've tried while attempting to figure out what's going on...

The following query (changed GROUP BY) takes about 5 seconds (+/-) and returns only 3 rows:

SELECT lss.`country_id` AS CountryId
, DATE(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName,  lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD  
FROM `sales` lss  
JOIN `countries` c ON lss.`country_id` = c.`country_id`  
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )
GROUP BY lss.`country_id`

The following query (GROUP BY removed) takes 4-5 seconds (+/-) and returns 1 row:

SELECT lss.`country_id` AS CountryId
    , DATE(lss.`transaction_utc`) AS TransactionDate
    , c.`name` AS CountryName,  lss.`country_id` AS CountryId
    , COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
    , COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD  
    FROM `sales` lss  
    JOIN `countries` c ON lss.`country_id` = c.`country_id`  
    WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )

The following query takes .00X seconds (+/-) and returns ~45k rows. This tells me that, at most, we're only trying to group 45K rows into fewer than 100 groups (as in my initial query):

SELECT lss.`country_id` AS CountryId
    , DATE(lss.`transaction_utc`) AS TransactionDate
    , c.`name` AS CountryName,  lss.`country_id` AS CountryId
    , COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
    , COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD  
    FROM `sales` lss  
    JOIN `countries` c ON lss.`country_id` = c.`country_id`  
    WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )
    GROUP BY lss.`transaction_utc`

TABLE SCHEMA:

CREATE TABLE IF NOT EXISTS `sales` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `user_linkshare_account_id` int(11) unsigned NOT NULL,
  `username` varchar(16) NOT NULL,
  `country_id` int(4) unsigned NOT NULL,
  `order` varchar(16) NOT NULL,
  `raw_tracking_code` varchar(255) DEFAULT NULL,
  `transaction_utc` datetime NOT NULL,
  `processed_utc` datetime NOT NULL ,
  `sku` varchar(16) NOT NULL,
  `sale_original` decimal(10,4) NOT NULL,
  `sale_usd` decimal(10,4) NOT NULL,
  `quantity` int(11) NOT NULL,
  `commission_original` decimal(10,4) NOT NULL,
  `commission_usd` decimal(10,4) NOT NULL,
  `original_currency` char(3) NOT NULL,
  PRIMARY KEY (`id`,`transaction_utc`),
  UNIQUE KEY `idx_unique` (`username`,`order`,`processed_utc`,`sku`,`transaction_utc`),
  KEY `raw_tracking_code` (`raw_tracking_code`),
  KEY `idx_usd_amounts` (`sale_usd`,`commission_usd`),
  KEY `idx_countries` (`country_id`),
  KEY `transaction_utc` (`transaction_utc`,`username`,`country_id`,`sale_usd`,`commission_usd`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE ( TO_DAYS(`transaction_utc`))
(PARTITION pOLD VALUES LESS THAN (735112) ENGINE = InnoDB,
 PARTITION p201209 VALUES LESS THAN (735142) ENGINE = InnoDB,
 PARTITION p201210 VALUES LESS THAN (735173) ENGINE = InnoDB,
 PARTITION p201211 VALUES LESS THAN (735203) ENGINE = InnoDB,
 PARTITION p201212 VALUES LESS THAN (735234) ENGINE = InnoDB,
 PARTITION pMAX VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ AUTO_INCREMENT=19696320 ;

1 Answer:

Answer 0 (score: 9):

The offending part is probably the GROUP BY DATE(transaction_utc). You also claim to have a covering index for this query, but I see none. Your 5-column index has all the columns used in the query, but not in the best order (which would be: WHERE - GROUP BY - SELECT).

So, the engine, finding no useful index, has to evaluate this function for all 20M rows. Actually, it finds an index that starts with username (the idx_unique) and uses that, so it has to evaluate the function for (only) 1.2M rows. If you had a (transaction_utc) or a (username, transaction_utc) index, it would choose the most useful of the three.
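For instance, the narrower of those two indexes could be added as below (a sketch only; the index name idx_username_utc is just a placeholder, and you can verify which index the optimizer actually picks with EXPLAIN):

-- Narrow index matching the WHERE clause: equality on username, then the utc range
ALTER TABLE sales
    ADD INDEX idx_username_utc (username, transaction_utc);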

Can you change the table structure by splitting the column into date and time parts? If you can, then an index on (username, country_id, transaction_date) or (changing the order of the two columns used for grouping) on (username, transaction_date, country_id) would be quite efficient.

A covering index on (username, country_id, transaction_date, sale_usd, commission_usd) would be even better.
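As a rough sketch (assuming the table can be altered; the backfill step and the names transaction_date, transaction_time and idx_user_country_date are only illustrative), the split plus the covering index could look like this:

-- Split the datetime into separate date and time columns
ALTER TABLE sales
    ADD COLUMN transaction_date DATE,
    ADD COLUMN transaction_time TIME;

-- Backfill the new columns from the existing datetime column
UPDATE sales
    SET transaction_date = DATE(transaction_utc),
        transaction_time = TIME(transaction_utc);

-- Covering index: equality on username first, then the two grouping columns,
-- then the two summed columns, so the query can be served from the index alone
ALTER TABLE sales
    ADD INDEX idx_user_country_date
        (username, country_id, transaction_date, sale_usd, commission_usd);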


If you want to keep the current structure, try changing the order of the columns in your 5-column index to:

(username, country_id, transaction_utc, sale_usd, commission_usd)

or:

(username, transaction_utc, country_id, sale_usd, commission_usd)
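For example, rebuilding the existing 5-column index in the first of those orders would be (a sketch only; dropping and re-adding an index on ~20M rows will take some time):

ALTER TABLE sales
    DROP INDEX transaction_utc,
    ADD INDEX transaction_utc
        (username, country_id, transaction_utc, sale_usd, commission_usd);

The second ordering is the same statement with country_id and transaction_utc swapped.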

Since you are using MariaDB, you can use the VIRTUAL columns feature without changing the existing columns:

Add a virtual (persistent) column and the corresponding index:

ALTER TABLE sales
    ADD COLUMN transaction_date DATE
               AS (DATE(transaction_utc))
               PERSISTENT,
    ADD INDEX special_IDX
        (username, country_id, transaction_date, sale_usd, commission_usd) ;
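With the persistent column and index in place, the query from the question can group on transaction_date so that special_IDX covers it. One possible rewrite (a sketch; note that filtering on the DATE column rather than the original DATETIME BETWEEN now includes the whole of the last day):

SELECT lss.`country_id` AS CountryId
    , lss.`transaction_date` AS TransactionDate
    , c.`name` AS CountryName
    , COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
    , COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
    FROM `sales` lss
    JOIN `countries` c ON lss.`country_id` = c.`country_id`
    WHERE lss.`username` = 'someuser'
      AND lss.`transaction_date` BETWEEN '2012-09-26' AND '2012-10-26'
    GROUP BY lss.`country_id`, lss.`transaction_date`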