Optimizing a MySQL query with time zone conversion and grouping by hour

Date: 2015-03-02 11:36:18

Tags: mysql sql

This is a table in MySQL 5.5 with 30 million records:

CREATE TABLE `campaign_logs` (
  `domain` varchar(50) DEFAULT NULL,
  `campaign_id` varchar(50) DEFAULT NULL,
  `subscriber_id` varchar(50) DEFAULT NULL,
  `message` varchar(21000) DEFAULT NULL,
  `log_time` datetime DEFAULT NULL,
  `log_type` varchar(50) DEFAULT NULL,
  `level` varchar(50) DEFAULT NULL,
  `campaign_name` varchar(500) DEFAULT NULL,
  KEY `subscriber_id_index` (`subscriber_id`),
  KEY `log_type_index` (`log_type`),
  KEY `log_time_index` (`log_time`),
  KEY `campid_domain_logtype_logtime_subid_index` (`campaign_id`,`domain`,`log_type`,`log_time`,`subscriber_id`),
  KEY `domain_logtype_logtime_index` (`domain`,`log_type`,`log_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

In the following query, I am grouping the rows by hour of day:

QUERY

SELECT 
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d 
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_OPENED' 
    AND log_time BETWEEN 
        CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
        CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date

UNION ALL

SELECT
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d 
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index) 
WHERE
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_SENT' 
    AND log_time BETWEEN 
        CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
        CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date 

UNION ALL 

SELECT 
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_CLICKED' 
    AND log_time BETWEEN 
        CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
        CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date;

RESULT

The above query gives a result like this:

+---------------+-------+----------------+-------------+
| EMAIL_CLICKED | 1 AM  |             71 |          83 |
| EMAIL_CLICKED | 1 PM  |             25 |          27 |
| EMAIL_SENT    | 10 AM |             51 |          59 |
| EMAIL_OPENED  | 10 PM |             16 |          18 |
+---------------+-------+----------------+-------------+

Here is the EXPLAIN output for the above query:

EXPLAIN

+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+
| id | select_type  | table         | type  | possible_keys                             | key                                       | key_len | ref  | rows   | Extra                                    |
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+
|  1 | PRIMARY      | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL |  55074 | Using where; Using index; Using filesort |
|  2 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL | 330578 | Using where; Using index; Using filesort |
|  3 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL |   1589 | Using where; Using index; Using filesort |
|NULL| UNION RESULT | <union1,2,3>  | ALL   | NULL                                      | NULL                                      | NULL    | NULL |   NULL |                                          |
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+

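OPTIMIZATION?

We have a covering index on this table.

This query takes a long time (more than 1 minute).

If I remove count(DISTINCT subscriber_id) from the query, we get the result within 1.5 seconds, but I need the distinct count of subscriber_id from the query.

Is there any way to optimize this query?

Thanks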

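2 answers:

Answer 0: (score: 3)

You are not dealing with a large amount of data, so the group by should not take 40 seconds, assuming you are not on a really busy server with a lot of locking activity on the table.

Try this version of the query (limited to one log_type):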

SELECT log_type,
       DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS time,
       count(DISTINCT subscriber_id) AS distinct_count,
       count(subscriber_id) AS total_count
FROM stats.campaign_logs
WHERE DOMAIN = 'xxxx' AND
      campaign_id='1234' AND
      log_type = 'EMAIL_SENT' AND
      log_time BETWEEN CONVERT_TZ('2015-02-07 00:00:00','+00:00','+05:30') AND CONVERT_TZ('2015-02-14 23:59:58','+00:00','+05:30')
GROUP BY time;

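This should make optimal use of the index. If this is fast, then use union all to put the rows together. It is ugly, but because of index optimization, union all is sometimes much faster than OR / IN.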
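As a rough sketch of the union all step described above, the single-type query can be repeated once per log_type and the branches combined. The placeholder values 'xxxx' and '1234' are carried over from the query above, the second log type 'EMAIL_OPENED' is taken from the question, and whether this is actually faster on the real table is an assumption that would need to be checked with EXPLAIN and timing.

```sql
-- Sketch only: one branch per log_type, combined with UNION ALL.
-- Each branch has a single equality on log_type, so it can use a plain
-- index range on (campaign_id, domain, log_type, log_time).
(SELECT log_type,
        DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS time,
        count(DISTINCT subscriber_id) AS distinct_count,
        count(subscriber_id) AS total_count
 FROM stats.campaign_logs
 WHERE DOMAIN = 'xxxx' AND
       campaign_id = '1234' AND
       log_type = 'EMAIL_SENT' AND
       log_time BETWEEN CONVERT_TZ('2015-02-07 00:00:00','+00:00','+05:30')
                    AND CONVERT_TZ('2015-02-14 23:59:58','+00:00','+05:30')
 GROUP BY time)
UNION ALL
(SELECT log_type,
        DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS time,
        count(DISTINCT subscriber_id) AS distinct_count,
        count(subscriber_id) AS total_count
 FROM stats.campaign_logs
 WHERE DOMAIN = 'xxxx' AND
       campaign_id = '1234' AND
       log_type = 'EMAIL_OPENED' AND
       log_time BETWEEN CONVERT_TZ('2015-02-07 00:00:00','+00:00','+05:30')
                    AND CONVERT_TZ('2015-02-14 23:59:58','+00:00','+05:30')
 GROUP BY time);
```

Further log types (for example 'EMAIL_CLICKED') would be added as additional branches in the same way.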

Answer 1: (score: -1)

SELECT 
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d 
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type IN ('EMAIL_OPENED','EMAIL_SENT','EMAIL_CLICKED')
    AND log_time BETWEEN 
        CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
        CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date, log_type

If I understand correctly, does this solve your problem?
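Since the question identifies count(DISTINCT subscriber_id) as the slow part, one possible variation on the single-pass query above is to collapse the rows to one per subscriber in a derived table first and then count those rows. This is a sketch of a different technique, not something proposed in the thread; the names per_subscriber and row_count are made up for illustration, and whether it helps on MySQL 5.5 is an assumption to be measured.

```sql
-- Sketch only: deduplicate per (log_type, hour, subscriber) first,
-- then aggregate the much smaller intermediate result.
SELECT log_type,
       log_date,
       SUM(row_count)       AS total,  -- same value as count(*) over the raw rows
       COUNT(subscriber_id) AS d       -- one row per distinct non-NULL subscriber_id
FROM (
    SELECT log_type,
           DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date,
           subscriber_id,
           COUNT(*) AS row_count       -- hypothetical helper column
    FROM stats.campaign_logs
    WHERE DOMAIN='xxx'
      AND campaign_id='123'
      AND log_type IN ('EMAIL_OPENED','EMAIL_SENT','EMAIL_CLICKED')
      AND log_time BETWEEN
          CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
          CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
    GROUP BY log_type, log_date, subscriber_id
) AS per_subscriber                    -- hypothetical alias
GROUP BY log_type, log_date;
```

The outer COUNT(subscriber_id) skips the NULL group, which matches how count(DISTINCT subscriber_id) ignores NULL values, while SUM(row_count) still includes those rows in the total.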