We have the following MySQL tables (simplified to get straight to the point):
CREATE TABLE `MONTH_RAW_EVENTS` (
`idEvent` int(11) unsigned NOT NULL,
`city` varchar(45) NOT NULL,
`country` varchar(45) NOT NULL,
`ts` datetime NOT NULL,
`idClient` varchar(45) NOT NULL,
`event_category` varchar(45) NOT NULL,
... bunch of other fields
PRIMARY KEY (`idEvent`),
KEY `idx_city` (`city`),
KEY `idx_country` (`country`),
KEY `idClient` (`idClient`)
) ENGINE=InnoDB;
CREATE TABLE `compilation_table` (
`idClient` int(11) unsigned DEFAULT NULL,
`city` varchar(200) DEFAULT NULL,
`month` int(2) DEFAULT NULL,
`year` int(4) DEFAULT NULL,
`events_profile` int(10) unsigned NOT NULL DEFAULT '0',
`events_others` int(10) unsigned NOT NULL DEFAULT '0',
`events_total` int(10) unsigned NOT NULL DEFAULT '0',
KEY `idx_month` (`month`),
KEY `idx_year` (`year`),
KEY `idx_idClient` (`idClient`),
KEY `idx_city` (`city`)
) ENGINE=InnoDB;
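MONTH_RAW_EVENTS contains nearly 20M rows of actions that users performed on the website; it is nearly 4 GB in size.
compilation_table holds a per-month summary for each client/city, which we use to display statistics on the website in real time.
We process the statistics once a month (from the first table into the second), and we are trying to optimize the query that does this. Until now we have done all the processing in PHP, which takes a very long time.
This is the query we came up with. It seems to do the job on small subsets of the data; the problem is that it takes more than 6 hours on the full data set: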
INSERT INTO compilation_table (idClient,city,month,year,events_profile,events_others)
SELECT IFNULL(OTHERS.idClient,CLIPROFILE.idClient) as idClient,
IF(IFNULL(OTHERS.city,CLIPROFILE.city)='','Others',IFNULL(OTHERS.city,CLIPROFILE.city)) as city,
01,2014,
IFNULL(CLIPROFILE.cnt,0) as events_profile,
IFNULL(OTHERS.cnt,0) as events_others
FROM
(
SELECT idClient,CONCAT(city,', ',country) as city,count(*) as cnt
FROM `MONTH_RAW_EVENTS` WHERE `ts`>'2014-01-01 00:00:00' AND `ts`<='2014-01-31 23:59:59'
AND `event_category`!='CLIENT PROFILE'
GROUP BY idClient,city
) as OTHERS
LEFT JOIN
(
SELECT idClient,CONCAT(city,', ',country) as city,count(*) as cnt
FROM `MONTH_RAW_EVENTS` WHERE `ts`>'2014-01-01 00:00:00' AND `ts`<='2014-01-31 23:59:59'
AND `event_category`='CLIENT PROFILE'
GROUP BY idClient,city
) as CLIPROFILE
ON CLIPROFILE.city=OTHERS.city and CLIPROFILE.idClient=OTHERS.idClient
UNION
SELECT IFNULL(OTHERS.idClient,CLIPROFILE.idClient) as idClient,
IF(IFNULL(OTHERS.city,CLIPROFILE.city)='','Others',IFNULL(OTHERS.city,CLIPROFILE.city)) as city,
01,2014,
IFNULL(CLIPROFILE.cnt,0) as events_profile,
IFNULL(OTHERS.cnt,0) as events_others
FROM
(
SELECT idClient,CONCAT(city,', ',country) as city,count(*) as cnt
FROM `MONTH_RAW_EVENTS` WHERE `ts`>'2014-01-01 00:00:00' AND `ts`<='2014-01-31 23:59:59'
AND `event_category`!='CLIENT PROFILE'
GROUP BY idClient,city
) as OTHERS
RIGHT JOIN
(
SELECT idClient,CONCAT(city,', ',country) as city,count(*) as cnt
FROM `MONTH_RAW_EVENTS` WHERE `ts`>'2014-01-01 00:00:00' AND `ts`<='2014-01-31 23:59:59'
AND `event_category`='CLIENT PROFILE'
GROUP BY idClient,city
) as CLIPROFILE
ON CLIPROFILE.city=OTHERS.city and CLIPROFILE.idClient=OTHERS.idClient
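What we are trying to do is a FULL OUTER JOIN in MySQL, so the basic structure of the query follows the one proposed here.
How can we optimize the query? We have been trying different indexes, but the query still had not finished after running for 8 hours.
The MySQL server is Percona MySQL 5.5 on a dedicated machine with 2 CPUs, 2 GB of RAM and an SSD disk. We tuned the server's configuration with the Percona tools.
Any help would be really appreciated.
Thanks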
Answer 0 (score: 3)
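You are doing a UNION, which results in DISTINCT processing.
It is usually better to rewrite a full join as a left join plus the non-matching rows of a right join (provided it is a proper 1:n join):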
OTHERS LEFT JOIN CLIPROFILE
ON CLIPROFILE.city=OTHERS.city and CLIPROFILE.idClient=OTHERS.idClient
UNION ALL
OTHERS RIGHT JOIN CLIPROFILE
ON CLIPROFILE.city=OTHERS.city and CLIPROFILE.idClient=OTHERS.idClient
WHERE OTHERS.idClient IS NULL
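Spelled out against the query from the question, the rewrite might look roughly like the sketch below. It assumes the derived tables stay exactly as posted. I have not run it against the real data, so treat it as an illustration of the pattern rather than a tuned query.

```sql
-- Sketch only: same derived tables as in the question, but UNION ALL plus
-- an anti-join (WHERE ... IS NULL) replaces the duplicate-removing UNION.
INSERT INTO compilation_table (idClient, city, month, year, events_profile, events_others)
-- Branch 1: every OTHERS row, with the matching CLIPROFILE count if there is one.
SELECT OTHERS.idClient,
       IF(OTHERS.city = '', 'Others', OTHERS.city) AS city,
       1, 2014,
       IFNULL(CLIPROFILE.cnt, 0) AS events_profile,
       OTHERS.cnt AS events_others
FROM
(
  SELECT idClient, CONCAT(city, ', ', country) AS city, COUNT(*) AS cnt
  FROM `MONTH_RAW_EVENTS`
  WHERE `ts` > '2014-01-01 00:00:00' AND `ts` <= '2014-01-31 23:59:59'
    AND `event_category` != 'CLIENT PROFILE'
  GROUP BY idClient, city
) AS OTHERS
LEFT JOIN
(
  SELECT idClient, CONCAT(city, ', ', country) AS city, COUNT(*) AS cnt
  FROM `MONTH_RAW_EVENTS`
  WHERE `ts` > '2014-01-01 00:00:00' AND `ts` <= '2014-01-31 23:59:59'
    AND `event_category` = 'CLIENT PROFILE'
  GROUP BY idClient, city
) AS CLIPROFILE
ON CLIPROFILE.city = OTHERS.city AND CLIPROFILE.idClient = OTHERS.idClient

UNION ALL

-- Branch 2: only the CLIPROFILE rows that have no OTHERS match,
-- so no row can appear in both branches and no DISTINCT pass is needed.
SELECT CLIPROFILE.idClient,
       IF(CLIPROFILE.city = '', 'Others', CLIPROFILE.city) AS city,
       1, 2014,
       CLIPROFILE.cnt AS events_profile,
       0 AS events_others
FROM
(
  SELECT idClient, CONCAT(city, ', ', country) AS city, COUNT(*) AS cnt
  FROM `MONTH_RAW_EVENTS`
  WHERE `ts` > '2014-01-01 00:00:00' AND `ts` <= '2014-01-31 23:59:59'
    AND `event_category` != 'CLIENT PROFILE'
  GROUP BY idClient, city
) AS OTHERS
RIGHT JOIN
(
  SELECT idClient, CONCAT(city, ', ', country) AS city, COUNT(*) AS cnt
  FROM `MONTH_RAW_EVENTS`
  WHERE `ts` > '2014-01-01 00:00:00' AND `ts` <= '2014-01-31 23:59:59'
    AND `event_category` = 'CLIENT PROFILE'
  GROUP BY idClient, city
) AS CLIPROFILE
ON CLIPROFILE.city = OTHERS.city AND CLIPROFILE.idClient = OTHERS.idClient
WHERE OTHERS.idClient IS NULL;
```

Because the second branch keeps only unmatched rows, the IFNULL calls on the OTHERS side of the original query are no longer needed there.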
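Additionally, you might materialize the results of the derived tables in temporary tables before joining them, so that the calculation is done only once (I don't know whether MySQL's optimizer is smart enough to do that automatically).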
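A rough sketch of that materialization step follows. The table names `tmp_others` and `tmp_cliprofile` and the index name `idx_join` are made up for illustration. One caveat, as far as I know: MySQL (at least in 5.5) does not let a single query refer to the same TEMPORARY table more than once, and the UNION ALL form refers to each table twice. The sketch therefore uses ordinary scratch tables that are dropped at the end.

```sql
-- Hypothetical scratch tables (regular, not TEMPORARY, because each one
-- is referenced twice in the final statement).
DROP TABLE IF EXISTS tmp_others, tmp_cliprofile;

CREATE TABLE tmp_others (
  idClient VARCHAR(45)  NOT NULL,
  city     VARCHAR(92)  NOT NULL,  -- 45 + 2 + 45 characters for 'city, country'
  cnt      INT UNSIGNED NOT NULL,
  KEY idx_join (idClient, city)    -- supports the join condition below
) ENGINE=InnoDB;

CREATE TABLE tmp_cliprofile LIKE tmp_others;

-- Each aggregate over MONTH_RAW_EVENTS now runs exactly once.
INSERT INTO tmp_others (idClient, city, cnt)
SELECT idClient, CONCAT(city, ', ', country) AS city, COUNT(*) AS cnt
FROM `MONTH_RAW_EVENTS`
WHERE `ts` > '2014-01-01 00:00:00' AND `ts` <= '2014-01-31 23:59:59'
  AND `event_category` != 'CLIENT PROFILE'
GROUP BY idClient, city;

INSERT INTO tmp_cliprofile (idClient, city, cnt)
SELECT idClient, CONCAT(city, ', ', country) AS city, COUNT(*) AS cnt
FROM `MONTH_RAW_EVENTS`
WHERE `ts` > '2014-01-01 00:00:00' AND `ts` <= '2014-01-31 23:59:59'
  AND `event_category` = 'CLIENT PROFILE'
GROUP BY idClient, city;

-- Emulated full outer join over the two small, indexed scratch tables.
INSERT INTO compilation_table (idClient, city, month, year, events_profile, events_others)
SELECT o.idClient, IF(o.city = '', 'Others', o.city), 1, 2014,
       IFNULL(p.cnt, 0), o.cnt
FROM tmp_others AS o
LEFT JOIN tmp_cliprofile AS p
  ON p.city = o.city AND p.idClient = o.idClient
UNION ALL
SELECT p.idClient, IF(p.city = '', 'Others', p.city), 1, 2014,
       p.cnt, 0
FROM tmp_others AS o
RIGHT JOIN tmp_cliprofile AS p
  ON p.city = o.city AND p.idClient = o.idClient
WHERE o.idClient IS NULL;

DROP TABLE tmp_others, tmp_cliprofile;
```

The scratch tables hold one row per client/city, so the final join works on far fewer rows than the 20M-row source table.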
Plus, it might be more efficient to group and join on city/country as separate columns, and to do the CONCAT(city, ', ', country) as city in the outer step.
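For the left-join branch, that could look something like the sketch below; the anti-join branch would follow the same pattern. This is an untested illustration against the schema from the question. As a side note, and as far as I understand MySQL's name resolution, `GROUP BY idClient, city` in the original binds `city` to the table column rather than to the CONCAT alias, so grouping on both columns explicitly also makes the per-country split unambiguous.

```sql
-- Sketch: group and join on the raw columns, concatenate only once per
-- result row in the outer SELECT.
SELECT OTHERS.idClient,
       CONCAT(OTHERS.city, ', ', OTHERS.country) AS city,
       1, 2014,
       IFNULL(CLIPROFILE.cnt, 0) AS events_profile,
       OTHERS.cnt AS events_others
FROM
(
  SELECT idClient, city, country, COUNT(*) AS cnt
  FROM `MONTH_RAW_EVENTS`
  WHERE `ts` > '2014-01-01 00:00:00' AND `ts` <= '2014-01-31 23:59:59'
    AND `event_category` != 'CLIENT PROFILE'
  GROUP BY idClient, city, country
) AS OTHERS
LEFT JOIN
(
  SELECT idClient, city, country, COUNT(*) AS cnt
  FROM `MONTH_RAW_EVENTS`
  WHERE `ts` > '2014-01-01 00:00:00' AND `ts` <= '2014-01-31 23:59:59'
    AND `event_category` = 'CLIENT PROFILE'
  GROUP BY idClient, city, country
) AS CLIPROFILE
  ON  CLIPROFILE.idClient = OTHERS.idClient
  AND CLIPROFILE.city     = OTHERS.city
  AND CLIPROFILE.country  = OTHERS.country;
```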