I have a huge table (millions of rows) that looks (essentially) like this
datatime tagname interesting somemore columns
2014-12-04 20:00:00 grp1_tagA 77 0 0
2014-12-04 20:00:00 grp1_tagB 88 0 0
2014-12-04 20:00:00 grp1_tagC 99 0 0
2014-12-04 20:00:00 grp2_tagA 11 0 0
2014-12-04 20:00:00 grp2_tagB 22 0 0
2014-12-04 20:00:00 grp2_tagC 13 0 0
2014-12-04 21:00:00 grp1_tagA 17 0 0
2014-12-04 21:00:00 grp1_tagB 28 0 0
2014-12-04 21:00:00 grp1_tagC 29 0 0
2014-12-04 21:00:00 grp2_tagA 31 0 0
2014-12-04 21:00:00 grp2_tagB 62 0 0
2014-12-04 21:00:00 grp2_tagC 53 0 0
2014-12-04 22:00:00 grp1_tagA 87 0 0
2014-12-04 22:00:00 grp1_tagB 48 0 0
2014-12-04 22:00:00 grp1_tagC 99 0 0
2014-12-04 22:00:00 grp2_tagA 51 0 0
2014-12-04 22:00:00 grp2_tagB 42 0 0
2014-12-04 22:00:00 grp2_tagC 53 0 0
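In the real table there are dozens of groups with ~100 tags each, and for each group and tag there are several years of hourly data (on the order of ten thousand rows per tag name), which currently adds up to about 8 million rows. At a later stage, other tables with smaller time intervals, and therefore even more rows, will come into play.
I need a fast way to get all data from the table that belongs to a certain group (e.g. group 1, i.e. tag names starting with "grp1") within a certain date range (datatime), so it can be sent to a client's browser for visualization.
So I want to build a "group 1 summary" table with one row per datatime and one column per tag (datatime, tagA, tagB, tagC).
A simple query would look like this (leaving out the date constraint for now)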
SELECT A.`datatime` as `datatime`,
A.`interesting` as tagA, B.`interesting` as tagB, C.`interesting` as tagC
FROM `everything` A, `everything` B, `everything` C
WHERE
A.`datatime` = B.`datatime` AND
A.`datatime` = C.`datatime` AND
A.`tagname` = "grp1_tagA" AND
B.`tagname` = "grp1_tagB" AND
C.`tagname` = "grp1_tagC"
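In reality it is a bit more complicated: at some timestamps some tags may have data while others do not, and I want the rows with partial data as well. For example, if one more row is added for a timestamp at which only some tags report, the result should still contain a row for that timestamp, with the missing tags left empty.
A possible query for this purpose is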
SELECT GLUE.thyme, A.iwant as tagA, B.iwant as tagB, C.iwant as tagC FROM
(SELECT distinct `datatime` as thyme from `everything`) GLUE left join
(SELECT `datatime` as thyme, `interesting` as iwant from `everything` where `tagname` = "grp1_tagA") A on GLUE.thyme = A.thyme left join
(SELECT `datatime` as thyme, `interesting` as iwant from `everything` where `tagname` = "grp1_tagB") B on GLUE.thyme = B.thyme left join
(SELECT `datatime` as thyme, `interesting` as iwant from `everything` where `tagname` = "grp1_tagC") C on GLUE.thyme = C.thyme
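To make the difference between the two query shapes concrete, here is a minimal sketch that runs both against a tiny copy of the sample data. It assumes Python's built-in sqlite3 module as a stand-in for the MySQL server, and the 23:00 row (where only tagA reports) is a hypothetical row added for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE everything ("
    " datatime TEXT, tagname TEXT, interesting INTEGER,"
    " PRIMARY KEY (datatime, tagname))"
)
conn.executemany(
    "INSERT INTO everything VALUES (?, ?, ?)",
    [
        ("2014-12-04 20:00:00", "grp1_tagA", 77),
        ("2014-12-04 20:00:00", "grp1_tagB", 88),
        ("2014-12-04 20:00:00", "grp1_tagC", 99),
        # Hypothetical extra row: only tagA has data at 23:00.
        ("2014-12-04 23:00:00", "grp1_tagA", 5),
    ],
)


def tag_subquery(tag):
    """One derived table per tag, mirroring the subqueries in the question."""
    return (
        "(SELECT datatime AS thyme, interesting AS iwant "
        f"FROM everything WHERE tagname = '{tag}')"
    )


# Self-join version: a timestamp appears only if all three tags have data.
inner_sql = """
SELECT A.datatime, A.interesting, B.interesting, C.interesting
FROM everything A, everything B, everything C
WHERE A.datatime = B.datatime AND A.datatime = C.datatime
  AND A.tagname = 'grp1_tagA'
  AND B.tagname = 'grp1_tagB'
  AND C.tagname = 'grp1_tagC'
ORDER BY A.datatime
"""

# LEFT JOIN version: every distinct timestamp is kept, gaps become NULL.
left_sql = f"""
SELECT GLUE.thyme, A.iwant, B.iwant, C.iwant
FROM (SELECT DISTINCT datatime AS thyme FROM everything) GLUE
LEFT JOIN {tag_subquery('grp1_tagA')} A ON GLUE.thyme = A.thyme
LEFT JOIN {tag_subquery('grp1_tagB')} B ON GLUE.thyme = B.thyme
LEFT JOIN {tag_subquery('grp1_tagC')} C ON GLUE.thyme = C.thyme
ORDER BY GLUE.thyme
"""

inner_rows = conn.execute(inner_sql).fetchall()
left_rows = conn.execute(left_sql).fetchall()
print(inner_rows)
print(left_rows)
```

In this sketch the self-join drops the 23:00 timestamp entirely, while the LEFT JOIN version keeps it with None (NULL) for tagB and tagC. It says nothing about performance on the real table; it only illustrates the result shape.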
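Problem: the "real-world" version of this query is not fast enough. I tested the query structure above with 34 tag names (so 35 subqueries are joined), adding a date constraint such as where/and datatime >= '2013-12-04' to each of the subqueries, so that 8760 rows in total (i.e. one year of data) are returned. The resulting run time was 2.5 minutes. I am aiming for under half a minute, which is about the time it takes to transfer the data over the internet.
The big table has a composite primary key index on datatime and tagname, and an additional index (key) on datatime.
The overall question is: how can I get the data faster?
Question 1: Can the query above be improved?
That would be the preferred solution.
Update: the accepted answer provides the preferred solution. The query can be written without any joins, and it is much faster (down from 2.5 minutes to a few seconds, just tested). There is no need to read the rest of the question, and no extra table is needed.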
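If that is not possible, an extra table group1 could be maintained that holds all the data of the query result over the entire available date range and is kept in sync with the big table by some means, possibly triggers. That is what I am currently working on, but I suspect my trigger does not run fast enough.
So, create the new table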
CREATE TABLE `group1` (
`datatime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`tagA` int(32) DEFAULT NULL,
`tagB` int(32) DEFAULT NULL,
`tagC` int(32) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Transfer the data from the big table to the new table
INSERT INTO group1 (`datatime`) SELECT DISTINCT `datatime` from `everything`;
UPDATE group1 g, (SELECT `datatime` as thyme, `interesting` as iwant from `everything` where `tagname` = "grp1_tagA") as source set g.`tagA` = iwant WHERE g.`datatime`= source.thyme;
UPDATE group1 g, (SELECT `datatime` as thyme, `interesting` as iwant from `everything` where `tagname` = "grp1_tagB") as source set g.`tagB` = iwant WHERE g.`datatime`= source.thyme;
UPDATE group1 g, (SELECT `datatime` as thyme, `interesting` as iwant from `everything` where `tagname` = "grp1_tagC") as source set g.`tagC` = iwant WHERE g.`datatime`= source.thyme;
Trigger to keep the new table in sync with the big table
DELIMITER //
CREATE TRIGGER everything_group1_after_insert
AFTER INSERT
ON `everything` FOR EACH ROW
BEGIN
DECLARE counter INT;
SET counter = (SELECT count(*) FROM `group1` WHERE datatime = NEW.`datatime`);
IF counter = 0 THEN
INSERT INTO `group1` (`datatime`) VALUES (NEW.`datatime`);
END IF;
IF NEW.TAGNAME = "grp1_tagA" THEN UPDATE `group1` SET `tagA` = NEW.`interesting` WHERE `group1`.`datatime` = NEW.`datatime`; END IF;
IF NEW.TAGNAME = "grp1_tagB" THEN UPDATE `group1` SET `tagB` = NEW.`interesting` WHERE `group1`.`datatime` = NEW.`datatime`; END IF;
IF NEW.TAGNAME = "grp1_tagC" THEN UPDATE `group1` SET `tagC` = NEW.`interesting` WHERE `group1`.`datatime` = NEW.`datatime`; END IF;
END; //
DELIMITER ;
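Question 2: How can the run time of the trigger be improved? Or is there a different way to keep the tables in sync (not necessarily a trigger)? Is one IF statement per tag unavoidable?
Question 3: Suppose a new tag is added to the group. Is it possible to write the trigger (or the query, see question 1) in such a way that it does not have to be rewritten to account for the new tag/column of the result table? For the query I am pretty sure this is impossible (it would require joining an unspecified number of tables), but maybe it is possible for the trigger?
You can download an SQL dump of the toy database above here: toy database
Update: I forgot the primary key on group1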
alter table `group1` add primary key (datatime)
Answer 0 (score: 3)
Try using a GROUP BY on the datatime column together with CASE expressions, as shown below.
SELECT a.datatime
, sum(case when a.tagname = 'grp1_tagA' then a.interesting else NULL end) as tagA
, sum(case when a.tagname = 'grp1_tagB' then a.interesting else NULL end) as tagB
, sum(case when a.tagname = 'grp1_tagC' then a.interesting else NULL end) as tagC
FROM everything AS a
WHERE a.datatime >= '2013-12-04'
GROUP BY a.datatime
;
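As a quick check of what this join-free form returns, here is a minimal sketch that runs the same GROUP BY / CASE query on a few sample rows. It assumes Python's built-in sqlite3 module in place of MySQL; the 21:00 gap for tagB and the 2012 row are hypothetical rows added to exercise the NULL handling and the date filter.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE everything ("
    " datatime TEXT, tagname TEXT, interesting INTEGER,"
    " PRIMARY KEY (datatime, tagname))"
)
conn.executemany(
    "INSERT INTO everything VALUES (?, ?, ?)",
    [
        ("2014-12-04 20:00:00", "grp1_tagA", 77),
        ("2014-12-04 20:00:00", "grp1_tagB", 88),
        ("2014-12-04 20:00:00", "grp1_tagC", 99),
        ("2014-12-04 20:00:00", "grp2_tagA", 11),  # other group, ignored by the CASEs
        # Hypothetical gap: tagB has no value at 21:00.
        ("2014-12-04 21:00:00", "grp1_tagA", 17),
        ("2014-12-04 21:00:00", "grp1_tagC", 29),
        ("2014-12-04 21:00:00", "grp2_tagA", 31),
        # Hypothetical old row, excluded by the date constraint.
        ("2012-01-01 00:00:00", "grp1_tagA", 1),
    ],
)

# One pass over the table: each CASE picks out the value of one tag,
# and SUM collapses the rows of a timestamp into a single output row.
pivot_sql = """
SELECT datatime,
       SUM(CASE WHEN tagname = 'grp1_tagA' THEN interesting END) AS tagA,
       SUM(CASE WHEN tagname = 'grp1_tagB' THEN interesting END) AS tagB,
       SUM(CASE WHEN tagname = 'grp1_tagC' THEN interesting END) AS tagC
FROM everything
WHERE datatime >= '2013-12-04'
GROUP BY datatime
ORDER BY datatime
"""

pivot_rows = conn.execute(pivot_sql).fetchall()
print(pivot_rows)
```

Because (datatime, tagname) is the primary key, each CASE matches at most one row per timestamp, so SUM simply passes that value through, and a timestamp with no matching row yields NULL (None here). The ORDER BY is added in this sketch only to make the output deterministic.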
Answer 1 (score: 0)
Tests on the huge table with millions of rows show that BateTech's excellent answer can still be improved slightly, like this
SELECT a.datatime
, sum(case when a.tagname = 'grp1_tagA' then a.interesting else NULL end) as tagA
, sum(case when a.tagname = 'grp1_tagB' then a.interesting else NULL end) as tagB
, sum(case when a.tagname = 'grp1_tagC' then a.interesting else NULL end) as tagC
FROM (SELECT * FROM everything WHERE datatime >= '2013-12-04' and tagname like "grp1_%") AS a
GROUP BY a.datatime
;
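Regarding question 3 (new tags added to a group), one possible approach is to generate the CASE columns from a list of tag names instead of writing them by hand. The following is only a sketch under stated assumptions: build_pivot_query is a hypothetical helper, Python's built-in sqlite3 module stands in for MySQL, grp1_tagD is a hypothetical new tag, and the pre-filter uses tagname IN (...) rather than LIKE.

```python
import re
import sqlite3


def build_pivot_query(tags, start):
    """Hypothetical helper: build the pivot SQL and its parameters for `tags`."""
    columns = []
    for tag in tags:
        # The tag doubles as column alias, so only plain identifiers are allowed.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", tag):
            raise ValueError(f"unsupported tag name: {tag!r}")
        columns.append(
            f"SUM(CASE WHEN tagname = ? THEN interesting END) AS {tag}"
        )
    placeholders = ", ".join("?" for _ in tags)
    sql = (
        "SELECT datatime, " + ", ".join(columns) + " "
        "FROM everything "
        f"WHERE datatime >= ? AND tagname IN ({placeholders}) "
        "GROUP BY datatime ORDER BY datatime"
    )
    # Placeholder order: CASE expressions first, then date, then the IN list.
    params = list(tags) + [start] + list(tags)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE everything ("
    " datatime TEXT, tagname TEXT, interesting INTEGER,"
    " PRIMARY KEY (datatime, tagname))"
)
conn.executemany(
    "INSERT INTO everything VALUES (?, ?, ?)",
    [
        ("2014-12-04 20:00:00", "grp1_tagA", 77),
        ("2014-12-04 20:00:00", "grp1_tagB", 88),
        ("2014-12-04 20:00:00", "grp1_tagD", 44),  # hypothetical new tag
        ("2014-12-04 20:00:00", "grp2_tagA", 11),
        ("2014-12-04 21:00:00", "grp1_tagA", 17),
    ],
)

# Adding a tag only changes the list, not the query-building code.
sql, params = build_pivot_query(
    ["grp1_tagA", "grp1_tagB", "grp1_tagD"], "2013-12-04"
)
cursor = conn.execute(sql, params)
column_names = [d[0] for d in cursor.description]
dynamic_rows = cursor.fetchall()
print(column_names)
print(dynamic_rows)
```

The tag values are passed as bound parameters, while the aliases are validated before being placed into the SQL text, since column names cannot be bound. The placeholder syntax (? here) depends on the database driver in use.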