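I am trying to run temporal analysis queries against a table with more than 1M rows. A typical question is how many rows satisfy "some criteria" within an arbitrary time window, for example the last 4 months, split into 3-day periods.
Our current solution runs one count query per period, so the example above produces 40 separate queries, which makes performance unacceptable.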
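The figure of 40 follows from simple date arithmetic: 2017-01-15 to 2017-05-15 spans 120 days, and 120 / 3 = 40 windows. A minimal Python sketch of that calculation is shown here; the helper name `make_windows` is hypothetical and not part of the original setup.

```python
from datetime import datetime, timedelta


def make_windows(start, end, step):
    """Return consecutive (lower, upper) pairs covering start..end in steps of `step`."""
    windows = []
    lower = start
    while lower < end:
        # Clamp the last window so it never runs past `end`.
        upper = min(lower + step, end)
        windows.append((lower, upper))
        lower = upper
    return windows


windows = make_windows(datetime(2017, 1, 15), datetime(2017, 5, 15), timedelta(days=3))
print(len(windows))  # 40
print(windows[0])
print(windows[-1])
```

Each pair is one (lower_limit, upper_limit) row of the kind the query needs as period boundaries.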
One approach I tried is to create a temporary table, like this:
create temporary table time_series (
lower_limit timestamp default current_timestamp,
upper_limit timestamp default current_timestamp
);
insert into time_series (lower_limit, upper_limit) values
('2017-01-15 00:00:00', '2017-01-18 00:00:00'),
('2017-01-18 00:00:00', '2017-01-21 00:00:00'),
...
('2017-05-09 00:00:00', '2017-05-12 00:00:00'),
('2017-05-12 00:00:00', '2017-05-15 00:00:00');
select ts.upper_limit, count(mbt.time) from time_series ts
join my_big_table mbt on
(
mbt.time > ts.lower_limit and
mbt.time <= ts.upper_limit
)
group by ts.upper_limit
order by ts.upper_limit;
drop table time_series;
...which produces:
+---------------------+-----------------+
| upper_limit         | count(mbt.time) |
+---------------------+-----------------+
| 2017-01-18 00:00:00 |           65890 |
| 2017-01-21 00:00:00 |           98230 |
| ...                 |                 |
| 2017-05-12 00:00:00 |           57690 |
| 2017-05-15 00:00:00 |            2349 |
+---------------------+-----------------+
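This is more efficient than our current solution, but the problem is that I don't own the database. The tables may reside in Oracle, SQL Server, MySQL, or PostgreSQL, and I may have only SELECT privileges, so the ability to create and drop temporary tables is not guaranteed. For example, I ran the SQL above in MySQL, but I first had to grant myself the CREATE TEMPORARY TABLES privilege.
Is there a way to create a "synthetic table" (I don't know what else to call it) that I can use within the scope of a query, and that accepts a fixed list of timestamps as period boundaries, similar to what I have above but without the temporary table?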
Answer 0 (score: 0)
Here is an (ugly!) query that generates a sequence of 125 integers [0-124] by cross-joining three copies of a five-row subquery.
SELECT A.N + 5*(B.N + 5*(C.N)) AS seq
FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS A
JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS B
JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS C
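To see why the expression yields every integer exactly once, read A.N, B.N and C.N as the three digits of a base-5 number: 5 * 5 * 5 = 125 combinations map onto 0 through 124. The sketch below checks this by running the same query. It assumes an in-memory SQLite database as a stand-in for the real DBMS, since SQLite also treats a JOIN without an ON clause as a cross join; the names `DIGITS` and `SEQUENCE_SQL` are introduced here for illustration only.

```python
import sqlite3

# One five-row "digit" subquery, reused three times.
DIGITS = "(SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4)"

SEQUENCE_SQL = f"""
SELECT A.N + 5*(B.N + 5*(C.N)) AS seq
FROM {DIGITS} AS A
JOIN {DIGITS} AS B
JOIN {DIGITS} AS C
"""

conn = sqlite3.connect(":memory:")
# Sort on the client side: the query itself does not guarantee any row order.
seqs = sorted(row[0] for row in conn.execute(SEQUENCE_SQL))
conn.close()

print(len(seqs), seqs[0], seqs[-1])
```

Adding a fourth digit subquery and another factor of 5 would extend the range to 625 values in the same way.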
You can use a LIMIT clause to get the integers [0-39] from it. Include an ORDER BY as well, because without one the DBMS is free to return any 40 of the 125 rows:
SELECT A.N + 5*(B.N + 5*(C.N)) AS seq
FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS A
JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS B
JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS C
ORDER BY seq
LIMIT 40
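You can then wrap that nasty SQL snippet in another query that generates the date ranges. Each window is 3 days wide, so the sequence value is multiplied by 3 to make consecutive windows adjacent rather than overlapping.
-- seq = 0 gives [2017-01-15, 2017-01-18], seq = 1 gives [2017-01-18, 2017-01-21], and so on
SELECT daterange.start_date + INTERVAL (3 * sequence.seq) DAY lower_limit,
daterange.start_date + INTERVAL (3 * (sequence.seq + 1)) DAY upper_limit
FROM (
SELECT DATE('2017-01-15') start_date
) daterange
JOIN (
SELECT A.N + 5*(B.N + 5*(C.N)) AS seq
FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS A
JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS B
JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS C
ORDER BY seq
LIMIT 40
) sequence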
Its contents match the temporary table you were trying to create, so you can use it as a subquery (a virtual table) to get the result you want.
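A rough end-to-end sketch of that idea, with the generated ranges used as a derived table in place of time_series. Several things here are assumptions rather than part of the answer: SQLite stands in for the real DBMS, so the date arithmetic uses SQLite's `datetime()` modifiers instead of MySQL's INTERVAL syntax; `WHERE seq < 40` replaces LIMIT as a deterministic way to keep the first 40 values; and the rows in `SAMPLE_TIMES` are made up for the demonstration.

```python
import sqlite3

DIGITS = "(SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4)"

# The innermost subquery is the cross-join sequence; the middle one turns it
# into adjacent 3-day windows; the outer query counts rows per window.
QUERY = f"""
SELECT ts.upper_limit, COUNT(mbt.time) AS n
FROM (
    SELECT datetime('2017-01-15', '+' || (3 * seq) || ' days')       AS lower_limit,
           datetime('2017-01-15', '+' || (3 * (seq + 1)) || ' days') AS upper_limit
    FROM (
        SELECT A.N + 5*(B.N + 5*(C.N)) AS seq
        FROM {DIGITS} AS A
        JOIN {DIGITS} AS B
        JOIN {DIGITS} AS C
    ) AS s
    WHERE seq < 40
) AS ts
JOIN my_big_table mbt
  ON mbt.time > ts.lower_limit AND mbt.time <= ts.upper_limit
GROUP BY ts.upper_limit
ORDER BY ts.upper_limit
"""

# Made-up sample rows, chosen to sit on and around window boundaries.
SAMPLE_TIMES = [
    "2017-01-15 00:00:00",  # equal to the first lower bound, so excluded
    "2017-01-16 12:00:00",
    "2017-01-18 00:00:00",  # equal to an upper bound, so counted in that window
    "2017-01-18 00:00:01",
    "2017-05-15 00:00:00",
    "2017-05-15 00:00:01",  # after the last window, so excluded
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_big_table (time TEXT)")
conn.executemany("INSERT INTO my_big_table (time) VALUES (?)", [(t,) for t in SAMPLE_TIMES])
rows = conn.execute(QUERY).fetchall()
conn.close()

for upper_limit, n in rows:
    print(upper_limit, n)
```

Because this is an inner join, windows that contain no rows do not appear in the output; a LEFT JOIN from the derived table would keep them with a count of 0.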
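The nice thing about this cross-join approach is that you only need SELECT privileges in the DBMS.
An added bonus: if you show this to the DBA who won't let you create temporary tables, she'll take pity on you and let you do it the easy way.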
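If you happen to be working in MariaDB 10 or later, there are built-in pseudo-tables called sequence tables. For example
SELECT seq FROM seq_0_TO_39
gives the same integer sequence [0-39]. That makes this sort of thing much less verbose in SQL.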
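Answer 1 (score: 0)
Thanks for all the suggestions in the comments. While I was researching them (for example, whether I could use table variables in every RDBMS), I came across this comment, which helped me find the answer: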
select ts.upper_limit, count(mbt.time) from (
select '2017-04-05 00:00:00' as lower_limit, '2017-04-10 00:00:00' as upper_limit union
select '2017-04-10 00:00:00' as lower_limit, '2017-04-15 00:00:00' as upper_limit union
select '2017-04-15 00:00:00' as lower_limit, '2017-04-20 00:00:00' as upper_limit union
select '2017-04-20 00:00:00' as lower_limit, '2017-04-25 00:00:00' as upper_limit union
select '2017-04-25 00:00:00' as lower_limit, '2017-04-30 00:00:00' as upper_limit union
select '2017-04-30 00:00:00' as lower_limit, '2017-05-05 00:00:00' as upper_limit union
select '2017-05-05 00:00:00' as lower_limit, '2017-05-10 00:00:00' as upper_limit
) as ts
join my_big_table mbt on
(
mbt.time > ts.lower_limit and
mbt.time <= ts.upper_limit
)
group by ts.upper_limit
order by ts.upper_limit;
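Writing 40 of those UNION branches by hand is tedious, so the derived table can be generated from a list of boundaries instead. The following is a minimal sketch under stated assumptions: SQLite stands in for the real DBMS, the `?` placeholder style is the one used by Python's `sqlite3` module (other drivers use different styles), `build_bucket_query` is a hypothetical helper, and the sample rows are made up. It uses UNION ALL because the generated windows are already distinct.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def build_bucket_query(n_windows):
    """Build the counting query with one bound-parameter SELECT per window."""
    first = "SELECT ? AS lower_limit, ? AS upper_limit"
    rest = ["SELECT ?, ?"] * (n_windows - 1)
    derived = " UNION ALL ".join([first] + rest)
    return (
        "SELECT ts.upper_limit, COUNT(mbt.time) "
        f"FROM ({derived}) AS ts "
        "JOIN my_big_table mbt "
        "ON mbt.time > ts.lower_limit AND mbt.time <= ts.upper_limit "
        "GROUP BY ts.upper_limit ORDER BY ts.upper_limit"
    )


# Seven 5-day windows from 2017-04-05 to 2017-05-10, as in the query above.
start = datetime(2017, 4, 5)
step = timedelta(days=5)
windows = [(start + i * step, start + (i + 1) * step) for i in range(7)]

# Flatten to [lower1, upper1, lower2, upper2, ...] in placeholder order.
params = [bound.strftime(FMT) for window in windows for bound in window]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_big_table (time TEXT)")
conn.executemany(
    "INSERT INTO my_big_table (time) VALUES (?)",
    [
        ("2017-04-05 00:00:00",),  # equal to the first lower bound, so excluded
        ("2017-04-07 08:00:00",),
        ("2017-04-10 00:00:00",),  # equal to an upper bound, so counted there
        ("2017-04-12 09:30:00",),
        ("2017-05-10 00:00:00",),
    ],
)
rows = conn.execute(build_bucket_query(len(windows)), params).fetchall()
conn.close()

for upper_limit, n in rows:
    print(upper_limit, n)
```

The whole analysis is still a single round trip to the database, and nothing beyond SELECT on my_big_table is needed in a real deployment.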