How to query against a fixed list of values

Date: 2017-05-10 20:47:23

Tags: mysql sql

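I am trying to run temporal analysis queries on a table with more than 1M rows. A typical question is how many rows satisfy "some criteria" within an arbitrary time window, for example the last 4 months, split into 3-day periods.

Our current solution runs one count query per time period, so the example above produces 40 separate queries, which makes performance unacceptable.

One approach I tried is to create a temporary table, like this: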

create temporary table time_series (
    lower_limit timestamp default current_timestamp, 
    upper_limit timestamp default current_timestamp
);

insert into time_series (lower_limit, upper_limit) values
    ('2017-01-15 00:00:00', '2017-01-18 00:00:00'), 
    ('2017-01-18 00:00:00', '2017-01-21 00:00:00'), 
    ...
    ('2017-05-09 00:00:00', '2017-05-12 00:00:00'), 
    ('2017-05-12 00:00:00', '2017-05-15 00:00:00');

select ts.upper_limit, count(mbt.time) from time_series ts 
join my_big_table mbt on 
(
    mbt.time >  ts.lower_limit and 
    mbt.time <= ts.upper_limit
)
group by ts.upper_limit
order by ts.upper_limit;

drop table time_series;

...which produces

+---------------------+-----------------+
|     upper_limit     | count(mbt.time) |
+---------------------+-----------------+
| 2017-01-18 00:00:00 |           65890 | 
| 2017-01-21 00:00:00 |           98230 | 
| ...                 |                 | 
| 2017-05-12 00:00:00 |           57690 | 
| 2017-05-15 00:00:00 |            2349 | 
+---------------------+-----------------+

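This is more efficient than our current solution, but the problem is that I don't own the database. The tables can reside in Oracle, SQL Server, MySQL, or PostgreSQL, and I may have only SELECT privileges, so I can't count on being able to create and drop temporary tables. For example, I ran the SQL above in MySQL, but I had to grant myself the CREATE TEMPORARY TABLES privilege first.

Is there a way to create a "synthetic table" (I don't know what else to call it) that I can use within the scope of a query, and that accepts a fixed list of timestamps as period boundaries, similar to what I have above, except without the temporary table?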

2 answers:

Answer 0: (score: 0)

Here is an (ugly!) query that generates the sequence of 125 integers [0-124] by cross-joining three five-row tables.

SELECT A.N + 5*(B.N + 5*(C.N)) AS seq
  FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS A
  JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS B
  JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS C

You can take just the integers [0-39] from it with a LIMIT clause. The ORDER BY is there because, without it, the database is free to return any 40 of the 125 rows:

SELECT A.N + 5*(B.N + 5*(C.N)) AS seq
  FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS A
  JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS B
  JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS C
 ORDER BY seq
 LIMIT 40

You can then use that nasty SQL snippet inside another query that generates a set of date ranges.

-- Multiply seq by 3 so each window is 3 days wide and starts where the previous one ends.
SELECT daterange.start_date + INTERVAL (3 * sequence.seq) DAY lower_limit,
       daterange.start_date + INTERVAL (3 * sequence.seq + 3) DAY upper_limit
  FROM (
          SELECT DATE('2017-01-15') start_date
       ) daterange
  JOIN (
         SELECT A.N + 5*(B.N + 5*(C.N)) AS seq
          FROM (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS A
          JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS B
          JOIN (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) AS C
         ORDER BY seq
         LIMIT 40
       ) sequence

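Its contents are the same as the temporary table you were trying to create. So you can use it as a subquery, that is, as a virtual table, in place of time_series to get the result you want.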
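To make the subquery idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite only stands in for the real database here, so the date arithmetic uses SQLite's datetime() modifiers rather than MySQL's INTERVAL syntax, and the windows are cut off with WHERE seq < :windows instead of LIMIT. The table my_big_table and its time column come from the question; the sample rows and the helper name count_per_window are made up for illustration.

```python
import sqlite3

# Digits 0-4 as an inline table; three cross-joined copies yield 0..124.
DIGITS = ("SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 "
          "UNION ALL SELECT 3 UNION ALL SELECT 4")

QUERY = f"""
SELECT ts.upper_limit, COUNT(mbt.time) AS n
  FROM (
        -- One row per 3-day window (SQLite date syntax, an assumption of this sketch).
        SELECT datetime(:start, '+' || (3 * seq) || ' days')     AS lower_limit,
               datetime(:start, '+' || (3 * seq + 3) || ' days') AS upper_limit
          FROM (
                SELECT A.N + 5 * (B.N + 5 * C.N) AS seq
                  FROM ({DIGITS}) AS A
                 CROSS JOIN ({DIGITS}) AS B
                 CROSS JOIN ({DIGITS}) AS C
               ) AS sequence
         WHERE seq < :windows
       ) AS ts
  JOIN my_big_table AS mbt
    ON mbt.time > ts.lower_limit AND mbt.time <= ts.upper_limit
 GROUP BY ts.upper_limit
 ORDER BY ts.upper_limit
"""


def count_per_window(conn, start, windows):
    """Return [(upper_limit, row_count), ...] for consecutive 3-day windows."""
    return conn.execute(QUERY, {"start": start, "windows": windows}).fetchall()


# Hypothetical sample data, only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_big_table (time TEXT)")
conn.executemany(
    "INSERT INTO my_big_table (time) VALUES (?)",
    [("2017-01-16 08:00:00",), ("2017-01-18 00:00:00",), ("2017-01-19 12:00:00",)],
)
rows = count_per_window(conn, "2017-01-15 00:00:00", 40)
for upper_limit, n in rows:
    print(upper_limit, n)
```

Because this is an inner join, just like the query in the question, windows that contain no rows do not appear in the result at all.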

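The nice thing about using cross joins this way is that you only need permission to SELECT in the DBMS.

An added bonus: if you show this to the DBA who wouldn't let you create temporary tables, she will take pity on you and make it easy for you to do that.

If you happen to be working in MariaDB 10 or later, there are built-in pseudo-tables called sequence tables. For example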

    SELECT seq FROM seq_0_TO_39

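gives the same integer sequence [0-39]. That makes this sort of thing much less verbose in SQL.

Answer 1: (score: 0)

Thanks for all the suggestions in the comments. While I was researching them (for example, whether I could use table variables in every RDBMS), I came across this comment, which helped me find the answer: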

select ts.upper_limit, count(mbt.time) from (
    select '2017-04-05 00:00:00' as lower_limit, '2017-04-10 00:00:00' as upper_limit union 
    select '2017-04-10 00:00:00' as lower_limit, '2017-04-15 00:00:00' as upper_limit union 
    select '2017-04-15 00:00:00' as lower_limit, '2017-04-20 00:00:00' as upper_limit union 
    select '2017-04-20 00:00:00' as lower_limit, '2017-04-25 00:00:00' as upper_limit union 
    select '2017-04-25 00:00:00' as lower_limit, '2017-04-30 00:00:00' as upper_limit union 
    select '2017-04-30 00:00:00' as lower_limit, '2017-05-05 00:00:00' as upper_limit union 
    select '2017-05-05 00:00:00' as lower_limit, '2017-05-10 00:00:00' as upper_limit
) as ts
join my_big_table mbt on 
(
    mbt.time >  ts.lower_limit and 
    mbt.time <= ts.upper_limit
)
group by ts.upper_limit
order by ts.upper_limit;
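Since the boundaries are a fixed list, the UNION block above would presumably be generated by application code rather than typed by hand. The following is a rough sketch of that in Python, again using sqlite3 as a stand-in database. The helper names windows_subquery and count_in_windows and the sample rows are hypothetical, and the ? placeholder style is specific to sqlite3; other drivers use different placeholder syntax.

```python
import sqlite3


def windows_subquery(boundaries):
    """Build a UNION ALL derived table with one row per consecutive boundary pair.

    Returns (sql, params); the timestamps are passed as bound parameters
    instead of being pasted into the SQL text.
    """
    pairs = list(zip(boundaries, boundaries[1:]))
    if not pairs:
        raise ValueError("need at least two boundaries")
    first = "SELECT ? AS lower_limit, ? AS upper_limit"
    rest = " UNION ALL SELECT ?, ?" * (len(pairs) - 1)
    params = [value for pair in pairs for value in pair]
    return first + rest, params


def count_in_windows(conn, boundaries):
    """Count rows of my_big_table per (lower_limit, upper_limit] window."""
    sub_sql, params = windows_subquery(boundaries)
    sql = f"""
        SELECT ts.upper_limit, COUNT(mbt.time)
          FROM ({sub_sql}) AS ts
          JOIN my_big_table AS mbt
            ON mbt.time > ts.lower_limit AND mbt.time <= ts.upper_limit
         GROUP BY ts.upper_limit
         ORDER BY ts.upper_limit
    """
    return conn.execute(sql, params).fetchall()


# Hypothetical sample data, only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_big_table (time TEXT)")
conn.executemany(
    "INSERT INTO my_big_table (time) VALUES (?)",
    [("2017-04-06 10:00:00",), ("2017-04-10 00:00:00",),
     ("2017-04-12 09:30:00",), ("2017-04-20 00:00:00",)],
)
result = count_in_windows(
    conn, ["2017-04-05 00:00:00", "2017-04-10 00:00:00", "2017-04-15 00:00:00"]
)
print(result)
```

The sketch uses UNION ALL because the windows are already distinct, so there is nothing to deduplicate. Portability still needs checking per database: on Oracle, for instance, a SELECT without a table typically needs FROM dual, and some engines may want the string literals cast to a timestamp type explicitly.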