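I insert a row into a table every 5 minutes; the columns hold a timestamp and some data. I want to select data for a given time range and, for performance and chronological scaling, thin it out properly so that the query returns at most 32 rows.
For example, say I have 2 weeks of data, i.e. 4032 records at 5-minute intervals. I want to select from start to end and reduce the result set to 32 records, kept in chronological order and spaced as evenly as possible, while leaving the edge records (the first and last records in the set) intact.
I already have code that fetches the large set and iterates over it with a computed skip interval, dropping records as needed and doing edge checks. I'd like to know whether there is a faster way to do this in the query instead of in server code. I'm using MySQL, but I'll accept an MsSQL answer as well.
Thanks.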
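Answer 0 (score: 0)
Something along these lines, where the two dates are the inputs, together with the 5-minute interval and the 32 samples within the range: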
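SELECT rownum
FROM (SELECT @row := @row + 1 AS rownum   -- running row counter
            ,@sampleRate AS sampleRate
      FROM (SELECT @row := 0
                  -- minutes in range / 5-minute interval / 32 samples = rows per sample
                  ,@sampleRate := TIMESTAMPDIFF(MINUTE,'2011-12-01 00:00:00','2011-12-15 00:00:00') / 5 / 32 ) r
          ,clientpc
     ) ranked
WHERE rownum % @sampleRate = 1             -- keep one row out of every @sampleRate rows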
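As a possible alternative, assuming MySQL 8.0 or later (window functions and CTEs, which were not available in the MySQL versions current when this was asked), the thinning can be sketched by ranking the rows in the range, mapping each rank to one of 32 buckets, and keeping the first row of each bucket. The table `samples` and its columns `ts` and `data` are hypothetical names, not taken from the question:

```sql
-- Sketch only: assumes MySQL 8.0+ and a hypothetical table samples(ts, data).
WITH ranked AS (
    SELECT ts, data,
           ROW_NUMBER() OVER (ORDER BY ts) AS rn,    -- 1-based position in the range
           COUNT(*)     OVER ()            AS total  -- number of rows in the range
    FROM samples
    WHERE ts BETWEEN '2011-12-01 00:00:00' AND '2011-12-15 00:00:00'
),
bucketed AS (
    -- Map positions 1..total onto buckets 0..31 using integer division.
    SELECT ts, data, rn,
           ((rn - 1) * 31) DIV (total - 1) AS bucket
    FROM ranked
),
firsts AS (
    -- Number the rows inside each bucket so the earliest one can be picked.
    SELECT ts, data,
           ROW_NUMBER() OVER (PARTITION BY bucket ORDER BY rn) AS pos
    FROM bucketed
)
SELECT ts, data
FROM firsts
WHERE pos = 1
ORDER BY ts;
```

Bucket 0 starts at the first row and bucket 31 contains only the last row, so both edge records are kept. With 32 rows or fewer, every row lands in its own bucket and the range is returned unchanged.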
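Answer 1 (score: 0)
OK, don't say I didn't warn you (see the comments above). This is written for MSSQL; I'm not familiar with MySQL, so I tried to keep the highly proprietary stuff to a minimum. It could probably all be done in one big ugly query, but that would be even harder to understand, so I broke it down into steps.
First, set up some variables: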
DECLARE
@Items real = 32 -- How many items you wish to display
,@From int = 16000 -- Low range delimiter on your target data set
,@Thru int = 17500 -- High range delimiter on your target data set
,@Total real -- Used to store how many items are actually in the target range
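Brief testing shows that this fails if @Items is less than 2 (with @Items = 1 the divisor @Items - 1 is zero) or exceeds @Total by some large multiple. Error handling or input checks are needed. I use the real data type so that division yields decimal values rather than truncated integers; be sure to set these to whole-number values, otherwise I have no idea what will happen.
The next bit creates a "tally" table, also known as a "table of numbers". It is just a single-column table of ascending integers, starting at 1 and going up to your upper limit. Here I capped it at 256, since 32 appears to be your maximum. (This particular code is pretty obtuse, but it can generate millions of rows in a disturbingly short time, so I cut and paste it whenever I need something like this.)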
CREATE TABLE #Tally (Num int not null)
-- "Table of numbers" data generator, as per Itzik Ben-Gan (from multiple sources)
-- Modified to generate 1 through 256
;WITH
L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows
L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows
L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows
num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L3)
insert #Tally (Num)
select N FROM num
Get the number of rows in the target data set:
SELECT @Total = count(*)
from Time
where TimeId between @From and @Thru
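A review query: it lists the target range in order, along with each row's ranking (its position, e.g. 1, 2, 3, 4, and so on) within the set. Using the ranking takes care of duplicate values. (My testing was based on our general-purpose "Time" table, which looks like most time dimension tables in any data warehouse.)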
SELECT
row_number() over (order by TimeId) Ranking
,TimeId
from Time
where TimeId between @From and @Thru
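Another review query. This returns the set of numbers that identify the "breakpoints" of the final set. For example, if you have 30 items and want 7, the values that fall inside the range are {5, 10, 15, 20, 25, 30}; combined with 1, those are the seven you want (if I've got the problem straight). The query also returns 0 and values above @Total, but those match no ranking and are harmless.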
SELECT distinct ceiling((Num - 1) * @Total / (@Items - 1)) from #Tally
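To make the arithmetic concrete, here is a self-contained variant of that query for the "30 items, want 7" example. It assumes SQL Server 2008 or later (for the inline DECLARE initialisation and the VALUES table constructor); the inline list of numbers stands in for #Tally and is there only for illustration:

```sql
-- Sketch only: the VALUES list replaces #Tally so the snippet runs on its own.
DECLARE @Total real = 30, @Items real = 7;

SELECT DISTINCT CEILING((Num - 1) * @Total / (@Items - 1)) AS Breakpoint
FROM (VALUES (1), (2), (3), (4), (5), (6), (7)) AS t(Num)
ORDER BY Breakpoint;
-- Breakpoints: 0, 5, 10, 15, 20, 25, 30
```

The leading 0 matches no ranking, which is why the first row has to be picked up separately.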
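This is the workhorse, incorporating the two queries above. Basically, it takes from the first query the rows whose ranking/position equals one of the "breakpoints" identified by the second query. I used an OR for the first item because that was simpler than trying to fudge it in mathematically.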
SELECT xx.Ranking, xx.TimeId
from (select
row_number() over (order by TimeId) Ranking
,TimeId
from Time
where TimeId between @From and @Thru) xx
where Ranking in (select distinct ceiling((Num - 1) * @Total / (@Items - 1)) from #Tally)
or Ranking = 1
As I said, it's overly complex and may well fail for some inputs, but it should run faster than a procedural selection.
Answer 2 (score: 0)
I came up with this procedure; any cleanup is appreciated. After some blind debugging, it works the way I want. Times are stored as UTC timestamps.
DELIMITER $$
CREATE PROCEDURE `SelectChronoRange`(IN timeBegin BIGINT,
IN timeEnd BIGINT)
BEGIN
DECLARE totalAvail, skip, insideResultMax INT;
SET @maxResults = 64;
SELECT count(*)
INTO totalAvail
FROM `dediwatcherstats`;
SET insideResultMax:= @maxResults - 2;
SET skip := CEIL(totalAvail / insideResultMax);
SET @firstpid = 0;
SET @lastpid = 0;
SELECT `pid` INTO @firstpid
FROM `dediwatcherstats`
WHERE
CASE
WHEN timeBegin IS NOT NULL AND timeEnd IS NOT NULL THEN
`Time`>=timeBegin AND `Time`<=timeEnd
WHEN timeEnd IS NOT NULL THEN
`Time`<=timeEnd
WHEN timeBegin IS NOT NULL THEN
`Time`>=timeBegin
ELSE
TRUE
END
ORDER BY `Time` ASC, `pid` ASC LIMIT 1;
SELECT `pid` INTO @lastpid
FROM `dediwatcherstats`
WHERE
CASE
WHEN timeBegin IS NOT NULL AND timeEnd IS NOT NULL THEN
`Time`>=timeBegin AND `Time`<=timeEnd
WHEN timeEnd IS NOT NULL THEN
`Time`<=timeEnd
WHEN timeBegin IS NOT NULL THEN
`Time`>=timeBegin
ELSE
TRUE
END
ORDER BY `Time` DESC, `pid` DESC LIMIT 1;
SELECT * FROM
(
(
SELECT * FROM `dediwatcherstats`
WHERE `pid`=@firstpid
)
UNION
(
SELECT * FROM `dediwatcherstats`
WHERE
CASE
WHEN timeBegin IS NOT NULL AND timeEnd IS NOT NULL THEN
`Time`>=timeBegin AND `Time`<=timeEnd
WHEN timeEnd IS NOT NULL THEN
`Time`<=timeEnd
WHEN timeBegin IS NOT NULL THEN
`Time`>=timeBegin
ELSE
TRUE
END
AND `pid` % skip=0
LIMIT 62
)
) AS notused
UNION
SELECT * FROM `dediwatcherstats`
WHERE `pid`=@lastpid;
END$$
DELIMITER ;
It works against this simple table:
CREATE TABLE `dediwatcherstats` (
`pid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`Time` bigint(20) unsigned NOT NULL,
`Data` text,
PRIMARY KEY (`pid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I wish the LIMIT clause allowed parameter variables. In the code I posted, I used a limit of 64 rather than 32, for anyone who might want to use it.
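For what it's worth, the MySQL documentation states that within stored programs, LIMIT parameters can be given as integer-valued routine parameters or local variables (as far as I recall, from MySQL 5.5.6 onward). A minimal sketch, assuming such a server version; the procedure name `SelectFirstN` and its parameter `maxRows` are made up for illustration:

```sql
DELIMITER $$
-- Hypothetical procedure: returns the earliest maxRows rows of the table.
CREATE PROCEDURE `SelectFirstN`(IN maxRows INT)
BEGIN
    SELECT *
    FROM `dediwatcherstats`
    ORDER BY `Time` ASC, `pid` ASC
    LIMIT maxRows;   -- routine parameter used directly in LIMIT
END$$
DELIMITER ;
```

It could then be called as `CALL SelectFirstN(32);`, so the same idea would let the hard-coded 62 in the procedure above be derived from a parameter instead.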