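SELECT / GROUP BY - segments of time (10 seconds, 30 seconds, etc.)

Time: 2010-06-21 16:16:05

Tags: mysql sql select group-by

I have a MySQL table that captures samples every n seconds. The table has many columns, but only two matter here: a time stamp (type TIMESTAMP) and a count (type INT).

What I would like to do is get the sum and average of the count column over a range of times. For example, I record samples every 2 seconds, but I would like the sum of the count column for all samples that fall within a 10-second or 30-second window.

Here is an example of the data: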

+---------------------+-----------------+
| time_stamp          | count           |
+---------------------+-----------------+
| 2010-06-15 23:35:28 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |             942 |
| 2010-06-15 23:35:30 |             180 |
| 2010-06-15 23:35:30 |               4 |
| 2010-06-15 23:35:30 |              52 |
| 2010-06-15 23:35:30 |              12 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:33 |            1468 |
| 2010-06-15 23:35:33 |             247 |
| 2010-06-15 23:35:33 |               1 |
| 2010-06-15 23:35:33 |              81 |
| 2010-06-15 23:35:33 |              16 |
| 2010-06-15 23:35:35 |            1828 |
| 2010-06-15 23:35:35 |             214 |
| 2010-06-15 23:35:35 |              75 |
| 2010-06-15 23:35:35 |               8 |
| 2010-06-15 23:35:37 |            1799 |
| 2010-06-15 23:35:37 |              24 |
| 2010-06-15 23:35:37 |              11 |
| 2010-06-15 23:35:37 |               2 |
| 2010-06-15 23:35:40 |             575 |
| 2010-06-15 23:35:40 |               1 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               1 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               1 |
| 2010-06-17 10:39:40 |              35 |
| 2010-06-17 10:39:40 |              19 |
| 2010-06-17 10:39:40 |              37 |
| 2010-06-17 10:39:42 |              64 |
| 2010-06-17 10:39:42 |               3 |
| 2010-06-17 10:39:42 |              31 |
| 2010-06-17 10:39:42 |               7 |
| 2010-06-17 10:39:42 |             246 |
+---------------------+-----------------+

The output I want (based on the data above) should look like this:

+---------------------+-----------------+
| 2010-06-15 23:35:00 |               1 |  # This is the sum for the 00 - 30 seconds range
| 2010-06-15 23:35:30 |            7544 |  # This is the sum for the 30 - 60 seconds range
| 2010-06-17 10:39:35 |             450 |  # This is the sum for the 30 - 60 seconds range
+---------------------+-----------------+

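I have used GROUP BY to gather these numbers by the second or by the minute, but I can't figure out the syntax to get a sub-minute or range-of-seconds GROUP BY to work properly.

I will mostly be using this query to siphon data from this table into another table.

Thanks!

4 answers:

Answer 0 (score: 66)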

GROUP BY UNIX_TIMESTAMP(time_stamp) DIV 30

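Or, if for some reason you wanted to group them in 20-second intervals, it would be DIV 20, and so on. To change the boundaries between GROUP BY values, you could use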

GROUP BY (UNIX_TIMESTAMP(time_stamp) + r) DIV 30

where r is a literal non-negative integer less than 30. So

GROUP BY (UNIX_TIMESTAMP(time_stamp) + 5) DIV 30

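puts the group boundaries at hh:mm:25 and hh:mm:55, because adding r moves each boundary r seconds earlier: you get sums between hh:mm:25 and hh:mm:55, and between hh:mm:55 and hh:mm+1:25. For boundaries at hh:mm:05 and hh:mm:35, use r = 25.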
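The grouping arithmetic can be checked outside MySQL. Below is a minimal Python sketch, not part of the SQL answer itself, that mimics `GROUP BY (UNIX_TIMESTAMP(time_stamp) + r) DIV width` with `SUM(count)` on the sample rows from the question. It assumes the time stamps are UTC; the names `SAMPLES`, `to_epoch` and `sum_by_window` are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from the question, condensed to time_stamp -> list of counts.
SAMPLES = {
    "2010-06-15 23:35:28": [1],
    "2010-06-15 23:35:30": [1, 1, 942, 180, 4, 52, 12, 1, 1],
    "2010-06-15 23:35:33": [1468, 247, 1, 81, 16],
    "2010-06-15 23:35:35": [1828, 214, 75, 8],
    "2010-06-15 23:35:37": [1799, 24, 11, 2],
    "2010-06-15 23:35:40": [575, 1],
    "2010-06-17 10:39:35": [2, 2, 1, 2, 1],
    "2010-06-17 10:39:40": [35, 19, 37],
    "2010-06-17 10:39:42": [64, 3, 31, 7, 246],
}


def to_epoch(text):
    """Parse a time stamp as UTC (an assumption) and return Unix seconds."""
    dt = datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def sum_by_window(samples, width=30, r=0):
    """Sum counts per group, keyed like (UNIX_TIMESTAMP(ts) + r) DIV width."""
    totals = defaultdict(int)
    for text, counts in samples.items():
        key = (to_epoch(text) + r) // width
        totals[key] += sum(counts)
    # Label each group with the first second it covers.
    return {
        datetime.fromtimestamp(key * width - r, tz=timezone.utc).strftime(FMT): total
        for key, total in sorted(totals.items())
    }


if __name__ == "__main__":
    for label, total in sum_by_window(SAMPLES).items():
        print(label, total)
```

With `width=30` and `r=0` the three sums are 1, 7544 and 450, the totals the question asks for. The sketch labels each group by the start of its window, so the last group appears as 10:39:30.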

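Answer 1 (score: 6)

I tried Hammerite's solution in my project, but it did not work well when samples were missing from the series. Here is an example of a query that is supposed to select the time stamp (ts), the user name and the average measure from metric_table, grouping the results into 27-minute intervals: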

select 
    min(ts), 
    user_name, 
    sum(measure) / 27
from metric_table 
where 
    ts between date_sub('2015-03-17 00:00:00', INTERVAL 2160 MINUTE) and '2015-03-17 00:00:00' 

group by unix_timestamp(ts) div 1620, user_name 
order by ts, user_name
;

Note: 27 minutes (in the select) = 1620 seconds (in the group by); 2160 minutes = 36 hours, i.e. 1.5 days (the time range).

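When I ran this query against time series with irregularly recorded samples (in other words, for any given time stamp there is no guarantee of finding a measure value for every user name), the results were not labeled on the interval boundaries (they were not placed every 27 minutes). I suspect this is because min(ts) returned, in some groups, a time stamp greater than the expected floor (ts0 + i * interval). I modified the previous query to this one: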

select 
    from_unixtime(unix_timestamp(ts) - unix_timestamp(ts) mod 1620) as ts1, 
    user_name, 
    sum(measure) / 27
from metric_table
where 
    ts between date_sub('2015-03-17 00:00:00', INTERVAL 2160 MINUTE) and '2015-03-17 00:00:00' 

group by ts1, user_name 
order by ts1, user_name
;

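This works properly even when samples are missing. I think that is because, once the arithmetic is moved into the select, ts1 is guaranteed to be aligned with the time step.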
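The difference between the two labels can be reproduced with plain integers. This is a small Python sketch under stated assumptions: the Unix timestamps in `samples` are hypothetical, and `floor_label` and `min_label` are illustrative names, not anything from MySQL.

```python
INTERVAL = 1620  # 27 minutes in seconds, as in the queries above


def floor_label(ts, interval=INTERVAL):
    """Same arithmetic as unix_timestamp(ts) - unix_timestamp(ts) mod interval."""
    return ts - ts % interval


def min_label(timestamps, interval=INTERVAL):
    """Label each group by its earliest sample, like min(ts) grouped by ts DIV interval."""
    labels = {}
    for ts in timestamps:
        key = ts // interval
        labels[key] = min(labels.get(key, ts), ts)
    return [labels[key] for key in sorted(labels)]


# Hypothetical Unix timestamps for one user. The second group has no sample
# at its start (17820) because the series is irregular.
samples = [16200, 16260, 17900, 18000, 19440]

min_labels = min_label(samples)
floor_labels = sorted({floor_label(ts) for ts in samples})

print(min_labels)    # labels drift with whatever sample happens to come first
print(floor_labels)  # labels are always multiples of INTERVAL
```

Here the min-based labels are 16200, 17900 and 19440, so their spacing is 1700 and then 1540 seconds, while the floor-based labels are 16200, 17820 and 19440, exactly 1620 seconds apart.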

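Answer 2 (score: 2)

Another solution.

To average over any interval you like, you can convert your dt to a Unix timestamp and group by that timestamp minus its remainder modulo the interval (7 seconds in the example).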

select FROM_UNIXTIME(
    UNIX_TIMESTAMP(dt_record) - UNIX_TIMESTAMP(dt_record) mod 7
) as dt, avg(1das4hrz) from `meteor-m2_msgi`
where dt_record>='2016-11-13 05:00:00'
and dt_record < '2016-11-13 05:02:00'
group by FROM_UNIXTIME(
    UNIX_TIMESTAMP(dt_record) - UNIX_TIMESTAMP(dt_record) mod 7);

To show how it works, I prepared a query that displays the intermediate calculations.

select dt_record, minute(dt_record) as mm, SECOND(dt_record) as ss,
UNIX_TIMESTAMP(dt_record) as uxt, UNIX_TIMESTAMP(dt_record) mod 7 as ux7,
FROM_UNIXTIME(
    UNIX_TIMESTAMP(dt_record) - UNIX_TIMESTAMP(dt_record) mod 7) as dtsub,
`column` from `yourtable` where dt_record>='2016-11-13 05:00:00'
and dt_record < '2016-11-13 05:02:00';

+---------------------+--------------------+
| dt                  | avg(column)        |
+---------------------+--------------------+
| 2016-11-13 04:59:43 |  25434.85714285714 |
| 2016-11-13 05:00:42 |  5700.728813559322 |
| 2016-11-13 05:01:41 |  950.1016949152543 |
| 2016-11-13 05:02:40 |  4671.220338983051 |
| 2016-11-13 05:03:39 | 25468.728813559323 |
| 2016-11-13 05:04:38 |  43883.52542372881 |
| 2016-11-13 05:05:37 | 24589.338983050846 |
+---------------------+--------------------+


+---------------------+-----+-----+------------+------+---------------------+----------+
| dt_record           | mm  | ss  | uxt        | ux7  | dtsub               | column   |
+---------------------+-----+-----+------------+------+---------------------+----------+
| 2016-11-13 05:00:00 |   0 |   0 | 1479002400 |    1 | 2016-11-13 04:59:59 |    36137 |
| 2016-11-13 05:00:01 |   0 |   1 | 1479002401 |    2 | 2016-11-13 04:59:59 |    36137 |
| 2016-11-13 05:00:02 |   0 |   2 | 1479002402 |    3 | 2016-11-13 04:59:59 |    36137 |
| 2016-11-13 05:00:03 |   0 |   3 | 1479002403 |    4 | 2016-11-13 04:59:59 |    34911 |
| 2016-11-13 05:00:04 |   0 |   4 | 1479002404 |    5 | 2016-11-13 04:59:59 |    34911 |
| 2016-11-13 05:00:05 |   0 |   5 | 1479002405 |    6 | 2016-11-13 04:59:59 |    34911 |
| 2016-11-13 05:00:06 |   0 |   6 | 1479002406 |    0 | 2016-11-13 05:00:06 |    33726 |
| 2016-11-13 05:00:07 |   0 |   7 | 1479002407 |    1 | 2016-11-13 05:00:06 |    32581 |
| 2016-11-13 05:00:08 |   0 |   8 | 1479002408 |    2 | 2016-11-13 05:00:06 |    32581 |
| 2016-11-13 05:00:09 |   0 |   9 | 1479002409 |    3 | 2016-11-13 05:00:06 |    31475 |
+---------------------+-----+-----+------------+------+---------------------+----------+
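The ux7 and dtsub columns of the table above can be recomputed by hand. The Python sketch below is only an illustration: it assumes the session time zone is UTC+3, which is inferred from the table (05:00:00 local time corresponds to uxt = 1479002400), and `bucket_row` is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"

# Assumption: the MySQL session time zone is UTC+3, inferred from the table
# (local 05:00:00 corresponds to uxt = 1479002400).
LOCAL = timezone(timedelta(hours=3))


def bucket_row(uxt, width=7):
    """Return (dt_record, ux7, dtsub) for one Unix timestamp."""
    ux7 = uxt % width  # seconds elapsed since the bucket started
    dt_record = datetime.fromtimestamp(uxt, tz=LOCAL).strftime(FMT)
    dtsub = datetime.fromtimestamp(uxt - ux7, tz=LOCAL).strftime(FMT)
    return dt_record, ux7, dtsub


if __name__ == "__main__":
    for uxt in range(1479002400, 1479002410):
        print(*bucket_row(uxt))
```

The buckets start at multiples of 7 seconds counted from the Unix epoch, which is why the first bucket in the table begins at 04:59:59 rather than at 05:00:00.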

Can anyone suggest something faster?

Answer 3 (score: 0)

It is odd, but using the solution from here:

Average of data for every 5 minutes in the given times

we can come up with something similar:

select convert(
              (min(dt_record) div 50)*50 - 20*((convert(min(dt_record), 
               datetime) div 50) mod 2), datetime)  as dt, 
       avg(1das4hrz) 
from `meteor-m2_msgi`
where dt_record>='2016-11-13 05:00:00'
       and dt_record < '2016-11-14 00:00:00' 
group by convert(dt_record, datetime) div 50;


select (
convert(
min(dt_record), datetime) div 50)*50 - 20*(
(convert(min(dt_record), datetime) div 50) mod 2
) as dt,
avg(`column`) from `your_table`
where dt_record>='2016-11-13 05:00:00'
and dt_record < '2016-11-14 00:00:00'
group by convert(dt_record, datetime) div 50;

50 is used because half of a NORMAL minute has 30 seconds, whereas the 'INTEGER DATE FORMAT' requires us to divide by 50.