Using SQL aggregate functions with irregular timestamps

Asked: 2015-10-12 17:42:37

Tags: mysql sql time-series

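I have a table with timestamps and river flow readings. On some days I have multiple records, and on some days I have none.

How can I calculate the average flow and the total flow between two dates?

Assuming linear values between two points is acceptable. Perhaps some kind of weighted average would work. If some least-squares regression algorithm or something similar can give more accurate results, that would be great too.

EDIT: For a single day, I have the following made-up data for illustration. If possible, I would like to do better than the plain average of 146, because the flow was high for a much longer part of the day, and the true average is probably over 200.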

10/12/15 12:00 AM   100
10/12/15 12:01 AM   102
10/12/15 12:02 AM   104
10/12/15 12:03 AM   106
10/12/15 12:04 AM   200
10/12/15 10:00 PM   204
10/12/15 11:00 PM   208

Average             146
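
To make the goal concrete, here is a minimal sketch of a time-weighted average under the linear-interpolation assumption stated above (trapezoidal rule). It is written in Python rather than SQL purely for illustration; the names `readings`, `trapezoid_total` and `time_weighted_average` are hypothetical, and the total is expressed in "flow rate × seconds", which assumes the rate is a per-second quantity.

```python
from datetime import datetime

# Sample readings from the table above: (timestamp, flow rate).
readings = [
    (datetime(2015, 10, 12, 0, 0), 100.0),
    (datetime(2015, 10, 12, 0, 1), 102.0),
    (datetime(2015, 10, 12, 0, 2), 104.0),
    (datetime(2015, 10, 12, 0, 3), 106.0),
    (datetime(2015, 10, 12, 0, 4), 200.0),
    (datetime(2015, 10, 12, 22, 0), 204.0),
    (datetime(2015, 10, 12, 23, 0), 208.0),
]


def trapezoid_total(readings):
    """Integrate flow over time, assuming it changes linearly between readings."""
    total = 0.0
    for (t0, f0), (t1, f1) in zip(readings, readings[1:]):
        dt = (t1 - t0).total_seconds()
        total += 0.5 * (f0 + f1) * dt  # area of one trapezoid
    return total


def time_weighted_average(readings):
    """Total flow divided by the time between the first and last reading."""
    span = (readings[-1][0] - readings[0][0]).total_seconds()
    return trapezoid_total(readings) / span


print(round(time_weighted_average(readings), 2))  # 201.92
```

On this data the time-weighted average comes out at roughly 201.9, compared with 146 for the unweighted mean, because the long interval between 12:04 AM and 10:00 PM dominates the weighting.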

2 answers:

Answer 0: (score: 0)

Something like this should generally point you in the right direction:

SELECT AVG(dayflowRate) AS avgFlowRate, SUM(dayFlow) AS totalFlow
FROM (
SELECT DATE(theTS) AS theDate, AVG(flowRate) AS dayFlowRate
    , AVG(flowRate) * (24*60*60) AS dayFlow
FROM theTable
WHERE theTS BETWEEN [beginTS] AND [endTS]
GROUP BY theDate
) AS dayQ
;

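Note, however, that the 24*60*60 multiplier (written out only for clarity) assumes every date is a full day. If you need more precision, you will want to look at the MIN and MAX aggregates and the TIME_TO_SEC function.

I think this (below) may be somewhat more accurate: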

SELECT AVG(dayflowRate) AS avgFlowRate, SUM(dayFlow) AS totalFlow
FROM (
SELECT DATE(theTS) AS theDate, AVG(flowRate) AS dayFlowRate
    , AVG(flowRate) 
      * ( TIME_TO_SEC(LEAST(MAX(theTS), [endTS]))
          - TIME_TO_SEC(GREATEST(MIN(theTS), [beginTS]))
        )
      AS dayFlow
FROM theTable
WHERE theTS BETWEEN [beginTS] AND [endTS]
GROUP BY theDate
) AS dayQ
;

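Edit: or maybe not. If a day's only measurements are at 11 AM and 1 PM, dayFlow would account for just two hours of flow, even when that day sits in the middle of a multi-day range.

This should be the best version: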

SELECT AVG(dayflowRate) AS avgFlowRate, SUM(dayFlow) AS totalFlow
FROM (
SELECT DATE(theTS) AS theDate, AVG(flowRate) AS dayFlowRate
    , AVG(flowRate) 
      * ( IF(DATE(theTS)=DATE([endTS]), TIME_TO_SEC([endTS]), (24*60*60))
          - IF(DATE(theTS)=DATE([beginTS]), TIME_TO_SEC([beginTS]), 0)
        )
      AS dayFlow
FROM theTable
WHERE theTS BETWEEN [beginTS] AND [endTS]
GROUP BY theDate
) AS dayQ
;
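
The per-day multiplier in that last query can be read as "how many seconds of this calendar day fall inside the requested range". The following is only a sketch of that arithmetic in Python, not part of the SQL; `seconds_covered` and its parameter names are hypothetical, with `begin_ts` and `end_ts` standing in for the [beginTS] and [endTS] placeholders.

```python
from datetime import date, datetime, time, timedelta


def seconds_covered(day, begin_ts, end_ts):
    """Seconds of calendar day `day` that lie within [begin_ts, end_ts]."""
    day_start = datetime.combine(day, time.min)
    day_end = day_start + timedelta(days=1)
    lo = max(day_start, begin_ts)  # first day starts at begin_ts
    hi = min(day_end, end_ts)      # last day stops at end_ts
    return max((hi - lo).total_seconds(), 0.0)


begin_ts = datetime(2015, 10, 12, 6, 0)
end_ts = datetime(2015, 10, 14, 12, 0)
for d in (date(2015, 10, 12), date(2015, 10, 13), date(2015, 10, 14)):
    print(d, seconds_covered(d, begin_ts, end_ts))
```

With a range from 6 AM on the 12th to noon on the 14th, the first day contributes 18 hours, the middle day a full 24 hours, and the last day 12 hours, which is what the two IF expressions compute.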

Answer 1: (score: 0)

You need a weighted average. To compute it, you need the next timestamp for each row:

select rf.*,
       (select rf2.timestamp
        from riverflow rf2
        where rf2.timestamp > rf.timestamp
        order by rf2.timestamp asc
        limit 1
       ) as nextTimestamp
from riverflow rf;

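Next comes the weighted average. I don't know how you want to handle the fact that the measurement periods may not line up with the given days. So instead, this simply takes the values and reports the start and end observation times: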

select min(timestamp) as start, max(timestamp) as end,
       (sum(riverflow * timestampdiff(second, timestamp, nexttimestamp) / (24*60*60)) /
        (timestampdiff(second, min(timestamp), max(timestamp)) / (24*60*60))
       ) as avgRiverflow
from (select rf.*,
             (select rf2.timestamp
              from riverflow rf2
              where rf2.timestamp > rf.timestamp
              order by rf2.timestamp asc
              limit 1
             ) as nextTimestamp
      from riverflow rf
      where timestamp >= $date1 and timestamp < $date2
     ) t;
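
As a cross-check of the weighting logic, here is a minimal Python sketch of the same idea: each reading is held constant until the next timestamp, and the last reading gets no weight because it has no successor. The sample values are the made-up data from the question, and the function name `step_weighted_average` is hypothetical.

```python
from datetime import datetime

# Made-up sample readings from the question: (timestamp, flow rate).
readings = [
    (datetime(2015, 10, 12, 0, 0), 100.0),
    (datetime(2015, 10, 12, 0, 1), 102.0),
    (datetime(2015, 10, 12, 0, 2), 104.0),
    (datetime(2015, 10, 12, 0, 3), 106.0),
    (datetime(2015, 10, 12, 0, 4), 200.0),
    (datetime(2015, 10, 12, 22, 0), 204.0),
    (datetime(2015, 10, 12, 23, 0), 208.0),
]


def step_weighted_average(readings):
    """Weight each reading by the seconds until the next reading."""
    weighted = 0.0
    for (t0, flow), (t1, _next_flow) in zip(readings, readings[1:]):
        weighted += flow * (t1 - t0).total_seconds()
    span = (readings[-1][0] - readings[0][0]).total_seconds()
    return weighted / span


print(round(step_weighted_average(readings), 2))  # 199.89
```

This gives roughly 199.9 for the sample day. It is a step-function weighting, so it sits slightly below what a linear interpolation between readings would give when the flow is rising.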