基于不同时间标签/时间戳和用户选择输入(T-SQL)的SQL分区/平均

时间:2015-10-23 10:43:04

标签: sql sql-server tsql weighted-average

我有以下问题需要处理 - 我有一个我想要的SQL数据集(例如像这样):

OldTimetag                 OldValue
2012-05-03 12:47:00        5
2012-05-03 13:00:00        1.3
2012-05-03 13:21:00        7
2012-05-03 14:56:00        5
2012-05-03 14:57:00        0.3
....                       ....

现在,我想基于用户选择的间隔将数据划分(和/或平均)到新的时间标记中,例如,每15分钟以第一个时间标签为起点,即:

NewTimetag                 NewValue
2012-05-03 12:47:00        4.507
2012-05-03 13:02:00        1.3 
....                       ....

主要限制是时间标签旁边的值始终有效,直到出现下一个时间标签。因此,时间标签2012-05-03 12:47:00的5的值在接下来的13分钟有效,直到13:00:00。从12:47:00开始的前15分钟的值将是(13 * 5 + 2 * 1.3)/ 15 = 4.507。在接下来的15分钟内,在13:02:00,该值简单地等于1.3 ......(依此类推)

我来得太久了,最好先制作一个“人工桌”,然后将它与旧桌子连接起来。我通过以下方式生成该表:

DECLARE @intStart datetime, @intEnd datetime


SELECT @intStart =min(OldTimetag), @intEnd = MAX(OldTimetag)
FROM OldTable
    where OldTimetag between '2012-05-03 12:47:00' and '2012-05-03 14:57:00'

Declare @ArtificalTable table (NewTimeTag datetime, NewValue Float)
Declare @MinuteSlicer Int
Set @MinuteSlicer = 15
Insert @Hallo Select @intStart, null

While ( @intStart < @intEnd ) BEGIN
    Insert @ArtificalTable
    Select DATEADD(mi,@MinuteSlicer, @intStart), Null 
        Set  @intStart = DATEADD(mi,@MinuteSlicer,@intStart)
        If @intEnd <= DATEADD(mi,@MinuteSlicer,@intStart) 
            Break
End 

这给了我一个输出:

NewTimetag                 NewValue
2012-05-03 12:47:00        Null
2012-05-03 13:02:00        Null
....                       ....

但是,我在下一步遇到问题,如何正确加入表格 - 任何人都可以给我一个提示吗?

2 个答案:

答案 0 :(得分:2)

这是一种方法。

示例数据:

declare @data table(OldTimetag datetime2, OldValue numeric(5,2));
Insert into @data(OldTimetag, OldValue) Values
    ('2012-05-03 12:47:00', 5)
    , ('2012-05-03 13:00:00', 1.3)
    , ('2012-05-03 13:21:00', 7)
    , ('2012-05-03 14:56:00', 5)
    , ('2012-05-03 14:57:00', 0.3);

您的自定义范围大小,以分钟为单位:

declare @mins int = 15;

列表用于快速计算从0到n的有序数字列表,其中n <=到第一个和最后一个OldTimetag之间的分钟数。

With list(n) as (
    Select top(Select 1+DATEDIFF(minute, min(OldTimetag), max(OldTimetag)) From @data) 
        ROW_NUMBER() over(order by (select 1))-1
    From (
        Select 1 From (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x1(n)
        Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x2(n)
        Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x3(n)
    ) as x(n)
)
Select NewTimetag = DATEADD(minute, @mins*(l.n/@mins), MIN(r.startTime)), NewValue = AVG(d.oldValue)
From list l
Cross Join (Select startTime = min(OldTimetag) From @data) as r
Cross Apply (Select maxTimetag = MAX(OldTimetag) From @data Where OldTimetag <= DATEADD(minute, n, startTime)) as mx
Inner Join @data d on d.OldTimetag = mx.maxTimetag
Group By l.n/@mins
  • Cross Join用于将有序列表中的每个数字与数据中的第一个OldTimetag混合。
  • Cross Apply用于在使用OldTimetag创建的每分钟之前获取最近的Cross Join
  • Inner Join然后将最近的OldTimetag与您的数据进行匹配,以便检索oldValue
  • Select只需计算@mins分钟及其NewTimetag的每个范围的平均值。

适用于最小和最大OldTimetag之间最多1000分钟的范围。如果您需要超出此限制,可以在CTE列表中添加第4行:

Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x4(n) => up to 10.000
Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x5(n) => up to 100.000
...

答案 1 :(得分:1)

一种方法是确定间隔(如果它包含至少一个时间戳,则生成间隔),使用下一个时间戳扩充时间表,然后通过将时间间隔与时间表相交来计算每个间隔的平均值。

IF OBJECT_ID('tempdb..#values') IS NOT NULL DROP TABLE #values
CREATE TABLE #values (pk int identity, time datetime, value numeric(10,4))
INSERT INTO #values VALUES ('2012-05-03 12:47:00', 5)
INSERT INTO #values VALUES ('2012-05-03 13:00:00', 1.3)
INSERT INTO #values VALUES ('2012-05-03 13:21:00', 7)
INSERT INTO #values VALUES ('2012-05-03 14:56:00', 5)
INSERT INTO #values VALUES ('2012-05-03 14:57:00', 0.3)

DECLARE @timeSpanMinutes int SET @timeSpanMinutes=15
DECLARE @startTime datetime, @endTtime datetime
SELECT @startTime=MIN(time) FROM #values
SELECT @endTtime =DATEADD(MINUTE,(DATEDIFF(MINUTE,@startTime,MAX(time))
    /@timeSpanMinutes+1)*@timeSpanMinutes, @startTime) FROM #values -- MAX(time) multiple
SELECT intervals.start
  , SUM(value*(DATEDIFF(MINUTE -- minutes in intersection of [start,end] and [time,next]
     , CASE WHEN time<start THEN start ELSE time END -- Maximum(time,start)
     , CASE WHEN next<DATEADD(MINUTE,@timeSpanMinutes,intervals.start) THEN next 
            ELSE DATEADD(MINUTE,@timeSpanMinutes,intervals.start) END -- Minimum(next,end)
     )*1.0/@timeSpanMinutes)) as average
  FROM
  (SELECT DISTINCT DATEADD(MINUTE, (DATEDIFF(MINUTE,@startTime,time)
        /@timeSpanMinutes)*@timeSpanMinutes, @startTime) AS start 
      FROM #values -- round start to multiple of @timeSpanMinutes
    UNION SELECT DISTINCT DATEADD(MINUTE,@timeSpanMinutes+(DATEDIFF(MINUTE,@startTime,time)
        /@timeSpanMinutes)*@timeSpanMinutes, @startTime) 
      FROM #values -- union distinct with same as above but shifted with @timeSpanMinutes
  ) intervals -- intervals start time (end is calculated as start + @timeSpanMinutes)
  INNER JOIN 
  (SELECT v.*,ISNULL((SELECT MIN(time) FROM #values WHERE time>v.time),@endTtime) as next 
       FROM #values v -- add next column to #values
  ) vals
  ON vals.next>=intervals.start and vals.time<=DATEADD(MINUTE,@timeSpanMinutes,start)
  WHERE intervals.start<>@endTtime
  GROUP BY intervals.start
  ORDER BY intervals.start