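T-SQL grouping with repetition based on variables

Time: 2017-11-24 16:02:39

Tags: sql sql-server tsql

I want to create some aggregations from a table, but I cannot work out a solution.

Example table: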

DECLARE @MyTable TABLE(person INT, the_date date, the_value int)
INSERT INTO @MyTable VALUES
(1,'2017-01-01', 10),
(1,'2017-02-01', 5), 
(1,'2017-03-01', 5),
(1,'2017-04-01', 10),
(1,'2017-05-01', 2),
(2,'2017-04-01', 10),
(2,'2017-05-01', 10),
(2,'2017-05-01', 0),
(3,'2017-01-01', 2)

For every person present in that window, I want to average the values over the last x (@months_back) months, counting back from a given start date (@start_date):

DECLARE @months_back int, @start_date date
set @months_back = 3 
set @start_date = '2017-05-01'

SELECT person, avg(the_value) as avg_the_value  
FROM @MyTable
where the_date <= @start_date and the_date >= dateadd(month, -@months_back, @start_date)
group by person

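This works. I now want to do the same thing again, but after moving the start date back by some number of months (@month_skip), and then union the two result sets together. Then I want to move back another @month_skip months from that date and do the same thing again. I want to keep doing this until I have skipped past some specified date (@min_date).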

DECLARE @months_back int, @month_skip int, @start_date date, @min_date date
set @months_back = 3 
set @month_skip = 2
set @start_date = '2017-05-01'
set @min_date = '2017-03-01'

With the variables above and the table @MyTable, the result should be:

person | avg_the_value
1      | 5
2      | 6
1      | 6
3      | 2

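There is only one skip here, because @min_date is just 2 months before the start date, but I want to be able to make multiple skips depending on @min_date.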
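For the single skip in this example, the target result can be written by hand as two aggregates combined with UNION ALL. This is only a sketch of the desired behaviour, not a general solution, because the number of branches is fixed; it reuses the table and variable names from above.

```sql
DECLARE @MyTable TABLE(person INT, the_date date, the_value int)
INSERT INTO @MyTable VALUES
(1,'2017-01-01', 10),
(1,'2017-02-01', 5),
(1,'2017-03-01', 5),
(1,'2017-04-01', 10),
(1,'2017-05-01', 2),
(2,'2017-04-01', 10),
(2,'2017-05-01', 10),
(2,'2017-05-01', 0),
(3,'2017-01-01', 2)

DECLARE @months_back int = 3, @month_skip int = 2, @start_date date = '2017-05-01'

-- Window 1: 2017-02-01 through 2017-05-01
SELECT person, AVG(the_value) AS avg_the_value
FROM @MyTable
WHERE the_date <= @start_date
  AND the_date >= DATEADD(month, -@months_back, @start_date)
GROUP BY person

UNION ALL

-- Window 2: start date moved back by @month_skip, i.e. 2016-12-01 through 2017-03-01
SELECT person, AVG(the_value) AS avg_the_value
FROM @MyTable
WHERE the_date <= DATEADD(month, -@month_skip, @start_date)
  AND the_date >= DATEADD(month, -@months_back, DATEADD(month, -@month_skip, @start_date))
GROUP BY person

-- Rows returned (order is not guaranteed): (1, 5), (2, 6), (1, 6), (3, 2)
```

Note that AVG over an int column returns an int, which is why 22 / 4 shows up as 5 rather than 5.5.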

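This example table is simple, but the real table has many more automatically created columns, so a table variable is not practical: I would have to declare the schema of the result table.

I asked a related question here, but none of the answers there solved this problem.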

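2 Answers:

Answer 0: (score: 0)

It sounds like what you are trying to do is the following:

Start from a date (e.g. 2017-05-01), look back @months_back months, and define a range of dates. For example, if we go back 3 months, we define the range 2017-02-01 through 2017-05-01.

After defining this range, we return to the start date and derive a new start date by going back @month_skip months. For example, with an initial start date of 2017-05-01 and a skip of 2 months, the new start date is 2017-03-01.

We take this new start date and define the corresponding range of dates (as above). This yields the range 2016-12-01 through 2017-03-01.

We repeat this as often as needed, down to the specified minimum date, to produce the list of date ranges we want to calculate for: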

2017-02-01 through 2017-05-01
2016-12-01 through 2017-03-01
... etc ...

For each period, look at each person and compute their average.
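The range list on its own can be sketched as follows. The inline VALUES list is an assumption made for brevity and stands in for a proper numbers table; the variable values are the ones from the question.

```sql
DECLARE @months_back int = 3, @month_skip int = 2,
        @start_date date = '2017-05-01', @min_date date = '2017-03-01';

SELECT n.i AS period,
       -- each range reaches @months_back months back from its end date
       DATEADD(month, -(@month_skip * n.i) - @months_back, @start_date) AS startOfPeriod,
       -- each range ends @month_skip months before the previous one
       DATEADD(month, -(@month_skip * n.i), @start_date) AS endOfPeriod
FROM (VALUES (0),(1),(2),(3),(4),(5)) AS n(i)   -- hypothetical stand-in for a numbers table
WHERE DATEADD(month, -(@month_skip * n.i), @start_date) >= @min_date
ORDER BY n.i;

-- period  startOfPeriod  endOfPeriod
-- 0       2017-02-01     2017-05-01
-- 1       2016-12-01     2017-03-01
```

Here a range is kept as long as its end date has not passed @min_date, which is one reading of "until I have skipped past @min_date".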

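The query below should do what is described above. Rather than taking a value and iterating to compute the previous one, we use a numbers table to compute the offset of each interval, which determines the end date and start date of each interval/period. This query was built on SQL Server 2008 R2 and should be compatible with later versions.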

/* Table, data, variable declarations */

DECLARE @MyTable TABLE(person INT, the_date date, the_value int)
INSERT INTO @MyTable VALUES
(1,'2017-01-01', 10),
(1,'2017-02-01', 5), 
(1,'2017-03-01', 5),
(1,'2017-04-01', 10),
(1,'2017-05-01', 2),
(2,'2017-04-01', 10),
(2,'2017-05-01', 10),
(2,'2017-05-01', 0),
(3,'2017-01-01', 2)


DECLARE  @months_back int, @month_skip int, @start_date date, @min_date date
set @months_back = 3 
set @month_skip = 2
set @start_date = '2017-05-01'
set @min_date = '2017-01-01'


/*  Common table expression to build list of Integers */
/* reference http://www.itprotoday.com/software-development/build-numbers-table-you-need if you want more info */

declare @end_int bigint = 50
;WITH IntegersTableFill (ints) AS
(
    SELECT CAST(0 AS BIGINT) AS ints
    UNION ALL
    SELECT (T.ints + 1) AS ints
    FROM IntegersTableFill T
    WHERE T.ints <= (
        CASE
            WHEN (@end_int <= 32767) THEN @end_int
            ELSE 32767
        END
    )
)

 /* What we're going to do is define a series of periods. 
    These periods have a start date and an end date, and will simplify grouping 
    (in place of the calculate-and-union approach)
  */


 /* Now, we start defining the periods
    @start_date defines the end of the most recent period.
    @month_skip defines the amount of time we have to jump back for each period.
    @months_back defines how far each period reaches back from its end date.

 */

/* Using the number table we defined above and the data in our variables, calculate start and end dates */

,periodEndDates as
  (
  /* each period ends @month_skip months before the previous one */
  select ints as Period 
  ,DATEADD(month, -(@month_skip*ints), @start_date) as endOfPeriod
  from IntegersTableFill itf
  )

 ,periodStartDates as
  (
  /* each period starts @months_back months before its own end date */
  select * 
  ,DATEADD(month, -(@months_back), endOfPeriod) as startOfPeriod

  from periodEndDates
  )

,finalPeriodData as
(
    select (period) as period, startOfPeriod, endOfPeriod from periodStartDates

)

/* Link the entries in our original data to the periods they fall into */
/* NOTE: Depending on the variables, periods can overlap, so a value may fall into
    more than one period (the expected result in the question relies on this).
    Only periods whose end date is on or after @min_date are kept; the rows themselves
    are not filtered by @min_date, because a period may reach back past it.
*/

,periodTableJoin as
(
select * from finalPeriodData fpd
inner join @MyTable mt 
    on mt.the_date >= fpd.startOfPeriod
    and mt.the_date <= fpd.endOfPeriod
where fpd.endOfPeriod >= @min_date
)

/* Calculate averages, grouping by period and person */

,periodValueAggregate as
(
select person, avg(the_value) as avg_the_value from 
periodTableJoin
group by period, person
)

select * from periodValueAggregate
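One caveat about the recursive numbers CTE: by default SQL Server stops a recursive CTE after 100 levels of recursion, so raising @end_int much beyond that would fail with a "maximum recursion ... has been exhausted" error unless the final statement carries an OPTION (MAXRECURSION n) hint. The hint accepts values up to 32767, which is presumably why the CASE expression caps the counter there. A minimal sketch, with Numbers as a hypothetical CTE name:

```sql
DECLARE @end_int int = 500;

WITH Numbers (n) AS
(
    SELECT 0
    UNION ALL
    SELECT n + 1 FROM Numbers WHERE n < @end_int
)
SELECT COUNT(*) AS row_count   -- 501 rows: 0 through 500
FROM Numbers
OPTION (MAXRECURSION 1000);    -- without this hint the statement fails after 100 levels
```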

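Answer 1: (score: 0)

The approach I suggest is set-based rather than iterative. (I do not fully follow your question, but please follow up and we can work out any differences.) Essentially, you want to divide the calendar into periods of interest. The periods are of equal width and contiguous. To do this, I suggest you build a calendar table and tag each month with its period by partitioning (here, integer division of the month number by the window size), as shown in the code: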

DECLARE  @CalStart          DATE    = '2017-01-01'
        ,@CalEnd            DATE    = '2018-01-01'
        ,@CalWindowSize     INT     = 2

;WITH Numbers AS
(
    SELECT TOP (DATEDIFF(MONTH, @CalStart, @CalEnd)) N = CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) - 1
    FROM syscolumns
)
SELECT   CalWindow  = N  / @CalWindowSize
        ,CalDate    = DATEADD(MONTH, N, @CalStart)
FROM Numbers

Once the variables are configured correctly, you should have a calendar that represents the windows of interest.
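To see how the integer division assigns months to windows, here is a small sketch using the first five month offsets. The inline VALUES list is an assumption that replaces the Numbers CTE for illustration only.

```sql
DECLARE @CalStart date = '2017-01-01', @CalWindowSize int = 2;

SELECT v.N,
       v.N / @CalWindowSize            AS CalWindow,  -- integer division: 0, 0, 1, 1, 2
       DATEADD(MONTH, v.N, @CalStart)  AS CalDate     -- 2017-01-01 through 2017-05-01
FROM (VALUES (0),(1),(2),(3),(4)) AS v(N)
ORDER BY v.N;
```

Each month belongs to exactly one window, so unlike the ranges in the question these windows never overlap.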

Then it is just a matter of joining this calendar to your data set and grouping not only by person but also by CalWindow:

DECLARE @MyTable TABLE(person INT, the_date date, the_value int)
INSERT INTO @MyTable VALUES
(1,'2017-01-01', 10),
(1,'2017-02-01', 5), 
(1,'2017-03-01', 5),
(1,'2017-04-01', 10),
(1,'2017-05-01', 2),
(2,'2017-04-01', 10),
(2,'2017-05-01', 10),
(2,'2017-05-01', 0),
(3,'2017-01-01', 2)

----------------------------------
--  Build Calendar
----------------------------------
DECLARE  @CalStart          DATE    = '2017-01-01'
        ,@CalEnd            DATE    = '2018-01-01'
        ,@CalWindowSize     INT     = 2

;WITH Numbers AS
(
    SELECT TOP (DATEDIFF(MONTH, @CalStart, @CalEnd)) N = CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) - 1
    FROM syscolumns
)
,Calendar AS
(
    SELECT   CalWindow  = N  / @CalWindowSize
            ,CalDate    = DATEADD(MONTH, N, @CalStart)
    FROM Numbers
)
SELECT   TB.Person
        ,AVG(TB.the_value)
FROM @MyTable   TB
JOIN Calendar   CL  ON TB.the_date = CL.CalDate
GROUP BY CL.CalWindow, TB.person

Hopefully I have understood your question.