MSSQL2012(Over / Partition)过去12个月,包括空值?

时间:2015-12-10 22:15:37

标签: sql-server sum average partition

所以,我有一个问题:在有可能产品的情况下,过去12个月(计算正在处理的产品代码)获得实际的总和/平均值,通过产品代码进行分区一个月内不存在。

尝试从TL开始; DR,让我们去实际的事情:

  • 我的表有4个字段:年,月,产品代码和值;
  • 我需要去年(选定月份+过去11个月)的总和;
  • 去年我需要平均(同样)。

我尝试过:

SELECT * FROM (
  SELECT year, month, product,
  AVG(value) OVER (
    PARTITION BY product 
    ORDER BY year, month
    ROWS 11 PRECEDING
  ) as average,
  SUM(value) OVER (
    PARTITION BY product 
    ORDER BY year, month
    ROWS 11 PRECEDING
  ) as sum
  FROM suchDB.muchUSER.awesomeTABLE
) q
where year = <insert year> and month = <month>

问题:

  • 如果查询的任何月份没有产品&#39; 123&#39;它会忽略产品是否有历史记录,它不会给我带来平均值或总和过去几个月。
  • 如果过去一个月(应该是查询月份的历史记录的一部分)没有产品“123”,它将忽略该月并继续过去一个月去年&#39;范围(一个没有七月的被查询的十月将返回它的前一年的版本,而不是在十一月停止)。

我们曾经使用'GROUP BY product''WHERE ((year-1)*100)+month > queriedYear-1Month AND (year*100)+month <= queriedYearMonth'查询此内容,直到有人指出我们处于OVER / PARTITION方向并且我们改变了所有内容......但即使这样我们也遇到了一些问题,例如{{1函数将忽略NULL月......

帮助?

忘记一件非常重要的事情

awesomeTABLE中的数据是版本化的 - 每年/每月可以有多个版本。只能使用最新版本。我通常通过加入avg()来做到这一点,但它似乎会杀死一些可能的解决方案......

4 个答案:

答案 0 :(得分:2)

最大的问题是,您需要获取两个列表才能正确汇总数据集中的值 - 一个日期列表和一个产品列表。如果没有这两个列表,最后一个月中缺少的产品意味着不会报告产品,或者(如您已发现的那样)可能会汇总错误的12个月(缺少7月意味着11个前置行包括开始月份)。

以下是对生成这些列表的过程的全面扩展探索。它只使用源数据表(假设每个月都出售某些。它可以更简洁(即计算上面shawnt示例中的日期),但写入显示所有我将它封装到一个存储过程中,因为它显示了显式传递的值。

    CREATE PROCEDURE DoTheThing 
    @startDate DATE  -- Should be Year-Month-01 or YearMonth01
    AS
    BEGIN
    DECLARE @yr INT, @mth INT,
        @yr2 INT, @mth2 INT,
        @endDate DATE   -- StartDt - 11 months

    -- if the date will be passed in with a day other than 01, add code here to set the day on the passed date to 01
    -- if only the high year and month are passed in, then create a @startDt value and continue.

    SET @endDate = DATEADD(MONTH, -11, @startDate)

    SELECT @yr = DATEPART(YEAR, @startDate),
        @mth = DATEPART(MONTH, @startdate),
        @yr2 = DATEPART(YEAR, @startDate),
        @mth2 = DATEPART(MONTH, @startdate)

    WITH mthYr AS (
        SELECT DISTINCT 
            YEAR, 
            MONTH
        FROM suchDB.muchUSER.awesomeTABLE   -- Get the data from the source table
        WHERE (
            YEAR = @yr              -- if in the passed-in year, then take all months less than or equal to the start month
            AND MONTH <= @mth
            )
            OR (
            YEAR = @yr2             -- if the period is Jan -- Dec in one year, this reiterates the above
            AND MONTH >= @mth2      -- if not, select the months in the second year where the month is greater than or equal to the calculated month
            )
        ), 
    prods AS (
        SELECT DISTINCT product     -- Return a list of products sold during the year.
        FROM suchDB.muchUSER.awesomeTABLE smt
        INNER JOIN mthYr
            ON mthYr.YEAR = smt.YEAR
            AND mthyr.MONTH = smt.MONTH
        )

    SELECT MAX(mthYr.YEAR) AS year,     -- current report only shows passed in year/month value
        MAX(mthYr.month) AS month, 
        prods.product,                  
        AVG(ISNULL(VALUE, 0.00)) average,   -- isNull value adds a zero into the list to be averaged
        SUM(ISNULL(VALUE, 0.00)) SUM        -- not really necessary, but no warnings about NULL values will be generated
    FROM mthYr CROSS JOIN prods         -- cross join of the two lists means all of the products sold will have a value for each month
    LEFT JOIN suchDB.muchUSER.awesomeTABLE smt  -- left join so missing productMonths will still be added in
        ON smt.YEAR = mthYr.year
        AND smt.MONTH = mthYr.month 
        AND prods.product = smt.product
    ORDER BY smt.product
END

答案 1 :(得分:1)

如果你有一个产品表和一个日期表,我会把上面的查询加入,以便表示所有产品和所有日期,然后用你拥有的方法对结果求和。

您的第一颗子弹应该被解决,因为所有日期和所有产品都将被代表

我相信你的第二颗子弹是由“前面的第11行”真正获得过去的11个值而不管日期;这可以通过将0替换为当前缺失的月份/产品组合的null来解决,但是在左连接方法的情况下将提供。

答案 2 :(得分:1)

对于缺失的月份有很多可能的修复方法。这是一种不会过多改变原始查询的方法:

WITH ym as (
    select
         year(dateadd(month, n, cast('<year>' + '<month>' + '01' as date))) as y,
        month(dateadd(month, n, cast('<year>' + '<month>' + '01' as date))) as m
    from (values
         (0), (-1), (-2), (-3),  (-4),  (-5),
        (-6), (-7), (-8), (-9), (-10), (-11)
    ) ofs(n)
)
SELECT
    y, m, p.product,
    AVG(coalesce(value, 0)) OVER (
      PARTITION BY p.product 
      ORDER BY y, m
      ROWS 11 PRECEDING
    ) as average,
    SUM(value) OVER (
      PARTITION BY p.product 
      ORDER BY y, m
      ROWS 11 PRECEDING
    ) as sum
FROM
    ym cross join
    (select distinct product from suchDB.muchUSER.awesomeTABLE) p
    left outer join suchDB.muchUSER.awesomeTABLE t
        on t."year" = ym.y and t."month" = ym.m and t.product = p.product
where y = <insert year> and m = <month>

而不是最后的where,您可以order by y, m descselect top 1。我通常会尽量避免使用top,但我不确定你是如何放弃参数的,并且避免使用某些编程库两次引用它们可能不那么麻烦。即使你手工完成,你仍然需要记住在长时间的查询中查看两个地方。

由于它似乎只需要一个月的聚合,我认为你可以使用这个不使用窗口函数的简单版本:

WITH ym as (
    select
        dateadd(month, n, cast('<year>' + '<month>' + '01' as date) as dt,
         year(dateadd(month, n, cast('<year>' + '<month>' + '01' as date))) as y,
        month(dateadd(month, n, cast('<year>' + '<month>' + '01' as date))) as m
    from (values
         (0), (-1), (-2), (-3),  (-4),  (-5),
        (-6), (-7), (-8), (-9), (-10), (-11)
    ) ofs(n)
)
SELECT
    year(max(dt)) as "year", month(max(dt)) as "month", p.product,
    AVG(coalesce(value, 0)) as average,
    SUM(value) as "sum"
FROM
    ym cross join
    (select distinct product from awesome) p
    left outer join awesome t
    on t."year" = ym.y and t."month" = ym.m and t.product = p.product
GROUP BY p.product

我不知道你是否只想限制去年实际销售的产品,所以我没有处理那些产品。

如果你开始考虑如何使它更通用和可重复使用,你可能会更像这个。我继续将产品限制添加到前一年的活动中:

WITH dt as (
    select dateadd(month, n, cast('<year>' + '<month>' + '01' as date) as dt
),
ym as (
    select dt, year(dateadd(month, n, dt)) as y, month(dateadd(month, n, dt)) as m
    from (values
         (0), (-1), (-2), (-3),  (-4),  (-5),
        (-6), (-7), (-8), (-9), (-10), (-11)
    ) ofs(n), dt
)
SELECT
    year(max(dt)) as "year", month(max(dt)) as "month", p.product,
    AVG(coalesce(value, 0)) as average, SUM(value) as "sum"
FROM
    ym cross join
    (
        select distinct product from awesome
        where cast("year" + "month" + '01' as date) between
                (select min(dt) from ym) and (select max(dt) from ym)
    ) p
    left outer join (
        select distinct /* get the latest "version" only */
            first_value("year")
                over (partition by "year", "month", product order by version desc),
            first_value("month")
                over (partition by "year", "month", product order by version desc),
            product,
            first_value(value)
                over (partition by "year", "month", product order by version desc)
        from awesome
    ) t
        on t."year" = ym.y and t."month" = ym.m and t.product = p.product
GROUP BY p.product

最终查询还会尝试仅针对最新版本处理您的过滤器。但是,first_value()函数需要更高版本的SQL Server。

答案 3 :(得分:1)

使用日期时要做的第一件事是拥有一个日期时间字段,而不是年,月,日的分隔字段

SELECT year, month, product, value
     , DATEFROMPARTS(year, month, 1) fullDate
FROM   suchDB.muchUSER.awesomeTABLE

由于行可以进行版本控制,我们需要获取每年,每月和每月产品的最新版本,这可以通过多种方式完成,例如使用窗口函数或自动连接,例如后者是

SELECT b.[year], b.[month], b.[product], [value]
     , DATEFROMPARTS(b.[year], b.month, 1) fullDate
FROM   suchDB.muchUSER.awesomeTABLE b
       INNER JOIN (SELECT [year], [month], [product], max([version]) lv
                   FROM   suchDB.muchUSER.awesomeTABLE
                   GROUP BY [year], [month], [product]
                  ) m ON b.[year] = m.year AND b.month = m.month 
                     AND b.product = m.product AND b.[version] = m.lv

同时拥有产品清单会很方便

SELECT DISTINCT product
FROM   suchDB.muchUSER.awesomeTABLE

获取最后一年的数据可以通过多种方式完成,我喜欢CROSS APPLY

Declare @_year int
Declare @_month int

Set @_year = 2015
Set @_month = 11

;With baseDate AS (
  SELECT b.[year], b.[month], b.[product], [value]
       , DATEFROMPARTS(b.[year], b.month, 1) fullDate
  FROM   suchDB.muchUSER.awesomeTABLE b
         INNER JOIN (SELECT [year], [month], [product], max([version]) lv
                     FROM   suchDB.muchUSER.awesomeTABLE
                     GROUP BY [year], [month], [product]
                    ) m ON b.[year] = m.year AND b.month = m.month 
                       AND b.product = m.product AND b.[version] = m.lv
), Products AS (
  SELECT DISTINCT [product]
  FROM   suchDB.muchUSER.awesomeTABLE
)
SELECT @_year [Year], @_month [Month], p.[product]
     , ly.Average
     , ly.[Sum]
FROM   Products p
       CROSS APPLY (SELECT Sum(lastYear.Value) / 12.0 Average
                         , Sum(lastYear.Value) [Sum]
                    FROM   baseDate lastYear
                    WHERE  lastYear.fullDate > DATEFROMPARTS(@_year - 1, @_month, 1)
                      AND  lastYear.fullDate <= DATEFROMPARTS(@_year, @_month, 1)
                      AND  lastYear.product = p.product
                   ) ly 
WHERE  ly.[Sum] IS NOT NULL

WHERE条件会删除参数

前一年中没有行的产品

要删除变量并获取每月的值,需要一个日历表。如果表格中的所有产品都在那里,我们可以使用DISTINCT获取日期,使用第一个CTE代替表格,让我们也获得完整的日期

SELECT DISTINCT [year], [month], fullDate
FROM   baseDate

否则,有不同的方法来创建日历表。我们可以将日历表添加到主查询CTE中,并在CROSS APPLY中使用它而不是变量

;With baseDate AS (
  SELECT b.[year] _y, b.[month] _m, b.[product], [value]
       , DATEFROMPARTS(b.[year], b.month, 1) fullDate
  FROM   suchDB.muchUSER.awesomeTABLE b
         INNER JOIN (SELECT [year], [month], [product], max([version]) lv
                     FROM   suchDB.muchUSER.awesomeTABLE
                     GROUP BY [year], [month], [product]
                    ) m ON b.[year] = m.year AND b.month = m.month 
                       AND b.product = m.product AND b.[version] = m.lv
), Products AS (
  SELECT DISTINCT [product]
  FROM   suchDB.muchUSER.awesomeTABLE
), Months As (
  SELECT DISTINCT _y, _m, fullDate
  FROM   baseDate
)
SELECT _y [Year], _m [Month], p.[product]
     , ly.Average
     , ly.[Sum]
     , ly.[Count]
FROM   Products p
       CROSS APPLY (SELECT m._y, m._m
                         , Sum(bd.Value) / 12.0 Average
                         , Sum(bd.Value) [Sum]
                         , Count(Value) [Count]
                    FROM   Months m
                           LEFT JOIN baseDate bd 
                                  ON bd.fullDate > DATEADD(YY, -1, m.fullDate)
                                 AND bd.fullDate <= m.fullDate
                    WHERE  bd.product = p.product
                    GROUP BY m._y, m._m
                   ) ly 
WHERE  ly.[Sum] IS NOT NULL