SQL to check for 2 or more consecutive negative week values

Time: 2013-06-13 21:24:52

Tags: sql sql-server-2008

I want to count the number of periods of 2 or more consecutive weeks that have negative values, within a range of weeks.

Example:

Week | Value
201301 | 10
201302 | -5 <--| both weeks have negative values and are consecutive
201303 | -6 <--| 

 Week | Value
201301 | 10
201302 | -5 
201303 | 7
201304 | -2 <-- negative but not consecutive to the last negative value in 201302 

 Week | Value
201301 | 10
201302 | -5 
201303 | -7
201304 | -2 <-- 1st group of negative and consecutive values 
201305 | 0
201306 | -12
201307 | -8 <-- 2nd group of negative and consecutive values 

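Is there a better way of doing this than using a cursor, resetting a variable, and checking each row in order?

Here is some of the SQL I set up to test it: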

IF OBJECT_ID('TempDB..#ConsecutiveNegativeWeekTestOne') IS NOT NULL DROP TABLE #ConsecutiveNegativeWeekTestOne
IF OBJECT_ID('TempDB..#ConsecutiveNegativeWeekTestTwo') IS NOT NULL DROP TABLE #ConsecutiveNegativeWeekTestTwo

CREATE TABLE #ConsecutiveNegativeWeekTestOne
(
     [Week] INT NOT NULL
     ,[Value] DECIMAL(18,6) NOT NULL
)

-- I have a condition where I expect to see at least 2 consecutive weeks with negative values
-- TRUE : Week 201328 & 201329 are both negative.
INSERT INTO #ConsecutiveNegativeWeekTestOne
VALUES
(201327, 5)
,(201328,-11)
,(201329,-18)
,(201330, 25)
,(201331, 30)
,(201332, -36)
,(201333, 43)
,(201334, 50)
,(201335, 59)
,(201336, 0)
,(201337, 0)

SELECT * FROM #ConsecutiveNegativeWeekTestOne
WHERE Value < 0
ORDER BY [Week] ASC


CREATE TABLE #ConsecutiveNegativeWeekTestTwo
(
     [Week] INT NOT NULL
     ,[Value] DECIMAL(18,6) NOT NULL
)

-- FALSE: The negative weeks are not consecutive
INSERT INTO #ConsecutiveNegativeWeekTestTwo
VALUES

(201327, 5)
,(201328,-11)
,(201329,20)
,(201330, -25)
,(201331, 30)
,(201332, -36)
,(201333, 43)
,(201334, 50)
,(201335, -15)
,(201336, 0)
,(201337, 0)

SELECT * FROM #ConsecutiveNegativeWeekTestTwo
WHERE Value < 0
ORDER BY [Week] ASC

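My SQL Fiddle is also here: http://sqlfiddle.com/#!3/ef54f/2

3 answers:

Answer 0 (score: 3)

First, would you please share the formula for calculating the week number, or provide a real date for each week, or some method to determine whether a particular year has 52 or 53 weeks? Once you do that, I can make my queries properly skip missing data and cross year boundaries.

Now to the queries: this can be done without a JOIN, which, depending on the exact indexes present, may improve performance over any solution that does use JOINs. Then again, it may not. It is also harder to understand, so it may not be worth it if other solutions perform well enough (especially when the right indexes are present).

Simulating a "PREORDER BY" windowing function (respects gaps, ignores year boundaries):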

WITH Calcs AS (
   SELECT
      Grp =
         [Week] -- comment out to ignore gaps and gain year boundaries
         -- Row_Number() OVER (ORDER BY [Week]) -- swap with previous line
         - Row_Number() OVER
            (PARTITION BY (SELECT 1 WHERE Value < 0) ORDER BY [Week]),
      *
   FROM dbo.ConsecutiveNegativeWeekTestOne
)
SELECT
   [Week] = Min([Week])
   -- NumWeeks = Count(*) -- if you want the count
FROM Calcs C
WHERE Value < 0
GROUP BY C.Grp
HAVING Count(*) >= 2
;

See a Live Demo at SQL Fiddle (1st query)
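
The Week-minus-Row_Number idea behind the first query may be easier to see on the question's small sample. The following is a minimal sketch, not the answer's exact query: it filters the negative rows first instead of using the PARTITION BY (SELECT 1 WHERE Value < 0) trick, loads the first sample set into a hypothetical table variable @Weeks, and counts the runs as the question asks. Like the query above, it respects gaps but ignores year boundaries.

```sql
-- Hypothetical table variable holding the question's first sample set.
DECLARE @Weeks TABLE (
   [Week] int NOT NULL PRIMARY KEY,
   [Value] decimal(18,6) NOT NULL
);

INSERT @Weeks ([Week], [Value])
VALUES (201327, 5), (201328, -11), (201329, -18), (201330, 25),
       (201331, 30), (201332, -36), (201333, 43), (201334, 50),
       (201335, 59), (201336, 0), (201337, 0);

WITH Negatives AS (
   SELECT
      [Week],
      -- Within a run of consecutive weeks, Week and the row number both
      -- advance by 1 per row, so their difference stays constant.
      Grp = [Week] - Row_Number() OVER (ORDER BY [Week])
   FROM @Weeks
   WHERE [Value] < 0
), Runs AS (
   SELECT StartWeek = Min([Week]), NumWeeks = Count(*)
   FROM Negatives
   GROUP BY Grp
   HAVING Count(*) >= 2   -- keep only runs of 2 or more weeks
)
SELECT NumRuns = Count(*)
FROM Runs;
```

On this data the negative weeks 201328 and 201329 both get Grp = 201327, while 201332 gets Grp = 201329, so NumRuns comes back as 1.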

Another way to do this, simulating LEAD and LAG with a CROSS JOIN and aggregates (respects gaps, ignores year boundaries):

WITH Groups AS (
   SELECT
      Grp = T.[Week] + X.Num,
      *
   FROM
      dbo.ConsecutiveNegativeWeekTestOne T
      CROSS JOIN (VALUES (-1), (0), (1)) X (Num)
)
SELECT
   [Week] = Min(C.[Week])
   -- Value = Min(C.Value)
FROM
   Groups G
   OUTER APPLY (SELECT G.* WHERE G.Num = 0) C
WHERE G.Value < 0
GROUP BY G.Grp
HAVING
   Min(G.[Week]) = Min(C.[Week])
   AND Max(G.[Week]) > Min(C.[Week])
;

See a Live Demo at SQL Fiddle (2nd query)

And here is my original second query, simplified (ignores gaps, handles year boundaries):

WITH Groups AS (
   SELECT
      Grp = (Row_Number() OVER (ORDER BY T.[Week]) + X.Num) / 3,
      *
   FROM
      dbo.ConsecutiveNegativeWeekTestOne T
      CROSS JOIN (VALUES (0), (2), (4)) X (Num)
)
SELECT
   [Week] = Min(C.[Week])
   -- Value = Min(C.Value)
FROM
   Groups G
   OUTER APPLY (SELECT G.* WHERE G.Num = 2) C
WHERE G.Value < 0
GROUP BY G.Grp
HAVING
   Min(G.[Week]) = Min(C.[Week])
   AND Max(G.[Week]) > Min(C.[Week])
;

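Note: the execution plans for these may be rated as more expensive than those of other queries, but there is only 1 table access instead of 2 or 3, and while the CPU may be higher it is still respectably low.

Note: I originally was not paying attention to producing only one row per group of negative values, so I wrote this query, which needs only 2 table accesses (respects gaps, ignores year boundaries):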

SELECT
   T1.[Week]
FROM
   dbo.ConsecutiveNegativeWeekTestOne T1
WHERE
   Value < 0
   AND EXISTS (
      SELECT *
      FROM dbo.ConsecutiveNegativeWeekTestOne T2
      WHERE
         T2.Value < 0
         AND T2.[Week] IN (T1.[Week] - 1, T1.[Week] + 1)
   )
;

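See a Live Demo at SQL Fiddle (3rd query)

However, I have now modified it to do what was asked and show only the starting week of each group (respects gaps, ignores year boundaries):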

SELECT
   T1.[Week]
FROM
   dbo.ConsecutiveNegativeWeekTestOne T1
WHERE
   Value < 0
   AND EXISTS (
      SELECT *
      FROM
         dbo.ConsecutiveNegativeWeekTestOne T2
      WHERE
         T2.Value < 0
         AND T1.[Week] - 1 <= T2.[Week]
         AND T1.[Week] + 1 >= T2.[Week]
         AND T1.[Week] <> T2.[Week]
      HAVING
         Min(T2.[Week]) > T1.[Week]
   )
;

See a Live Demo at SQL Fiddle (3rd query)

Last, just for fun, here is a version for SQL Server 2012 and up that uses LEAD and LAG:

WITH Weeks AS (
   SELECT
      PrevValue = Lag(Value, 1, 0) OVER (ORDER BY [Week]),
      SubsValue = Lead(Value, 1, 0) OVER (ORDER BY [Week]),
      PrevWeek = Lag(Week, 1, 0) OVER (ORDER BY [Week]),
      SubsWeek = Lead(Week, 1, 0) OVER (ORDER BY [Week]),
      *
   FROM
     dbo.ConsecutiveNegativeWeekTestOne
)
SELECT [Week]
FROM Weeks W
WHERE
   (
      [Week] - 1 > PrevWeek
      OR PrevValue >= 0
   )
   AND Value < 0
   AND SubsValue < 0
   AND [Week] + 1 = SubsWeek
;

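See a Live Demo at SQL Fiddle (4th query)

I am not sure this is the best way to do it, as I have not used these functions much, but it works nonetheless.

You should do some performance testing of the various queries presented to you and pick the best one, considering that code should be, in this order:

  1. Correct
  2. Clear
  3. Concise
  4. Fast

Seeing that some of my solutions are anything but clear, other solutions that are fast enough and concise enough will probably win the competition for a place in your own production code. But... maybe not! And maybe someone will appreciate seeing these techniques, even if they can't be used as-is this time.

So let's do some testing and see what the truth is about all this! Here is a test setup script. It will generate the same data on your own server as it did on mine: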

    IF Object_ID('dbo.ConsecutiveNegativeWeekTestOne', 'U') IS NOT NULL DROP TABLE dbo.ConsecutiveNegativeWeekTestOne;
    GO
    CREATE TABLE dbo.ConsecutiveNegativeWeekTestOne (
       [Week] int NOT NULL CONSTRAINT PK_ConsecutiveNegativeWeekTestOne PRIMARY KEY CLUSTERED,
       [Value] decimal(18,6) NOT NULL
    );
    
    SET NOCOUNT ON;
    
    DECLARE
       @f float = Rand(5.1415926535897932384626433832795028842),
       @Dt datetime = '17530101',
       @Week int;
    
    WHILE @Dt <= '20140106' BEGIN
       INSERT dbo.ConsecutiveNegativeWeekTestOne
       SELECT
          Format(@Dt, 'yyyy') + Right('0' + Convert(varchar(11), DateDiff(day, DateAdd(year, DateDiff(year, 0, @Dt), 0), @Dt) / 7 + 1), 2),
          Rand() * 151 - 76
       ;
       SET @Dt = DateAdd(day, 7, @Dt);
    END;
    

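This generates 13,620 weeks, from 175301 through 201401. I modified all the queries to select the Week values instead of the count, in the format SELECT @Week = Expression ..., so that the tests are not affected by returning rows to the client.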
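
The answer does not say how the numbers were collected. One way to get comparable figures, sketched here as an assumption, is to wrap each query in SET STATISTICS IO and SET STATISTICS TIME and assign the result to a variable as described above. The EXISTS query in the middle is only a stand-in for whichever query is being measured, and the sketch assumes the setup script above has already created and filled dbo.ConsecutiveNegativeWeekTestOne.

```sql
-- Assumption: STATISTICS output is one way to collect reads, CPU and
-- duration; the answer does not state which tool produced its table.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @Week int;

-- Assigning to a variable keeps result rows from being sent to the client.
-- Stand-in query: replace it with the query under test.
SELECT @Week = T1.[Week]
FROM dbo.ConsecutiveNegativeWeekTestOne T1
WHERE T1.[Value] < 0
   AND EXISTS (
      SELECT *
      FROM dbo.ConsecutiveNegativeWeekTestOne T2
      WHERE T2.[Value] < 0
         AND T2.[Week] = T1.[Week] + 1
   );

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The logical reads and the CPU and elapsed times then appear in the messages output rather than in the result set.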

I tested only the gap-respecting, non-year-boundary-handling versions.

Results

                 Query  Duration  CPU    Reads
    ------------------  --------  -----  ------
        ErikE-Preorder   27        31       40
           ErikE-CROSS   29        31       40
         ErikE-Join-IN   -------Awful---------
    ErikE-Join-Revised   46        47    15069
        ErikE-Lead-Lag  104       109       40
                  jods   12        16      120
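      Transact Charlie   12        16      120

Conclusions

1. The reduced reads of the non-JOIN versions are not significant enough to warrant their increased complexity.

2. The table is too small for performance to matter much. 261 years of weeks (13,620 rows) is insignificant, so a normal business operation won't see any performance problem even with a poor query.

3. I tested with an index on Week (which is more than reasonable). Doing two separate JOINs with seeks was far superior to any device that tries to fetch the relevant related data in one swoop. Charlie and jods were spot on in their comments.

4. This data is not large enough to expose real differences between the queries in CPU and duration. The values above are representative, though at times the 31 ms was 16 ms and the 16 ms was 0 ms. Since the resolution is about 15 ms, this doesn't tell us much.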

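5. My tricky query techniques do perform better in terms of reads. They might be worth it in performance-critical situations, but this is not one of them.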

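6. LEAD and LAG may not always win. The presence of an index on the lookup value is probably what decides this. The ability to pull the previous or next value in a given order, even when the ordering value is not sequential, may be one good use case for these functions.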

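Answer 1 (score: 1)

You could use a combination of EXISTS.

Assuming you only want to know the groups (series of consecutive weeks that are all negative):

-- Find the potential start weeks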

;WITH starts as (
    SELECT [Week]
    FROM #ConsecutiveNegativeWeekTestOne AS s
    WHERE s.[Value] < 0
      AND NOT EXISTS (
        SELECT 1
        FROM #ConsecutiveNegativeWeekTestOne AS p
        WHERE p.[Week] = s.[Week] - 1
          AND p.[Value] < 0
        )
    )
SELECT COUNT(*)
FROM
    Starts AS s
    WHERE EXISTS (
        SELECT 1
        FROM #ConsecutiveNegativeWeekTestOne AS n
        WHERE n.[Week] = s.[Week] + 1
          AND n.[Value] < 0
        )

If you have an index on Week, this query should even be moderately efficient.
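
The question's temp table is created without any index, so, as a sketch, one could be added before running the query. The index name below is made up for illustration, and UNIQUE assumes each week appears only once in the table.

```sql
-- Assumes #ConsecutiveNegativeWeekTestOne from the question already exists.
-- IX_ConsecutiveNegativeWeekTestOne_Week is a hypothetical name.
CREATE UNIQUE CLUSTERED INDEX IX_ConsecutiveNegativeWeekTestOne_Week
   ON #ConsecutiveNegativeWeekTestOne ([Week]);
```

With this in place, the lookups on s.[Week] - 1 and s.[Week] + 1 can be resolved by index seeks instead of scans.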

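Answer 2 (score: 1)

You can replace LEAD and LAG with a self-join.

The idea is to count the starts of negative sequences rather than trying to consider each row.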

SELECT COUNT(*)
FROM ConsecutiveNegativeWeekTestOne W
LEFT OUTER JOIN ConsecutiveNegativeWeekTestOne Prev
  ON W.week = Prev.week + 1
INNER JOIN ConsecutiveNegativeWeekTestOne Next
  ON W.week = Next.week - 1
WHERE W.value < 0 
  AND (Prev.value IS NULL OR Prev.value >= 0)
  AND Next.value < 0

Note that I simply used "week + 1", which will not work across a year change.
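
One possible way around the year change, sketched under the assumption that no weeks are missing from the table, is to join on a dense row number instead of on the week value itself. The table variable @Weeks, the Seq column and the sample rows are made up for illustration; the run of negative values deliberately straddles 201352 and 201401.

```sql
-- Hypothetical sample in which a negative run crosses the year boundary.
DECLARE @Weeks TABLE (
   [Week] int NOT NULL PRIMARY KEY,
   [Value] decimal(18,6) NOT NULL
);

INSERT @Weeks ([Week], [Value])
VALUES (201351, 4), (201352, -2), (201401, -7), (201402, 6);

WITH Numbered AS (
   SELECT
      -- Dense sequence: 201352 -> 2 and 201401 -> 3, so they are adjacent.
      Seq = Row_Number() OVER (ORDER BY [Week]),
      [Week],
      [Value]
   FROM @Weeks
)
SELECT NumRuns = Count(*)
FROM Numbered W
LEFT OUTER JOIN Numbered Prev
   ON Prev.Seq = W.Seq - 1
INNER JOIN Numbered Nxt
   ON Nxt.Seq = W.Seq + 1
WHERE W.[Value] < 0
   AND (Prev.[Value] IS NULL OR Prev.[Value] >= 0)   -- W starts a run
   AND Nxt.[Value] < 0;                              -- the run has 2+ weeks
```

Here NumRuns is 1, whereas joining on week + 1 would find no run because 201353 does not exist. The trade-off is that a missing week is no longer detected as a gap.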