I want to count the number of periods of 2 or more consecutive weeks that have negative values within a range of weeks.
Example:
Week | Value
201301 | 10
201302 | -5 <--| both weeks have negative values and are consecutive
201303 | -6 <--|
Week | Value
201301 | 10
201302 | -5
201303 | 7
201304 | -2 <-- negative but not consecutive to the last negative value in 201302
Week | Value
201301 | 10
201302 | -5
201303 | -7
201304 | -2 <-- 1st group of negative and consecutive values
201305 | 0
201306 | -12
201307 | -8 <-- 2nd group of negative and consecutive values
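Is there a better way to do this than using a cursor, resetting a variable, and checking each row in order?
Here is some of the SQL I set up to test this: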
IF OBJECT_ID('TempDB..#ConsecutiveNegativeWeekTestOne') IS NOT NULL DROP TABLE #ConsecutiveNegativeWeekTestOne
IF OBJECT_ID('TempDB..#ConsecutiveNegativeWeekTestTwo') IS NOT NULL DROP TABLE #ConsecutiveNegativeWeekTestTwo
CREATE TABLE #ConsecutiveNegativeWeekTestOne
(
[Week] INT NOT NULL
,[Value] DECIMAL(18,6) NOT NULL
)
-- I have a condition where I expect to see at least 2 consecutive weeks with negative values
-- TRUE : Week 201328 & 201329 are both negative.
INSERT INTO #ConsecutiveNegativeWeekTestOne
VALUES
(201327, 5)
,(201328,-11)
,(201329,-18)
,(201330, 25)
,(201331, 30)
,(201332, -36)
,(201333, 43)
,(201334, 50)
,(201335, 59)
,(201336, 0)
,(201337, 0)
SELECT * FROM #ConsecutiveNegativeWeekTestOne
WHERE Value < 0
ORDER BY [Week] ASC
CREATE TABLE #ConsecutiveNegativeWeekTestTwo
(
[Week] INT NOT NULL
,[Value] DECIMAL(18,6) NOT NULL
)
-- FALSE: The negative weeks are not consecutive
INSERT INTO #ConsecutiveNegativeWeekTestTwo
VALUES
(201327, 5)
,(201328,-11)
,(201329,20)
,(201330, -25)
,(201331, 30)
,(201332, -36)
,(201333, 43)
,(201334, 50)
,(201335, -15)
,(201336, 0)
,(201337, 0)
SELECT * FROM #ConsecutiveNegativeWeekTestTwo
WHERE Value < 0
ORDER BY [Week] ASC
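My SQL Fiddle is also here: http://sqlfiddle.com/#!3/ef54f/2
Answer 0 (score: 3)
First, would you please share the formula for calculating the week number, or provide a real date for each week, or some method to determine whether any particular year has 52 or 53 weeks? Once you do that, I can make my queries properly skip missing data and cross year boundaries.
Now for the queries: this can be done without a JOIN, which, depending on the exact indexes present, may improve performance over any solution that uses JOINs. Then again, it may not. It is also harder to understand, so it may not be worth it if other solutions perform well enough (especially when the right indexes are present).
Simulated PREORDER BY windowing function (respects gaps, ignores year boundaries):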
WITH Calcs AS (
SELECT
Grp =
[Week] -- comment out to ignore gaps and gain year boundaries
-- Row_Number() OVER (ORDER BY [Week]) -- swap with previous line
- Row_Number() OVER
(PARTITION BY (SELECT 1 WHERE Value < 0) ORDER BY [Week]),
*
FROM dbo.ConsecutiveNegativeWeekTestOne
)
SELECT
[Week] = Min([Week])
-- NumWeeks = Count(*) -- if you want the count
FROM Calcs C
WHERE Value < 0
GROUP BY C.Grp
HAVING Count(*) >= 2
;
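As a rough illustration of why subtracting the row number groups consecutive weeks, here is a minimal Python sketch of the same idea. It is not part of the SQL solution; the function name negative_run_starts and the min_len parameter are made up for this example, and the sample rows are the third example from the question.

```python
from itertools import groupby

def negative_run_starts(rows, min_len=2):
    """Return the first week of every run of >= min_len consecutive negative weeks.

    rows is an iterable of (week, value) pairs. Like the query above, this
    respects gaps in the week numbers and ignores year boundaries.
    """
    negatives = sorted(week for week, value in rows if value < 0)
    # Within a run of consecutive week numbers, week - rank stays constant,
    # so that difference can serve as the group key (Grp in the query).
    keyed = [(week - rank, week) for rank, week in enumerate(negatives, start=1)]
    starts = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        weeks = [week for _, week in group]
        if len(weeks) >= min_len:      # HAVING Count(*) >= 2
            starts.append(weeks[0])    # Min([Week])
    return starts

rows = [(201301, 10), (201302, -5), (201303, -7), (201304, -2),
        (201305, 0), (201306, -12), (201307, -8)]
print(negative_run_starts(rows))
```

The negative weeks 201302 to 201304 all map to the same key, and 201306 to 201307 map to the next one, which is what the GROUP BY on Grp relies on.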
Another way, simulating LEAD and LAG with a CROSS JOIN and aggregates (respects gaps, ignores year boundaries):
WITH Groups AS (
SELECT
Grp = T.[Week] + X.Num,
*
FROM
dbo.ConsecutiveNegativeWeekTestOne T
CROSS JOIN (VALUES (-1), (0), (1)) X (Num)
)
SELECT
[Week] = Min(C.[Week])
-- Value = Min(C.Value)
FROM
Groups G
OUTER APPLY (SELECT G.* WHERE G.Num = 0) C
WHERE G.Value < 0
GROUP BY G.Grp
HAVING
Min(G.[Week]) = Min(C.[Week])
AND Max(G.[Week]) > Min(C.[Week])
;
And here is my original second query, simplified (ignores gaps, handles year boundaries):
WITH Groups AS (
SELECT
Grp = (Row_Number() OVER (ORDER BY T.[Week]) + X.Num) / 3,
*
FROM
dbo.ConsecutiveNegativeWeekTestOne T
CROSS JOIN (VALUES (0), (2), (4)) X (Num)
)
SELECT
[Week] = Min(C.[Week])
-- Value = Min(C.Value)
FROM
Groups G
OUTER APPLY (SELECT G.* WHERE G.Num = 2) C
WHERE G.Value < 0
GROUP BY G.Grp
HAVING
Min(G.[Week]) = Min(C.[Week])
AND Max(G.[Week]) > Min(C.[Week])
;
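Note: The execution plan for these may be rated as more expensive than the other queries, but there will be only 1 table access instead of 2 or 3, and while the CPU may be higher, it is still respectably low.
Note: I originally was not paying attention to producing only one row per group of negative values, so I wrote this query, which requires only 2 table accesses (respects gaps, ignores year boundaries):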
SELECT
T1.[Week]
FROM
dbo.ConsecutiveNegativeWeekTestOne T1
WHERE
Value < 0
AND EXISTS (
SELECT *
FROM dbo.ConsecutiveNegativeWeekTestOne T2
WHERE
T2.Value < 0
AND T2.[Week] IN (T1.[Week] - 1, T1.[Week] + 1)
)
;
However, I have now modified it to do what was asked, showing only the starting week of each group (respects gaps, ignores year boundaries):
SELECT
T1.[Week]
FROM
dbo.ConsecutiveNegativeWeekTestOne T1
WHERE
Value < 0
AND EXISTS (
SELECT *
FROM
dbo.ConsecutiveNegativeWeekTestOne T2
WHERE
T2.Value < 0
AND T1.[Week] - 1 <= T2.[Week]
AND T1.[Week] + 1 >= T2.[Week]
AND T1.[Week] <> T2.[Week]
HAVING
Min(T2.[Week]) > T1.[Week]
)
;
Last, just for fun, here is a version for SQL Server 2012 and up using LEAD and LAG:
WITH Weeks AS (
SELECT
PrevValue = Lag(Value, 1, 0) OVER (ORDER BY [Week]),
SubsValue = Lead(Value, 1, 0) OVER (ORDER BY [Week]),
PrevWeek = Lag(Week, 1, 0) OVER (ORDER BY [Week]),
SubsWeek = Lead(Week, 1, 0) OVER (ORDER BY [Week]),
*
FROM
dbo.ConsecutiveNegativeWeekTestOne
)
SELECT [Week]
FROM Weeks W
WHERE
(
[Week] - 1 > PrevWeek
OR PrevValue >= 0
)
AND Value < 0
AND SubsValue < 0
AND [Week] + 1 = SubsWeek
;
I am not sure I am doing this the best way, as I have not used these functions much, but it works nonetheless.
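You should do some performance testing of the various queries presented to you and pick the best one, according to the following order of priorities: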
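Seeing that some of my solutions are not very clear, other solutions that are fast enough and concise enough will probably win the competition for a place in your own production code. But... maybe not! And maybe someone will appreciate seeing these techniques, even if they can't be used as-is this time.
So let's do some testing and see what the truth is about all this! Here is a test setup script. It will generate the same data on your server as it did on mine: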
IF Object_ID('dbo.ConsecutiveNegativeWeekTestOne', 'U') IS NOT NULL DROP TABLE dbo.ConsecutiveNegativeWeekTestOne;
GO
CREATE TABLE dbo.ConsecutiveNegativeWeekTestOne (
[Week] int NOT NULL CONSTRAINT PK_ConsecutiveNegativeWeekTestOne PRIMARY KEY CLUSTERED,
[Value] decimal(18,6) NOT NULL
);
SET NOCOUNT ON;
DECLARE
@f float = Rand(5.1415926535897932384626433832795028842),
@Dt datetime = '17530101',
@Week int;
WHILE @Dt <= '20140106' BEGIN
INSERT dbo.ConsecutiveNegativeWeekTestOne
SELECT
Format(@Dt, 'yyyy') + Right('0' + Convert(varchar(11), DateDiff(day, DateAdd(year, DateDiff(year, 0, @Dt), 0), @Dt) / 7 + 1), 2),
Rand() * 151 - 76
;
SET @Dt = DateAdd(day, 7, @Dt);
END;
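This generates 13,620 weeks, from 175301 through 201401. I modified all the queries to select the Week values instead of the count, in the format SELECT @Week = Expression ..., so that the tests are not affected by returning rows to the client.
I tested only the gap-respecting, non-year-boundary-handling versions.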
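As a sanity check on the week numbering, the expression in the setup script can be reproduced outside SQL Server. The Python sketch below mirrors it: year, followed by the zero-based day of year divided by 7, plus 1. The names week_key and generate_weeks are invented for this illustration; only the dates and the formula come from the script.

```python
from datetime import date, timedelta

def week_key(d):
    """Mirror the script's expression: yyyy followed by (day offset in year / 7) + 1."""
    days_into_year = (d - date(d.year, 1, 1)).days  # 0 for January 1st
    return d.year * 100 + days_into_year // 7 + 1

def generate_weeks(start=date(1753, 1, 1), end=date(2014, 1, 6)):
    """Step through the range 7 days at a time, as the WHILE loop does."""
    weeks = []
    current = start
    while current <= end:
        weeks.append(week_key(current))
        current += timedelta(days=7)
    return weeks

weeks = generate_weeks()
print(len(weeks), weeks[0], weeks[-1])
```

Note that this numbering is specific to the test script and is not necessarily the one used in the asker's real data, which is why the queries tested here ignore year boundaries.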
Results
Query               Duration (ms)  CPU (ms)  Reads
------------------  -------------  --------  ------
ErikE-Preorder                 27        31      40
ErikE-CROSS                    29        31      40
ErikE-Join-IN       -------------Awful-------------
ErikE-Join-Revised             46        47   15069
ErikE-Lead-Lag                104       109      40
jods                           12        16     120
Transact Charlie               12        16     120
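Conclusions
The reduced reads of the non-JOIN versions are not significant enough to warrant their increased complexity.
The table is so small that performance almost doesn't matter. 261 years of weeks is insignificant, so a normal business operation won't see any performance problem even with a poor query.
I tested with an index on Week (which is more than reasonable). Doing two separate JOINs with a seek was far, far superior to any device that tries to get the relevant related data in one swoop. Charlie and jods were spot on in their comments.
This data is not large enough to expose real differences between the queries in CPU and duration. The values above are representative, though at times the 31 ms were 16 ms and the 16 ms were 0 ms. Since the resolution is about 15 ms, this doesn't tell us much.
My tricky query techniques do perform better on reads. They might be worth it in performance-critical situations. But this is not one of them.
LEAD and LAG may not always win. The presence of an index on the lookup value is probably what determines this. The ability to still pull the previous/next value based on a certain order, even when the ORDER BY value is not sequential, may be one good use case for these functions.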
Answer 1 (score: 1)
You can use a combination of EXISTS.
Assuming you only want to know the groups (series of consecutive weeks that are all negative):
-- Find the potential start weeks
;WITH Starts AS (
SELECT [Week]
FROM #ConsecutiveNegativeWeekTestOne AS s
WHERE s.[Value] < 0
AND NOT EXISTS (
SELECT 1
FROM #ConsecutiveNegativeWeekTestOne AS p
WHERE p.[Week] = s.[Week] - 1
AND p.[Value] < 0
)
)
SELECT COUNT(*)
FROM
Starts AS s
WHERE EXISTS (
SELECT 1
FROM #ConsecutiveNegativeWeekTestOne AS n
WHERE n.[Week] = s.[Week] + 1
AND n.[Value] < 0
)
If you have an index on Week, this query should even be moderately efficient.
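For readers who find the two EXISTS tests easier to follow outside SQL, the same logic can be sketched in Python with set lookups. This is only an illustration of the idea, not a replacement for the query; count_negative_groups is a hypothetical name, and the sample rows are the two test tables from the question.

```python
def count_negative_groups(rows):
    """Count runs of 2 or more consecutive negative weeks.

    rows is an iterable of (week, value) pairs. A run is counted once, at its
    first week: the week before is not negative and the week after is.
    """
    negative_weeks = {week for week, value in rows if value < 0}
    return sum(
        1
        for week in negative_weeks
        if week - 1 not in negative_weeks  # NOT EXISTS: previous week negative
        and week + 1 in negative_weeks     # EXISTS: next week negative
    )

test_one = [(201327, 5), (201328, -11), (201329, -18), (201330, 25),
            (201331, 30), (201332, -36), (201333, 43), (201334, 50),
            (201335, 59), (201336, 0), (201337, 0)]
test_two = [(201327, 5), (201328, -11), (201329, 20), (201330, -25),
            (201331, 30), (201332, -36), (201333, 43), (201334, 50),
            (201335, -15), (201336, 0), (201337, 0)]
print(count_negative_groups(test_one), count_negative_groups(test_two))
```

As with the query, week - 1 and week + 1 are plain integer arithmetic, so a run that spans a year change is not detected.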
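Answer 2 (score: 1)
You can replace LEAD and LAG with a self-join.
The counting idea is basically to count the starts of negative sequences, rather than trying to consider each row.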
SELECT COUNT(*)
FROM ConsecutiveNegativeWeekTestOne W
LEFT OUTER JOIN ConsecutiveNegativeWeekTestOne Prev
ON W.week = Prev.week + 1
INNER JOIN ConsecutiveNegativeWeekTestOne Next
ON W.week = Next.week - 1
WHERE W.value < 0
AND (Prev.value IS NULL OR Prev.value >= 0) -- a previous value of 0 is not negative, so it also starts a new sequence
AND Next.value < 0
Note that I simply used "week + 1", which will not work across a year change.
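One possible way to handle the year change, sketched here in Python, is to compute the successor week from a real date instead of adding 1. This assumes the week numbers follow ISO 8601 week numbering, which the question never states; if a different scheme is in use, the function would need to change. The name next_iso_week is invented for this example.

```python
from datetime import date, timedelta

def next_iso_week(yyyyww):
    """Return the week after yyyyww, ASSUMING ISO 8601 week numbering."""
    year, week = divmod(yyyyww, 100)
    monday = date.fromisocalendar(year, week, 1)  # Monday of the given ISO week
    following = (monday + timedelta(weeks=1)).isocalendar()
    return following[0] * 100 + following[1]      # ISO year, then ISO week

print(next_iso_week(201352), next_iso_week(201552), next_iso_week(201553))
```

In SQL, the equivalent would be a calendar table or a computed week-start date to join on, so that "next week" is derived from dates rather than from the integer week key.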