大日期范围内的最小/最大日期值,具体取决于值

时间:2019-07-10 13:38:14

标签: sql sql-server tsql gaps-and-islands

我正在查询客户数据的快照,其中包含快照日期,客户ID和该客户当天的“价值”。我使用LAG函数返回前几天的值,以了解是否存在跌落/上升/完全亏损/完整的新值(从£0到>£0)。

最终的结果是确定客户价值为0英镑的最小和最大日期。

最初,我尝试按客户和值对MIN(Date)和Max(Date)进行分组。但是,如果客户在不同的日期范围内降至0英镑,则会带回最新日期范围的最大值和最早日期的最小值,而不是理想的最小值-带回两个范围均为0英镑的范围。

我尝试使用DENSE_RANK()拆分客户的每个值,但是这样做只是将所有£0值都排在同一等级。

下面是一些示例代码,向您显示我正在使用的数据以及我如何尝试对其进行拆分:

DROP TABLE IF EXISTS #SnapshotTable
CREATE TABLE #SnapshotTable
(
    Row_ID INT IDENTITY(1,1)
    ,SnapshotDate DATE
    ,SnapshotDateKey INT
    ,CustomerId INT
    ,Value DECIMAL(18,2)
)
INSERT INTO #SnapshotTable (SnapshotDate, SnapshotDateKey, CustomerId, Value)
SELECT '2019-01-01', 20190101, 1, 0.00
UNION SELECT '2019-01-02', 20190102, 1, 0.00
UNION SELECT '2019-01-03', 20190103, 1, 5.00
UNION SELECT '2019-01-04', 20190104, 1, 5.00
UNION SELECT '2019-01-05', 20190105, 1, 3.00
UNION SELECT '2019-01-06', 20190106, 1, 3.00
UNION SELECT '2019-01-07', 20190107, 1, 0.00
UNION SELECT '2019-01-08', 20190108, 1, 0.00
UNION SELECT '2019-01-09', 20190109, 1, 10.00
UNION SELECT '2019-01-10', 20190110, 1, 0.00

SELECT * FROM #SnapshotTable

-- Code that doesn't work correctly
SELECT
    CustomerId
    ,Value
    ,MinDate = MIN(SnapshotDateKey)
    ,MaxDate = MAX(SnapshotDateKey)
FROM #SnapshotTable
GROUP BY
    CustomerId
    ,Value

-- Attempted with dense rank
ALTER TABLE #SnapshotTable
ADD DenseRankTest INT NULL
GO
-- Update with Dense Rank
UPDATE TGT
SET 
    TGT.DenseRankTest = SRC.NewRank
FROM #SnapshotTable TGT
INNER JOIN (SELECT
                Row_ID
                ,NewRank = DENSE_RANK() OVER (PARTITION BY CustomerId ORDER BY Value ASC)
            FROM #SnapshotTable

            ) AS SRC
    ON SRC.Row_ID = TGT.Row_ID 

SELECT * FROM #SnapshotTable

现在,我可以看到density_rank()函数正在按我希望的方式运行,但老实说,我已经看了一段时间了,我无法正确地做到这一点。

有人可以建议我需要做什么吗?

我希望看到:

SELECT [StartDateKey] = 20190101, [EndDateKey] = 20190102, [CustomerId] = 1, [Value] = 0
UNION SELECT [StartDateKey] = 20190103, [EndDateKey] = 20190104, [CustomerId] = 1, [Value] = 5
UNION SELECT [StartDateKey] = 20190105, [EndDateKey] = 20190106, [CustomerId] = 1, [Value] = 3
UNION SELECT [StartDateKey] = 20190107, [EndDateKey] = 20190108, [CustomerId] = 1, [Value] = 0
UNION SELECT [StartDateKey] = 20190109, [EndDateKey] = 20190109, [CustomerId] = 1, [Value] = 10
UNION SELECT [StartDateKey] = 20190120, [EndDateKey] = 20190110, [CustomerId] = 1, [Value] = 0

编辑:对于那些偶然发现此问题的人,在这里的人们的帮助下,我发现了this as a good read for understanding the issue/solving the issue.

1 个答案:

答案 0 :(得分:2)

这是一个孤岛问题。但是,对声称的副本的公认答案根本不是解决此问题的最佳方法。而且投票率更高的答案仍然过于复杂。

更简单的方法是:

select customerid, value, min(SnapshotDateKey), max(SnapshotDateKey)
from (select st.*,
             row_number() over (partition by customerid, value order by snapshotdate) as seqnum
      from snapshottable st
     ) st
group by dateadd(day, -seqnum, snapshotdate), customerid, value
order by min(SnapshotDateKey);

Here是db <>小提琴。