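Is there a better way to identify the interval boundaries of a sequential pattern?

Time: 2011-04-22 16:35:42

Tags: sql sql-server analysis aggregation

I have a table of payments with positive and negative values (i.e., captures and credits). I need to identify the points at which we have received a net positive amount since the last point at which we received a net positive amount. For example, if a customer makes these payments and receives these credits: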

01/01  $100 <-
02/01 -$100
03/01 -$100
04/01  $100
05/01  $100
06/01  $100 <-

...then the points would be 01/01 and 06/01: from 02/01 through 04/01 they had a negative balance, and as of 05/01 their balance was zero.
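To make the rule concrete, here is a minimal sketch that applies it to the six rows above. It assumes SQL Server 2012 or later (for window frames), and the table variable `@Payments`, the year 2011, and the column aliases `Balance` and `PriorPeak` are hypothetical names chosen for illustration. A row counts as a net-positive point when the running balance is positive and strictly higher than every earlier running balance.

```sql
-- Hypothetical sample data mirroring the example above (year 2011 assumed).
DECLARE @Payments TABLE (DateReceived DATE NOT NULL, Amount DECIMAL(18, 2) NOT NULL);
INSERT INTO @Payments (DateReceived, Amount) VALUES
    ('2011-01-01',  100.00),
    ('2011-02-01', -100.00),
    ('2011-03-01', -100.00),
    ('2011-04-01',  100.00),
    ('2011-05-01',  100.00),
    ('2011-06-01',  100.00);

WITH Running AS
(
    -- Running balance up to and including each row.
    SELECT DateReceived,
           SUM(Amount) OVER (ORDER BY DateReceived ROWS UNBOUNDED PRECEDING) AS Balance
    FROM @Payments
),
Peaks AS
(
    -- Highest running balance seen strictly before each row (NULL for the first row).
    SELECT DateReceived, Balance,
           MAX(Balance) OVER (ORDER BY DateReceived
                              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PriorPeak
    FROM Running
)
SELECT DateReceived, Balance
FROM Peaks
WHERE Balance > 0.00
  AND (PriorPeak IS NULL OR Balance > PriorPeak)
ORDER BY DateReceived;
-- Returns two rows: 2011-01-01 (Balance 100.00) and 2011-06-01 (Balance 200.00).
```

The running balances are 100, 0, -100, 0, 100, 200, so 05/01 is excluded because it only ties the earlier peak of 100.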

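My current approach is to first build a list of end dates from every date that has a capture, then calculate the start date for each one, and finally calculate the net captures for those periods: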

Start      End        NetCaptures
1900/01/01 2011/01/01  $100
2011/01/02 2011/04/01 -$100
2011/04/02 2011/05/01  $100
2011/05/02 2011/06/01  $100

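I then discard the records where NetCaptures is $0 or less, recalculate the start dates, recalculate the net captures, and repeat until there are no records left to delete, leaving this: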

Start      End        NetCaptures
1900/01/01 2011/01/01  $100
2011/01/02 2011/06/01  $100

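Is there a better way? Some clever use of analytic expressions? This is approaching RBAR. In practice it runs acceptably fast (10 minutes for 500K records, versus 1.5 before I started calculating credits this way).

*Results*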

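While Microsoft does support elegant running-total functionality, using this idea I ended up with code like the following: compute all the captures, compute the running total for each capture, and discard those for which an earlier record has the same or a higher total.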

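CREATE TABLE #Sequences
    (
    OrderID INT NOT NULL,
    Sequence    INT NOT NULL,
    StartDate   DATE NOT NULL DEFAULT '1900-01-01',
    EndDate DATE NOT NULL,
    CapturesThisPeriod  DECIMAL(18, 2) NOT NULL DEFAULT 0.00,
    /* Running total through EndDate; populated and compared by the statements below */
    CumulativeReceipts  DECIMAL(18, 2) NOT NULL DEFAULT 0.00,
    PRIMARY KEY (OrderID, Sequence)
    )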
INSERT INTO #Sequences (OrderID, Sequence, EndDate)
    SELECT OrderID, ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY DateReceived), DateReceived
    FROM Receipts
    WHERE Amount > 0.00

/* Calculate the start date for each period */
UPDATE S
SET StartDate = DATEADD(D, 1, Prev.EndDate)
FROM
    #Sequences AS S
    INNER JOIN #Sequences AS Prev ON S.OrderID = Prev.OrderID AND Prev.Sequence = S.Sequence - 1

/* Calculate the cumulative total for each period */
UPDATE M
SET CumulativeReceipts = R.Receipts
FROM
    #Sequences AS M
    INNER JOIN      
        (
        SELECT
            M.OrderID, M.Sequence, SUM(R.Amount) AS Receipts
        FROM
            #Sequences AS M
            INNER JOIN Receipts AS R ON M.OrderID = R.OrderID AND R.DateReceived <= M.EndDate
        GROUP BY
            M.OrderID, M.Sequence
        ) AS R ON M.OrderID = R.OrderID AND M.Sequence = R.Sequence

/* Delete sequences which do not represent net positive receipts */
DELETE FROM M
FROM #Sequences AS M
WHERE EXISTS (SELECT * FROM #Sequences AS Prev WHERE M.OrderID = Prev.OrderID AND Prev.Sequence < M.Sequence AND Prev.CumulativeReceipts >= M.CumulativeReceipts)

/* Recalculate sequence numbers and dates */
UPDATE S
SET Sequence = NewSequence
FROM
    (
    SELECT Sequence, ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY Sequence) AS NewSequence
    FROM #Sequences
    ) AS S
UPDATE S
SET StartDate = DATEADD(D, 1, Prev.EndDate)
FROM
    #Sequences AS S
    INNER JOIN #Sequences AS Prev ON S.OrderID = Prev.OrderID AND Prev.Sequence = S.Sequence - 1

/* Calculate net captures per period, and continue with analysis */
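
For comparison, the same running-total idea can be sketched as a single set-based query, assuming SQL Server 2012 or later where windowed `SUM`, `MAX` and `LAG` are available. This is not the code I actually ran. `#ReceiptsDemo` is a hypothetical stand-in for `Receipts`, and the CTE names and the `DayAmount` column are made up for illustration. The sketch also assumes a period only counts when the cumulative total is above zero, and that receipts on the same date can be netted together.

```sql
-- Hypothetical stand-in for the Receipts table, so the sketch runs on its own.
CREATE TABLE #ReceiptsDemo
(
    OrderID      INT            NOT NULL,
    DateReceived DATE           NOT NULL,
    Amount       DECIMAL(18, 2) NOT NULL
);
INSERT INTO #ReceiptsDemo (OrderID, DateReceived, Amount) VALUES
    (1, '2011-01-01',  100.00),
    (1, '2011-02-01', -100.00),
    (1, '2011-03-01', -100.00),
    (1, '2011-04-01',  100.00),
    (1, '2011-05-01',  100.00),
    (1, '2011-06-01',  100.00);

WITH Daily AS
(
    -- Net the receipts per order and date so ties on DateReceived cannot reorder the running total.
    SELECT OrderID, DateReceived, SUM(Amount) AS DayAmount
    FROM #ReceiptsDemo
    GROUP BY OrderID, DateReceived
),
Running AS
(
    SELECT OrderID, DateReceived,
           SUM(DayAmount) OVER (PARTITION BY OrderID ORDER BY DateReceived
                                ROWS UNBOUNDED PRECEDING) AS CumulativeReceipts
    FROM Daily
),
Peaks AS
(
    -- Highest cumulative total strictly before each date (NULL on the first date).
    SELECT OrderID, DateReceived, CumulativeReceipts,
           MAX(CumulativeReceipts) OVER (PARTITION BY OrderID ORDER BY DateReceived
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PriorPeak
    FROM Running
),
Kept AS
(
    -- Keep only dates that set a new, positive high for the order.
    SELECT OrderID, DateReceived AS EndDate, CumulativeReceipts
    FROM Peaks
    WHERE CumulativeReceipts > 0.00
      AND (PriorPeak IS NULL OR CumulativeReceipts > PriorPeak)
)
SELECT OrderID,
       COALESCE(DATEADD(DAY, 1, LAG(EndDate) OVER (PARTITION BY OrderID ORDER BY EndDate)),
                CAST('1900-01-01' AS DATE)) AS StartDate,
       EndDate,
       CumulativeReceipts
           - COALESCE(LAG(CumulativeReceipts) OVER (PARTITION BY OrderID ORDER BY EndDate), 0.00) AS NetCaptures
FROM Kept
ORDER BY OrderID, EndDate;
-- For the sample rows: (1900-01-01, 2011-01-01, 100.00) and (2011-01-02, 2011-06-01, 100.00).

DROP TABLE #ReceiptsDemo;
```

Because each kept row is a new high, the difference between consecutive kept totals is the net capture for that period, so no delete-and-recalculate loop is needed. Whether this is faster than the temp-table version on 500K records is untested.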

1 Answer:

Answer 0: (score: 1)