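Optimize SQL query to remove cursor

Time: 2011-04-27 19:11:31

Tags: sql sql-server-2005 optimization

I am trying to write a query that goes through a table and applies any credits on an account to the oldest balance. I could not figure out a way to do this without a cursor, and I know cursors should be avoided whenever possible, so I am coming here for help.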

select * into #balances from [IDAT_AR_BALANCES] where amount > 0
select * into #credits from [IDAT_AR_BALANCES] where amount < 0

create index ba_ID on #balances (CLIENT_ID)
create index cr_ID on #credits (CLIENT_ID)

declare credit_cursor cursor for
select [CLIENT_ID], amount, cvtGUID from #credits

open credit_cursor
declare @client_id varchar(11)
declare @credit money
declare @balance money
declare @cvtGuidBalance uniqueidentifier
declare @cvtGuidCredit uniqueidentifier
fetch next from credit_cursor into @client_id, @credit, @cvtGuidCredit
while @@fetch_status = 0
begin
      while(@credit < 0 and (select count(*) from #balances where @client_id = CLIENT_ID and amount <> 0) > 0)
      begin
            select top 1  @balance = amount, @cvtGuidBalance = cvtGuid from #balances where @client_id = CLIENT_ID and amount <> 0 order by AGING_DATE
            set @credit = @balance + @credit
            if(@credit > 0)
            begin
                  update #balances set amount = @credit where cvtGuid = @cvtGuidBalance
                  set @credit = 0
            end
            else
            begin
                  update #balances set amount = 0 where cvtGuid = @cvtGuidBalance
            end
      end
      update #credits set amount = @credit where cvtGuid = @cvtGuidCredit
      fetch next from credit_cursor into @client_id, @credit, @cvtGuidCredit
end

close credit_cursor
deallocate credit_cursor

delete #balances where AMOUNT = 0
delete #credits where AMOUNT = 0

truncate table [IDAT_AR_BALANCES]

insert [IDAT_AR_BALANCES] select * from #balances
insert [IDAT_AR_BALANCES] select * from #credits

drop table #balances
drop table #credits

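In a test case with 10,000 records and 1,000 clients it takes 26 seconds to run; adding the two indexes on CLIENT_ID brought that down to 14 seconds. That is still too slow for what I need: the final data set could have as many as 10,000 clients and over 4,000,000 records, so the run time could easily reach double-digit minutes.

I would greatly appreciate any recommendations on how to restructure this to remove the cursor.

Example (updated to show that you can end up with multiple credits after it runs):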

before
cvtGuid      client_id      amount      AGING_DATE
xxxxxx       1              20.00       1/1/2011
xxxxxx       1              30.00       1/2/2011
xxxxxx       1              -10.00      1/3/2011
xxxxxx       1              5.00        1/4/2011
xxxxxx       2              20.00       1/1/2011
xxxxxx       2              15.00       1/2/2011
xxxxxx       2              -40.00      1/3/2011
xxxxxx       2              5.00        1/4/2011
xxxxxx       3              10.00       1/1/2011
xxxxxx       3              -20.00      1/2/2011
xxxxxx       3              5.00        1/3/2011
xxxxxx       3              -8.00       1/4/2011

after
cvtGuid      client_id      amount      AGING_DATE
xxxxxx       1              10.00       1/1/2011
xxxxxx       1              30.00       1/2/2011
xxxxxx       1              5.00        1/4/2011
xxxxxx       3              -5.00       1/2/2011
xxxxxx       3              -8.00       1/4/2011

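So it applies the negative credits to the oldest positive balances (client 1 in the example). If no positive balances are left when it is done, it leaves the remaining negatives in place (client 3). If everything cancels out completely (which is the case 90% of the time with real data), it removes the records entirely (client 2).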
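
The rule described above can be written down as a small reference model, which is handy for checking any cursor-free rewrite against the example. This is a sketch in Python rather than T-SQL and is not part of the original question; the name `apply_credits` and the tuple layout are assumptions made for illustration. It also assumes each client's credits are taken oldest first, whereas the cursor reads #credits in no guaranteed order, so which credit keeps a leftover could differ.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def apply_credits(rows):
    """Apply each client's credits (negative amounts) to that client's oldest
    positive balances. rows holds (client_id, amount, aging_date) tuples;
    rows whose amount ends up at zero are dropped from the result."""
    by_client = defaultdict(list)
    for client_id, amount, aging_date in rows:
        by_client[client_id].append([client_id, Decimal(amount), aging_date])

    result = []
    for client_rows in by_client.values():
        client_rows.sort(key=lambda r: r[2])            # oldest first
        balances = [r for r in client_rows if r[1] > 0]
        credits = [r for r in client_rows if r[1] < 0]
        oldest = 0                                      # index of the oldest open balance
        for credit in credits:
            while credit[1] < 0 and oldest < len(balances):
                # Use up as much of the credit as the oldest balance can absorb.
                applied = min(balances[oldest][1], -credit[1])
                balances[oldest][1] -= applied
                credit[1] += applied
                if balances[oldest][1] == 0:
                    oldest += 1
        result.extend(tuple(r) for r in client_rows if r[1] != 0)
    return result


# The "before" rows from the example (cvtGuid omitted, dates read as month/day/year).
before = [
    (1, "20.00", date(2011, 1, 1)), (1, "30.00", date(2011, 1, 2)),
    (1, "-10.00", date(2011, 1, 3)), (1, "5.00", date(2011, 1, 4)),
    (2, "20.00", date(2011, 1, 1)), (2, "15.00", date(2011, 1, 2)),
    (2, "-40.00", date(2011, 1, 3)), (2, "5.00", date(2011, 1, 4)),
    (3, "10.00", date(2011, 1, 1)), (3, "-20.00", date(2011, 1, 2)),
    (3, "5.00", date(2011, 1, 3)), (3, "-8.00", date(2011, 1, 4)),
]

for row in apply_credits(before):
    print(row)
```

Each client's rows are walked once, so the model is linear in the number of rows after the per-client sort.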

5 answers:

Answer 0 (score: 4)

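This can be solved with a recursive CTE.

The basic idea is this:

  1. Get the totals of the positive and the negative values separately for every account (client_id).

  2. Iterate over the rows of every account and "pinch off" part of one of the two totals, depending on the sign and absolute value of amount (never pinching more off the corresponding total than its current value). The same value is then added to or subtracted from amount.

  3. After the update, delete the rows where amount has become 0.

For my solution, I borrowed Lieven's table variable definition (thank you!), adding one column (cvtGuid, declared as int for demonstration purposes) and one row (the last one from the original example, which was missing from Lieven's script).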

    /* preparing the demonstration data */
    DECLARE @IDAT_AR_BALANCES TABLE (
      cvtGuid int IDENTITY,
      client_id INTEGER
      , amount FLOAT
      , date DATE
    );
    INSERT INTO @IDAT_AR_BALANCES
      SELECT 1, 20.00, '1/1/2011'
      UNION ALL SELECT 1, 30.00, '1/2/2011'
      UNION ALL SELECT 1, -10.00, '1/3/2011'
      UNION ALL SELECT 1, 5.00, '1/4/2011'
      UNION ALL SELECT 2, 20.00, '1/1/2011'
      UNION ALL SELECT 2, 15.00, '1/2/2011'
      UNION ALL SELECT 2, -40.00, '1/3/2011'
      UNION ALL SELECT 2, 5.00, '1/4/2011'
      UNION ALL SELECT 3, 10.00, '1/1/2011'
      UNION ALL SELECT 3, -20.00, '1/2/2011'
      UNION ALL SELECT 3, 5.00, '1/3/2011'
      UNION ALL SELECT 3, -8.00, '1/4/2011';
    
    /* checking the original contents */
    SELECT * FROM @IDAT_AR_BALANCES;
    
    /* getting on with the job: */
    WITH totals AS (
      SELECT
        /* 1) preparing the totals */
        client_id,
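        /* ELSE 0 keeps a total from being NULL for accounts that have only
           positive or only negative rows; a NULL total would turn their amounts into NULL */
        total_pos = SUM(CASE WHEN amount > 0 THEN amount ELSE 0 END),
        total_neg = SUM(CASE WHEN amount < 0 THEN amount ELSE 0 END)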
      FROM @IDAT_AR_BALANCES
      GROUP BY client_id
    ),
    refined AS (
      /* 2) refining the original data with auxiliary columns:
         * rownum - row numbers (unique within accounts);
         * amount_to_discard_pos - the amount to discard `amount` completely if it's negative;
         * amount_to_discard_neg - the amount to discard `amount` completely if it's positive
      */
      SELECT
        *,
        rownum = ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY date),
        amount_to_discard_pos = CAST(CASE WHEN amount < 0 THEN -amount ELSE 0 END AS float),
        amount_to_discard_neg = CAST(CASE WHEN amount > 0 THEN -amount ELSE 0 END AS float)
      FROM @IDAT_AR_BALANCES
    ),
    prepared AS (
      /* 3) preparing the final table (using a recursive CTE) */
      SELECT
        cvtGuid = CAST(NULL AS int),
        client_id,
        amount = CAST(NULL AS float),
        date = CAST(NULL AS date),
        amount_update = CAST(NULL AS float),
        running_balance_pos = total_pos,
        running_balance_neg = total_neg,
        rownum = CAST(0 AS bigint)
      FROM totals
      UNION ALL
      SELECT
        n.cvtGuid,
        n.client_id,
        n.amount,
        n.date,
        amount_update = CAST(
          CASE
            WHEN n.amount_to_discard_pos < p.running_balance_pos
            THEN n.amount_to_discard_pos
            ELSE p.running_balance_pos
          END
          +
          CASE
            WHEN n.amount_to_discard_neg > p.running_balance_neg
            THEN n.amount_to_discard_neg
            ELSE p.running_balance_neg
          END
        AS float),
        running_balance_pos = CAST(p.running_balance_pos -
          CASE
            WHEN n.amount_to_discard_pos < p.running_balance_pos
            THEN n.amount_to_discard_pos
            ELSE p.running_balance_pos
          END
        AS float),
        running_balance_neg = CAST(p.running_balance_neg -
          CASE
            WHEN n.amount_to_discard_neg > p.running_balance_neg
            THEN n.amount_to_discard_neg
            ELSE p.running_balance_neg
          END
        AS float),
        n.rownum
      FROM refined n
        INNER JOIN prepared p ON n.client_id = p.client_id AND n.rownum = p.rownum + 1
    )
    /*                  -- some junk that I've forgotten to clean up,
    SELECT *            -- which you might actually want to use
    FROM prepared       -- to view the final prepared result set
    WHERE rownum > 0    -- before actually running the update
    ORDER BY client_id, rownum
    */
    /* performing the update */
    UPDATE t
    SET amount = t.amount + u.amount_update
    FROM @IDAT_AR_BALANCES t INNER JOIN prepared u ON t.cvtGuid = u.cvtGuid
    OPTION (MAXRECURSION 0);
    
    /* checking the contents after UPDATE */
    SELECT * FROM @IDAT_AR_BALANCES;
    
    /* deleting the eliminated amounts */
    DELETE FROM @IDAT_AR_BALANCES WHERE amount = 0;
    
    /* checking the contents after DELETE */
    SELECT * FROM @IDAT_AR_BALANCES;
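
The three steps of this answer can also be traced outside the database. The following is a minimal Python sketch of the same "two totals" idea, not a translation of the CTE; the function name `net_by_totals` and the (row_id, client_id, amount) layout are assumptions made for illustration, and the rows are assumed to arrive already ordered by date within each client.

```python
from collections import defaultdict
from decimal import Decimal


def net_by_totals(rows):
    """rows: (row_id, client_id, amount) tuples, oldest first per client.
    Returns {row_id: new_amount} for the rows that do not net to zero."""
    # Step 1: per-client totals of the positive and the negative amounts.
    total_pos = defaultdict(Decimal)
    total_neg = defaultdict(Decimal)
    for _row_id, client_id, amount in rows:
        if amount > 0:
            total_pos[client_id] += amount
        else:
            total_neg[client_id] += amount

    # Step 2: walk the rows and pinch off the opposite total, never taking
    # more than what is left of it.
    new_amounts = {}
    for row_id, client_id, amount in rows:
        if amount > 0:
            pinch = min(amount, -total_neg[client_id])
            total_neg[client_id] += pinch
            new_amounts[row_id] = amount - pinch
        else:
            pinch = min(-amount, total_pos[client_id])
            total_pos[client_id] -= pinch
            new_amounts[row_id] = amount + pinch

    # Step 3: drop the rows whose amount has become 0.
    return {row_id: amt for row_id, amt in new_amounts.items() if amt != 0}


# The twelve demonstration rows, numbered like the IDENTITY column.
amounts = ["20.00", "30.00", "-10.00", "5.00", "20.00", "15.00",
           "-40.00", "5.00", "10.00", "-20.00", "5.00", "-8.00"]
example = [(i + 1, i // 4 + 1, Decimal(a)) for i, a in enumerate(amounts)]

print(net_by_totals(example))
```

Positive rows are cancelled oldest first by the credit total, and negative rows are cancelled oldest first by the debit total, which is why no per-credit loop is needed.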
    

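Update

As Lieven has correctly suggested (thanks again!), you can first delete all the rows of the accounts whose amounts add up to 0, and then update the other rows. This will improve overall performance since, as you said, most accounts have amounts that add up to 0.

Here is a variation of Lieven's solution for removing the "zero accounts":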

    DELETE FROM @IDAT_AR_BALANCES
    WHERE client_id IN (
      SELECT client_id
      FROM @IDAT_AR_BALANCES
      GROUP BY client_id
      HAVING SUM(amount) = 0
    )
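
The effect of this pre-filter can be sketched in a few lines of Python. The name `drop_settled_clients` and the (client_id, amount) layout are assumptions made for illustration. The sketch uses `Decimal` so that the comparison with zero is exact; with binary floating point a sum such as 0.1 + 0.2 - 0.3 does not come out as exactly 0, which may be worth keeping in mind because the demonstration table declares amount as FLOAT.

```python
from collections import defaultdict
from decimal import Decimal


def drop_settled_clients(rows):
    """rows: (client_id, amount) tuples. Returns only the rows of clients
    whose amounts do not add up to exactly zero."""
    totals = defaultdict(Decimal)
    for client_id, amount in rows:
        totals[client_id] += amount
    return [(client_id, amount) for client_id, amount in rows
            if totals[client_id] != 0]


# The twelve demonstration rows: four per client, clients 1 to 3.
amounts = ["20.00", "30.00", "-10.00", "5.00", "20.00", "15.00",
           "-40.00", "5.00", "10.00", "-20.00", "5.00", "-8.00"]
example = [(i // 4 + 1, Decimal(a)) for i, a in enumerate(amounts)]

remaining = drop_settled_clients(example)
print(sorted({client_id for client_id, _ in remaining}))
```

On the demonstration data only client 2 is settled, so its four rows are removed before any netting work starts.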
    

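Note that the DELETE after the update is still needed, because the update may reset some amount values to 0. If I were you, I might consider creating a trigger FOR UPDATE that automatically deletes the rows where amount = 0. Such a solution is not always acceptable, but sometimes it is fine. It depends on what else you might be doing with your data. It may also depend on whether this is solely your project or there are other maintainers as well (who may not like rows "magically" and unexpectedly disappearing).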

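Answer 1 (score: 2)

I put something very similar together recently. I did not find a really simple solution, and it ended up taking a couple of hundred lines, but I can offer a few pointers.

You can put your credits into a table with a sequence number per client: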

CREATE TABLE #CreditsInSequence
  (
  Client_ID INT NOT NULL,
  Sequence  INT NOT NULL,
  PRIMARY KEY (Client_ID, Sequence),  -- the column is Client_ID, not ClientID
  Date      DATE NOT NULL,
  Amount    DECIMAL(19, 4) NOT NULL   -- plain DECIMAL means DECIMAL(18, 0) and would round away the cents
  )
INSERT INTO #CreditsInSequence (Client_ID, Sequence, Date, Amount)
  SELECT
    client_id, ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY date) AS Sequence, date, amount
  FROM
    #credits

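If one client has only one credit, they will have one row in the table, with Sequence = 1. If another client has three credits, they will have three rows, with sequence numbers 1, 2 and 3. You can now iterate over this temp table, and you only need as many iterations as the largest number of credits held by any single client.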

DECLARE @MaxSeq INT = (SELECT MAX(Sequence) FROM #CreditsInSequence)  -- Sequence exists only in #CreditsInSequence
DECLARE @Seq    INT = 1
WHILE @Seq <= @MaxSeq
  BEGIN
  -- Do something with this set of credits
  SELECT
    Client_ID, Date, Amount
  FROM
    #CreditsInSequence
  WHERE
    Sequence = @Seq

  SET @Seq += 1  -- Don't forget to increment the loop!
  END

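Like a cursor, this lets you work sequentially, fully processing the first credit of every client before moving on to the second. As a bonus, in my experience this kind of "fake FOR loop" is often faster than a cursor.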
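
To make the round structure concrete, here is a small Python sketch that groups credits the way the Sequence column does: round 1 holds every client's first credit, round 2 every client's second credit, and so on. The name `credits_in_rounds`, the tuple layout and the ISO date strings are assumptions made for illustration.

```python
from collections import defaultdict
from decimal import Decimal


def credits_in_rounds(credits):
    """credits: (client_id, date, amount) tuples with sortable dates.
    Returns a list of rounds; round n holds the n-th credit of each client."""
    sequence = defaultdict(int)
    rounds = defaultdict(list)
    for client_id, day, amount in sorted(credits, key=lambda c: (c[0], c[1])):
        sequence[client_id] += 1     # plays the role of ROW_NUMBER() per client
        rounds[sequence[client_id]].append((client_id, day, amount))
    # Sequence numbers start at 1 and have no gaps, so the keys run 1..len(rounds).
    return [rounds[n] for n in range(1, len(rounds) + 1)]


# The credits from the question's example.
example_credits = [
    (1, "2011-01-03", Decimal("-10.00")),
    (2, "2011-01-03", Decimal("-40.00")),
    (3, "2011-01-02", Decimal("-20.00")),
    (3, "2011-01-04", Decimal("-8.00")),
]

for number, batch in enumerate(credits_in_rounds(example_credits), start=1):
    print(number, [client_id for client_id, _, _ in batch])
```

The number of rounds equals the largest number of credits held by one client, which is what bounds the WHILE loop, instead of the total number of credit rows.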

To identify the right balance to apply each credit to, I would start with something like this:

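-- Oldest date per client on which a balance is not yet fully covered.
-- Assumes #AllocatedCredits holds at most one row per balance, with Amount
-- stored as a positive number; aggregate per BalanceID first otherwise.
-- The creditable amount (B.amount minus the allocated amount) cannot be
-- selected here without grouping or aggregating it, so read it in a
-- follow-up join on client_id and this date.
SELECT
  B.client_id,
  MIN(B.date) AS Date
FROM
  #balances AS B
  LEFT JOIN #AllocatedCredits AS AC ON B.BalanceID = AC.BalanceID
WHERE
  B.amount - COALESCE(AC.Amount, 0.00) > 0.00
GROUP BY
  B.client_id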

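You will need to extend this last query to get the actual balance ID (cvtGuid, if I am reading your table correctly) for that date, record those allocations in #AllocatedCredits, handle cases where a credit is large enough to pay off multiple balances, and so on.

Good luck, and do not hesitate to come back if you need any help!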

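Answer 2 (score: 2)

First, as you stated, you should only deal with those clients that still have a balance. Second, you can simulate cursor functionality with a WHILE loop.

Here are the modifications to the code. I left out the guts of the calculation, since they are not the issue. If you want me to complete the code, just let me know.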

--first, only deal with those clients with balances
select CLIENT_ID into #ToDoList 
from [IDAT_AR_BALANCES]
group by CLIENT_ID
having sum(amount)!=0

--next, get the temp debit and credit tables just for the clients you are working on
select * into #balances from [IDAT_AR_BALANCES] where amount > 0 and CLIENT_ID IN (SELECT CLIENT_ID FROM #ToDoList)
select * into #credits from [IDAT_AR_BALANCES] where amount < 0 and CLIENT_ID IN (SELECT CLIENT_ID FROM #ToDoList)

--fine
create index ba_ID on #balances (CLIENT_ID)
create index cr_ID on #credits (CLIENT_ID)

--simulate a cursor... but much less resource intensive

declare @client_id varchar(11)

-- now loop through each client and perform their aging
while exists (select * from #ToDoList)
begin
    select top 1 @client_id = CLIENT_ID from #ToDoList 

    --perform your debit to credit matching and account aging here, per client

    delete from #ToDoList where CLIENT_ID = @client_id  -- same spelling as the CREATE, so it also works under a case-sensitive collation
end

--clean up.. drop temp tables, etc

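Answer 3 (score: 2)

You will have to verify whether it is actually faster, but this does the job with (mostly) set-based operations instead of a cursor.

Test data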

DECLARE @IDAT_AR_BALANCES TABLE (
  client_id INTEGER
  , amount FLOAT
  , date DATE
) 

INSERT INTO @IDAT_AR_BALANCES
  SELECT 1, 20.00, '1/1/2011'
  UNION ALL SELECT 1, 30.00, '1/2/2011'
  UNION ALL SELECT 1, -10.00, '1/3/2011'
  UNION ALL SELECT 1, 5.00, '1/4/2011'
  UNION ALL SELECT 2, 20.00, '1/1/2011'
  UNION ALL SELECT 2, 15.00, '1/2/2011'
  UNION ALL SELECT 2, -40.00, '1/3/2011'
  UNION ALL SELECT 2, 5.00, '1/4/2011'
  UNION ALL SELECT 3, 10.00, '1/1/2011'
  UNION ALL SELECT 3, -20.00, '1/2/2011'
  UNION ALL SELECT 3, 5.00, '1/3/2011' 

Delete all accounts that add up to 0 (90% of the data)

  DELETE FROM @IDAT_AR_BALANCES
  FROM @IDAT_AR_BALANCES b
       INNER JOIN (
         SELECT client_id
         FROM   @IDAT_AR_BALANCES
         GROUP BY 
                client_id
         HAVING SUM(amount) = 0
       ) bd ON bd.client_id = b.client_id

Remaining records

DECLARE @Oldest TABLE (
  client_id INTEGER PRIMARY KEY CLUSTERED
  , date DATE
)

DECLARE @Negative TABLE (
  client_id INTEGER PRIMARY KEY CLUSTERED
  , amount FLOAT
)  

WHILE EXISTS (  SELECT  b.client_id
                        , MIN(b.amount) 
                FROM    @IDAT_AR_BALANCES b
                        INNER JOIN (
                          SELECT  client_id
                          FROM    @IDAT_AR_BALANCES
                          GROUP BY
                                  client_id
                          HAVING  COUNT(*) > 1
                        ) r ON r.client_id = b.client_id                
                WHERE   b.amount < 0 
                GROUP BY 
                        b.client_id 
                HAVING COUNT(*) > 0
             )
BEGIN

  DELETE FROM @Oldest
  DELETE FROM @Negative

  INSERT INTO @Oldest
    SELECT  client_id
            , date = MIN(date)
    FROM    @IDAT_AR_BALANCES 
    WHERE   amount > 0
    GROUP BY
            client_id

  INSERT INTO @Negative
    SELECT  b.client_id
            , amount = SUM(amount)
    FROM    @IDAT_AR_BALANCES b
            LEFT OUTER JOIN @Oldest o ON o.client_id = b.client_id AND o.date = b.date
    WHERE   amount < 0
            AND o.client_id IS NULL
    GROUP BY
            b.client_id

  UPDATE  @IDAT_AR_BALANCES
  SET     b.amount = b.amount + n.amount
  FROM    @IDAT_AR_BALANCES b
          INNER JOIN @Oldest o ON o.client_id = b.client_id AND o.date = b.date
          INNER JOIN @Negative n ON n.client_id = b.client_id

  DELETE FROM @IDAT_AR_BALANCES
  FROM    @IDAT_AR_BALANCES b
          LEFT OUTER JOIN @Oldest o ON o.client_id = b.client_id AND o.date = b.date
          INNER JOIN (
            SELECT  client_id
            FROM    @IDAT_AR_BALANCES
            GROUP BY
                    client_id
            HAVING  COUNT(*) > 1
          ) r ON r.client_id = b.client_id
  WHERE   amount < 0
          AND o.client_id IS NULL

END  

DELETE  FROM @IDAT_AR_BALANCES
WHERE   amount = 0          

SELECT  *
FROM    @IDAT_AR_BALANCES

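Answer 4 (score: 1)

One last thought... I did write this kind of code for a large pest control CRM I developed years ago... and I found that the most efficient solution to this problem was... a .NET CLR stored procedure.

While I usually avoid CLR procs at all costs, there are times when they outperform SQL. In this case, a procedural (row-by-row) query with mathematical calculations can be much faster in a CLR proc.

In my case, it was significantly faster than SQL.

FYI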