Question

I am updating 2 columns in a table that contains millions of rows (85 million). Right now I am using an UPDATE command like this:

UPDATE Table1
  SET Table1.column1 = Table2.column1,
      Table1.column2 = Table2.column2
FROM
      Tables and with a Join-conditions;

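My problem is that this takes 23 hours. Even when I run the update in batches, the time taken does not change much.

But I need to finish the update in under 5 hours. Is that possible? What approach should I take to achieve it?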

Answer 1

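A SQL UPDATE statement has to record every affected row in the log file so that it can roll back on failure. As this guy explains, the best way to handle millions of rows is to forget about atomicity and batch the update into chunks of 50,000 rows (or whatever size suits):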

--Declare variable for row count
Declare @rc int
Set @rc=50000

While @rc=50000
 Begin

  Begin Transaction

  --Use Top (50000) to limit number of updates
  --performed in each batch to 50K rows.
  --Use tablockx and holdlock to obtain and hold 
  --an immediate exclusive table lock. This usually
  --speeds the update because only one lock is needed.
  Update Top (50000) MyTable With (tablockx, holdlock)
    Set UpdFlag = 0
  From MyTable mt
  Join ControlTable ct
    On mt.KeyCol=ct.PK
  --Add criteria to avoid updating rows that
  --were updated in previous pass
  Where mt.UpdFlag <> 0

  --Get number of rows updated
  --Process will continue until less than 50000
  Select @rc=@@rowcount

  --Commit the transaction
  Commit
 End

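This still has some problems: you need to know which rows you have already processed, and maybe someone smarter than this guy (and me!) can come up with something better using more MSSQL magic. But it should be a start.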
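
One way to avoid tracking processed rows with a flag column is to walk the table in key ranges. The following is only a sketch under assumptions that are not in the question: Table1 has an integer clustered key called Id (a hypothetical name), and Table2 joins to Table1 on that same Id. The batch size of 50,000 is carried over from the example above and would need tuning.

```sql
-- Sketch only: Id is a hypothetical integer clustered key on Table1,
-- and Table2 is assumed to join to Table1 on the same Id.
DECLARE @BatchSize int = 50000;
DECLARE @FromId bigint, @MaxId bigint;

SELECT @FromId = MIN(Id), @MaxId = MAX(Id) FROM Table1;

-- If Table1 is empty, @FromId is NULL and the loop is never entered.
WHILE @FromId <= @MaxId
BEGIN
    -- Each UPDATE runs as its own autocommit transaction,
    -- which keeps every individual transaction small.
    UPDATE t1
       SET t1.column1 = t2.column1,
           t1.column2 = t2.column2
      FROM Table1 AS t1
      JOIN Table2 AS t2
        ON t2.Id = t1.Id
     WHERE t1.Id >= @FromId
       AND t1.Id <  @FromId + @BatchSize;

    -- Advance to the next key range. The range itself records progress,
    -- so no flag column is needed.
    SET @FromId = @FromId + @BatchSize;
END
```

Because each batch seeks directly to its key range, later batches do not have to rescan rows that earlier batches already handled. If the key values are sparse, some ranges will touch fewer than 50,000 rows, which is harmless.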

Answer 2

I used SSIS to accomplish this task.

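First, I took the source table in which I had to update the 2 columns. Then I added a Lookup task, where I mapped the source columns to the columns of the target table from which I had to fetch the data for updating the source table's columns. Finally, I added an OLE DB Destination, which fills the table based on the Lookup's join condition.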

This process was much faster than running the update script.
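
For readers without SSIS, the flow described above (read the source, look up the new values, write the rows to a destination) resembles building a fresh copy of the table in plain T-SQL. This is a rough sketch, not the answerer's package. Id, Table1_New and Table1_Old are hypothetical names, and Table2 is assumed to have at most one row per Id.

```sql
-- Sketch only. Hypothetical names: Id (join key), Table1_New, Table1_Old.
-- Build a new copy of Table1 with the two columns already replaced.
-- Rows without a match in Table2 keep their old values, as with the UPDATE.
SELECT t1.Id,
       CASE WHEN t2.Id IS NOT NULL THEN t2.column1 ELSE t1.column1 END AS column1,
       CASE WHEN t2.Id IS NOT NULL THEN t2.column2 ELSE t1.column2 END AS column2
       -- , list the remaining columns of Table1 here
  INTO Table1_New
  FROM Table1 AS t1
  LEFT JOIN Table2 AS t2
    ON t2.Id = t1.Id;

-- After recreating indexes, constraints and permissions on Table1_New,
-- swap the names so the new copy takes the place of the old table.
BEGIN TRANSACTION;
EXEC sp_rename 'Table1', 'Table1_Old';
EXEC sp_rename 'Table1_New', 'Table1';
COMMIT;
```

SELECT ... INTO can be minimally logged under the SIMPLE or BULK_LOGGED recovery model, which is one reason writing a new table may beat updating 85 million rows in place. Whether it does so here would need to be measured.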

Updating two columns in a table containing millions of rows