Question

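MyTableA has several million records. On a regular basis, every row in MyTableA needs to be updated with values from TheirTableA.

Unfortunately I have no control over TheirTableA, and there is no field that indicates whether anything in it has changed, so I either update everything or update based on comparing every field that could differ (not really feasible, as this is a long and wide table).

Unfortunately, the transaction log balloons when I do a straight update, so I wanted to chunk it using UPDATE TOP. However, as I understand it, I need some field to determine whether a record in MyTableA has already been updated, otherwise I'll end up in an infinite loop: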

declare @again as bit;
set @again = 1;

while @again = 1
  begin
    update top (10000) MyTableA
    set my.A1 = their.A1, my.A2 = their.A2, my.A3 = their.A3
    from MyTableA my
    join TheirTableA their on my.Id = their.Id

    if @@ROWCOUNT > 0
      set @again = 1
    else
      set @again = 0
end

Is the only way this would work to add a

where my.A1 <> their.A1 and my.A2 <> their.A2 and my.A3 <> their.A3

clause? That seems like it could be horribly inefficient with lots of columns to compare.

I'm sure I'm missing an obvious alternative?

Answer 1

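Assuming both tables have the same structure, you can get a result set of the rows that differ using

SELECT * into #different_rows from MyTable EXCEPT select * from TheirTable

and then update from that using whatever key fields are available.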
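A minimal sketch of how those two steps might fit together, assuming both tables share the key column Id and the columns A1 to A3 from the question (the temp table name #changed is hypothetical). It takes the difference in the other direction, TheirTableA EXCEPT MyTableA, so that the temp table already holds the new values:

```sql
-- Rows in TheirTableA that have no identical row in MyTableA.
-- EXCEPT treats two NULLs as equal, so NULL-to-NULL is not reported as a change.
SELECT Id, A1, A2, A3
INTO #changed
FROM TheirTableA
EXCEPT
SELECT Id, A1, A2, A3
FROM MyTableA;

-- Update only the rows whose key shows up in the difference set.
-- Ids that exist only in TheirTableA are ignored by the join.
UPDATE my
SET A1 = c.A1, A2 = c.A2, A3 = c.A3
FROM MyTableA AS my
JOIN #changed AS c ON c.Id = my.Id;

DROP TABLE #changed;
```

If the difference set is still large, the final UPDATE could be chunked with UPDATE TOP as in the question, deleting processed keys from #changed after each batch.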

Answer 2

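Well, the first and simplest solution would obviously be, if you can change the schema, to include a last-updated timestamp, and then only update the rows whose timestamp is newer than your last run.

But if that is not possible, another way could be to use the HashBytes function, perhaps by concatenating the fields into a single XML value that you then compare. The caveat here is the 8 KB limit on the input (https://connect.microsoft.com/SQLServer/feedback/details/273429/hashbytes-function-should-support-large-data-types). EDIT: Once again, I have stolen code, this time from: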

http://sqlblogcasts.com/blogs/tonyrogerson/archive/2009/10/21/detecting-changed-rows-in-a-trigger-using-hashbytes-and-without-eventdata-and-or-s.aspx

His example is:

select batch_id
from (
    select distinct batch_id, hash_combined = hashbytes( 'sha1', combined )
    from (  select batch_id,
                   combined =(  select batch_id, batch_name, some_parm, some_parm2
                                from deleted c       --  need old values
                                where c.batch_id = d.batch_id
                                for xml path( '' ) )
            from deleted d
            union all
            select batch_id,
                   combined =(  select batch_id, batch_name, some_parm, some_parm2
                                from some_base_table c       --  need current values (could use inserted here)
                                where c.batch_id = d.batch_id
                                for xml path( '' ) )
            from deleted d
        ) as r
    ) as c
group by batch_id
having count(*) > 1
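Adapted to the tables in the question, the same idea might look like the untested sketch below. The column names A1 to A3 and the key Id come from the question; everything else is an assumption, and the serialized row still has to fit within the HashBytes input limit mentioned above:

```sql
-- Compare one hash per row instead of listing every column in the WHERE clause.
-- FOR XML PATH('') serialises the chosen columns into a single string;
-- a NULL column is simply left out of that string.
SELECT my.Id
FROM MyTableA AS my
JOIN TheirTableA AS their ON their.Id = my.Id
WHERE HASHBYTES('SHA1', (SELECT my.A1, my.A2, my.A3 FOR XML PATH('')))
   <> HASHBYTES('SHA1', (SELECT their.A1, their.A2, their.A3 FOR XML PATH('')));
```

The Ids returned this way could then drive a chunked update. Both hashes are still computed for every row on each run, so this saves typing rather than work.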

The last resort (and my original suggestion) is to try Binary_Checksum. As noted in the comments, this does carry the risk of a rather high collision rate.

http://msdn.microsoft.com/en-us/library/ms173784.aspx

I have stolen the following example from lessthandot.com; a link to the full SQL (and other cool functions) is below.

--Data Mismatch
SELECT 'Data Mismatch', t1.au_id
FROM( SELECT BINARY_CHECKSUM(*) AS CheckSum1 ,au_id FROM pubs..authors) t1
JOIN(SELECT BINARY_CHECKSUM(*) AS CheckSum2,au_id FROM tempdb..authors2) t2 ON t1.au_id =t2.au_id
WHERE CheckSum1 <> CheckSum2

Example taken from http://wiki.lessthandot.com/index.php/Ten_SQL_Server_Functions_That_You_Have_Ignored_Until_Now

Answer 3

I don't know whether this is better than adding where my.A1 <> their.A1 and my.A2 <> their.A2 and my.A3 <> their.A3, but I would definitely give it a try (assuming SQL Server 2005+):

declare @again as bit;
set @again = 1;

declare @idlist table (Id int);

while @again = 1
  begin
    update top (10000) MyTableA
    set my.A1 = their.A1, my.A2 = their.A2, my.A3 = their.A3
    output inserted.Id into @idlist (Id)
    from MyTableA my
    join TheirTableA their on my.Id = their.Id
    left join @idlist i on my.Id = i.Id
    where i.Id is null
    /* alternatively (instead of left join + where):
    where not exists (select * from @idlist where Id = my.Id) */

    if @@ROWCOUNT > 0
      set @again = 1
    else
      set @again = 0
end

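That is, declare a table variable to collect the IDs of the rows being updated, and use that table to look up (and skip) the IDs that have already been updated.

A slight variation on this method is to use a local temporary table instead of a table variable. That way you can create an index on the ID lookup table, which might result in better performance.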
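A sketch of that temporary-table variation, assuming Id is a unique int key in MyTableA (the primary key on #idlist relies on that) and reusing the column names from the question:

```sql
-- Local temp table instead of a table variable, so the lookup is indexed.
CREATE TABLE #idlist (Id int NOT NULL PRIMARY KEY);

DECLARE @rows int;
SET @rows = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (10000) my
    SET A1 = their.A1, A2 = their.A2, A3 = their.A3
    OUTPUT inserted.Id INTO #idlist (Id)
    FROM MyTableA AS my
    JOIN TheirTableA AS their ON their.Id = my.Id
    WHERE NOT EXISTS (SELECT 1 FROM #idlist AS i WHERE i.Id = my.Id);

    -- Capture the count immediately; any later statement would reset @@ROWCOUNT.
    SET @rows = @@ROWCOUNT;
END;

DROP TABLE #idlist;
```

The loop ends once a pass updates no rows, which happens when every matching Id is already in #idlist.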

Answer 4

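If schema changes are not possible, how about using a trigger to save off the IDs that have changed, and then only import/export those rows?

Or use a trigger to export immediately.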
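A rough sketch of the first option. It assumes the owner of TheirTableA is willing to add a trigger, which the question suggests may not be the case; the log table TheirTableA_ChangedIds and the trigger name are hypothetical:

```sql
-- Hypothetical log table that collects the keys of changed rows.
CREATE TABLE TheirTableA_ChangedIds (Id int NOT NULL PRIMARY KEY);
GO

CREATE TRIGGER trg_TheirTableA_LogChanges
ON TheirTableA
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- "inserted" holds the new version of every row touched by the statement.
    -- DISTINCT and NOT EXISTS keep each Id in the log at most once.
    INSERT INTO TheirTableA_ChangedIds (Id)
    SELECT DISTINCT i.Id
    FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM TheirTableA_ChangedIds AS c WHERE c.Id = i.Id);
END;
GO

-- Later, refresh only the logged rows.
UPDATE my
SET A1 = their.A1, A2 = their.A2, A3 = their.A3
FROM MyTableA AS my
JOIN TheirTableA AS their ON their.Id = my.Id
JOIN TheirTableA_ChangedIds AS c ON c.Id = my.Id;
```

GO is a batch separator understood by tools such as SSMS and sqlcmd, not a T-SQL statement. Clearing the log after each refresh is left out here, since doing it safely depends on how concurrent changes to TheirTableA should be handled.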
