Question

I want to update a table in SQL Server by setting the FLAG column to 1 for all rows dated since the beginning of the year:

TABLE
DATE        ID     FLAG   (more columns...)
2016/01/01  1      0      ...
2016/01/01  2      0      ...
2016/01/02  3      0      ...
2016/01/02  4      0      ...
(etc)

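The problem is that this table contains hundreds of millions of records, and I have been advised to update it in chunks of 100,000 rows at a time to avoid blocking other processes.

I need to keep track of which rows I have updated, because some background processes flip FLAG back to 0 as soon as they have finished processing a row.

Does anyone have suggestions on how I can do this? Each day of data has more than a million records, so I can't simply loop over DATE as the counter. I am thinking of using the ID.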

Answer 1

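Assuming the DATE and ID columns are sequential, you can do a simple loop. By sequential I mean that if there is a record with id=1 and date=2016-1-1, then a record with id=2 and date=2015-12-31 cannot exist. If you are worried about locks or exceptions, you should add a transaction inside the WHILE block and commit it, or roll it back on failure.

Change @batchSize to whatever you feel is right after some experimentation.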

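DECLARE @currentId int, @maxId int, @batchSize int = 10000

SELECT @currentId = MIN(ID), @maxId = MAX(ID) FROM YOURTABLE WHERE DATE >= '2016-01-01'

-- <= so that the last ID (or a range holding a single row) is still processed
WHILE @currentId <= @maxId
BEGIN
    -- half-open range [@currentId, @currentId + @batchSize): batches never overlap,
    -- and rows inserted after @maxId was read are left alone
    UPDATE YOURTABLE SET FLAG = 1
    WHERE ID >= @currentId AND ID < @currentId + @batchSize AND ID <= @maxId
    SET @currentId = @currentId + @batchSize
END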

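Because the update never marks the same record as 1 twice, I don't see a need to keep track of the records that were touched, unless you are going to stop the process manually partway through.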
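The transaction handling mentioned at the start of this answer could look roughly like the sketch below. It is an illustration, not a tested implementation: it reuses the placeholder names YOURTABLE, ID, FLAG and DATE from this answer, and it assumes SQL Server 2012 or later because it uses THROW. It also uses a half-open ID range so that consecutive batches cannot overlap.

```sql
-- Sketch only: YOURTABLE, ID, FLAG and DATE are the placeholder names from this answer.
DECLARE @currentId int, @maxId int, @batchSize int = 10000;

SELECT @currentId = MIN(ID), @maxId = MAX(ID)
FROM YOURTABLE
WHERE DATE >= '2016-01-01';

WHILE @currentId <= @maxId
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- Half-open ID range, so consecutive batches never overlap.
        UPDATE YOURTABLE
        SET FLAG = 1
        WHERE ID >= @currentId
          AND ID < @currentId + @batchSize
          AND ID <= @maxId;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Undo the failed batch, then re-raise the error, which stops the loop.
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        THROW;
    END CATCH;

    SET @currentId = @currentId + @batchSize;
END;
```

A single UPDATE statement is already atomic on its own, so the explicit transaction mostly pays off if you add further statements to each batch, for example writing a progress record.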

You should also make sure there is an index on the ID column, so that each UPDATE statement can locate its rows quickly.
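If ID is not already the primary key or otherwise indexed, the index could be created along these lines. The index name IX_YOURTABLE_ID is made up for illustration, and YOURTABLE and ID are the placeholder names from this answer.

```sql
-- Hypothetical index name; YOURTABLE and ID are placeholders from this answer.
-- Skip this if ID is already the primary key or is covered by an existing index.
CREATE NONCLUSTERED INDEX IX_YOURTABLE_ID ON YOURTABLE (ID);
```

On a table with hundreds of millions of rows, building the index is a heavy operation in itself, so it is worth scheduling it for a quiet period.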

Answer 2

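This looks like a fairly simple problem, or maybe I am missing something.

You can create a temp or permanent table to keep track of the updated rows.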

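create table tbl (Id int) -- or a temp table, depending on your case
insert into tbl values (0)

declare @lastId int = (select Id from tbl)
declare @nextId int

-- find the highest Id among the next 100000 rows (Ids may have gaps,
-- so @lastId + 100000 is not a reliable upper bound)
;with cte as (
    select top (100000) Id
    from YourMainTable
    where Id > @lastId
    order by Id
)
select @nextId = max(Id) from cte

update YourMainTable
set Flag = 1
where Id > @lastId and Id <= @nextId

-- remember how far we got
update tbl set Id = @nextId where @nextId is not null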

You can run this process in a loop (everything except the table-creation part).
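A loop around this idea might look like the following sketch. It assumes that tbl already exists and holds a single row with the last processed Id, and it reuses the names tbl and YourMainTable from this answer. The one-second pause is an arbitrary illustration, not a recommendation. Each pass stores the highest Id of the batch it just handled, which also copes with gaps in the Id sequence.

```sql
-- Sketch only: tbl and YourMainTable are the names used in this answer;
-- tbl is assumed to exist and to hold one row with the last processed Id.
DECLARE @lastId int, @nextId int, @rows int = 1;

WHILE @rows > 0
BEGIN
    SELECT @lastId = Id FROM tbl;

    -- Highest Id among the next 100000 rows; NULL when nothing is left.
    SELECT @nextId = MAX(Id)
    FROM (
        SELECT TOP (100000) Id
        FROM YourMainTable
        WHERE Id > @lastId
        ORDER BY Id
    ) AS batch;

    UPDATE YourMainTable
    SET Flag = 1
    WHERE Id > @lastId AND Id <= @nextId;

    SET @rows = @@ROWCOUNT;   -- 0 once no rows remain, which ends the loop

    -- Persist progress so the job can be resumed after an interruption.
    IF @nextId IS NOT NULL
        UPDATE tbl SET Id = @nextId;

    -- Hypothetical pause to give other sessions room between batches.
    WAITFOR DELAY '00:00:01';
END;
```

Because progress is written to tbl after every batch, stopping and restarting the script continues from the last completed batch instead of starting over.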

Answer 3

create table #tmp_table
(
    id int,
    row_num int
)

insert into #tmp_table
(
    id,
    row_num
)
-- logic to load records from the base table
select
    bt.id,
    -- no PARTITION BY: partitioning by a unique id would number every row as 1
    row_number() over (order by bt.id) as row_num
from
    dbo.base_table bt
where
    -- example filter taken from the question; use your own logic to limit the records
    bt.[date] >= '2016-01-01'

declare @batch_size int = 100000;
declare @start_row_number int, @end_row_number int;
select
    @start_row_number = min(row_num),
    @end_row_number = max(row_num)
from
    #tmp_table

while (@start_row_number <= @end_row_number)
begin
    update
        bt
    set
        bt.flag = 1
    from
        dbo.base_table bt
        inner join #tmp_table tt on
            tt.id = bt.id
    where
        -- half-open range so consecutive batches do not overlap
        tt.row_num >= @start_row_number
        and tt.row_num < @start_row_number + @batch_size
    set @start_row_number = @start_row_number + @batch_size
end

How to update a large table in SQL Server?

3 answers: