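I have a table (say tableB) with about 40M rows, and it keeps growing. Archiving is currently done with a DELETE .. OUTPUT .. INTO .. FROM .. statement. Originally, archiving 1000 rows took 3~5 seconds, but it takes longer as more rows are deleted. For example, after 10M rows have been deleted, deleting 1000 rows now takes 35~40 seconds.
What is causing this, and how can I improve it (I need to archive at least 30M rows)? If partitioning is the only way, how can I do it with minimal downtime?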
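Additional information:
- tableB has 2 foreign key columns (tableAId, tableCId).
- Archiving is driven by a datetime field in tableA (an INNER JOIN is used in the DELETE statement).
- If I use the query hint with (index=ix_time) on tableA, the plan shows … on tableB.
- Both tableA and tableB have an auto-increment bigint as the primary key.
- tableB has 4 indexes.
- tableA has 5 indexes.
- tableA has 30M+ rows.
Query plan excerpt: (screenshot not included)
Script: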
DECLARE @older_than datetime2(0) = '2015-10-01';
DECLARE @i int = 1;
DECLARE @j int = 0;
DECLARE @imax int = 1000;
DECLARE @jmax int = 50;
DECLARE @total int = 0;
DECLARE @t1 DATETIME2(3);
DECLARE @t2 DATETIME2(3);
DECLARE @timetook int;
WHILE @i > 0 AND @j < @jmax
BEGIN
SET @t1 = GETDATE();
DELETE TOP (@imax) ss
OUTPUT deleted.[Id]
,deleted.[columnA]
,deleted.[columnB]
INTO [MyArchive_Data].dbo.tableB([Id]
,columnA
,columnB)
FROM [MyLive_Data].dbo.tableB ss
INNER JOIN [MyLive_Data].dbo.tableA s ON s.Id = ss.tableAID
WHERE s.Time < @older_than;
SET @i = @@rowcount;
SET @j = @j + 1;
SET @total = @total + @i;
SET @t2 = GETDATE();
SET @timetook = datediff(second,@t1,@t2);
RAISERROR('LOOP %d COMPLETE [%d rows][%d sec]',10,1,@j, @total, @timetook) with nowait;
WAITFOR DELAY '00:00:03';
END
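A plausible cause worth checking (an assumption; the question does not confirm it): the DELETE removes tableB rows but never the tableA rows they join to. If the plan walks tableA through ix_time from the oldest Time upward, each batch has to read past every old tableA row whose children were already archived before it finds 1000 that still have children, and that number grows with every batch. The sketch below reproduces this on hypothetical temp tables #tableA/#tableB (one child per parent, made-up dates); the final query is the diagnostic part.

```sql
SET NOCOUNT ON;

-- Hypothetical stand-ins for tableA/tableB (one tableB row per tableA row)
IF OBJECT_ID('tempdb..#tableB') IS NOT NULL DROP TABLE #tableB;
IF OBJECT_ID('tempdb..#tableA') IS NOT NULL DROP TABLE #tableA;
CREATE TABLE #tableA (Id bigint IDENTITY(1,1) PRIMARY KEY, [Time] datetime2(0) NOT NULL);
CREATE INDEX ix_time ON #tableA ([Time]);
CREATE TABLE #tableB (Id bigint IDENTITY(1,1) PRIMARY KEY, tableAId bigint NOT NULL, columnA int NULL);
CREATE INDEX ix_tableAId ON #tableB (tableAId);

-- 10,000 parents, one per minute from 2015-09-01 (made-up data; sys.all_objects is only a row source)
INSERT INTO #tableA ([Time])
SELECT TOP (10000) DATEADD(MINUTE, CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS int), '2015-09-01')
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

INSERT INTO #tableB (tableAId, columnA)
SELECT Id, 0 FROM #tableA;

DECLARE @older_than datetime2(0) = '2015-10-01';
DECLARE @batch int = 0;

-- Three batches shaped like the loop above (archive target omitted)
WHILE @batch < 3
BEGIN
    DELETE TOP (1000) ss
    FROM #tableB AS ss
    INNER JOIN #tableA AS s ON s.Id = ss.tableAId
    WHERE s.[Time] < @older_than;
    SET @batch += 1;
END;

-- Diagnostic: old parents that still match the WHERE clause but have no children left.
-- A later TOP batch driven from ix_time may have to read past all of these rows.
SELECT COUNT_BIG(*) AS emptied_parents   -- 3000 with this data
FROM #tableA AS s
WHERE s.[Time] < @older_than
  AND NOT EXISTS (SELECT 1 FROM #tableB AS ss WHERE ss.tableAId = s.Id);
```

Run against [MyLive_Data].dbo.tableA and tableB instead of the temp tables, a count that keeps climbing alongside the batch times would support this explanation; SET STATISTICS IO ON around a single batch shows the same effect as growing logical reads.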
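Update:
It seems that if I take out the loop (WHILE @i > 0 AND @j < @jmax) and run the DELETE statement on its own, it takes 10~12 seconds. I looked at the query plans, and they differ: with the loop, an index seek is used, but without it, an index scan is used. Why is that?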
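One thing worth ruling out before comparing the two plans (an assumption about the cause, not a confirmed diagnosis): @imax and @older_than are local variables, so when the batch is compiled the optimizer does not know their values and falls back to fixed guesses for TOP (@imax) and for s.Time < @older_than. Adding OPTION (RECOMPILE) to the DELETE makes SQL Server compile that statement on each execution using the variables' current values, so both versions are optimized for the same real inputs. A minimal sketch on hypothetical temp tables (#tableA, #tableB, and the archive target #archive are stand-ins):

```sql
SET NOCOUNT ON;

-- Hypothetical stand-ins for tableA, tableB and the archive table
IF OBJECT_ID('tempdb..#archive') IS NOT NULL DROP TABLE #archive;
IF OBJECT_ID('tempdb..#tableB') IS NOT NULL DROP TABLE #tableB;
IF OBJECT_ID('tempdb..#tableA') IS NOT NULL DROP TABLE #tableA;
CREATE TABLE #tableA (Id bigint IDENTITY(1,1) PRIMARY KEY, [Time] datetime2(0) NOT NULL);
CREATE INDEX ix_time ON #tableA ([Time]);
CREATE TABLE #tableB (Id bigint IDENTITY(1,1) PRIMARY KEY, tableAId bigint NOT NULL, columnA int NULL, columnB int NULL);
CREATE TABLE #archive (Id bigint PRIMARY KEY, columnA int NULL, columnB int NULL);

-- 5,000 parents, one per hour from 2015-06-01 (made-up data), one child each
INSERT INTO #tableA ([Time])
SELECT TOP (5000) DATEADD(HOUR, CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS int), '2015-06-01')
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;
INSERT INTO #tableB (tableAId, columnA, columnB) SELECT Id, 1, 2 FROM #tableA;

DECLARE @older_than datetime2(0) = '2015-10-01';
DECLARE @imax int = 1000;
DECLARE @i int;

DELETE TOP (@imax) ss
OUTPUT deleted.[Id], deleted.[columnA], deleted.[columnB]
INTO #archive ([Id], columnA, columnB)
FROM #tableB AS ss
INNER JOIN #tableA AS s ON s.Id = ss.tableAId
WHERE s.[Time] < @older_than
OPTION (RECOMPILE);   -- compile this statement with the current @imax and @older_than

SET @i = @@ROWCOUNT;  -- rows archived by this batch
SELECT @i AS rows_archived;
```

With the hint on the DELETE in both the looped and the standalone version, the actual execution plans can be compared like for like; the trade-off is a compilation on every batch, which is usually small next to a multi-second DELETE.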
Answer 0 (score: 0)
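Without seeing the full table schema including the indexes, I would say the SQL Server query optimizer decided that scanning the clustered index is more efficient: the statistics on the table show a correlation between the Id and Time values, so it knows that if it starts from the end of the clustered index and processes it in reverse order, it has to read fewer rows to find the TOP(x) rows that satisfy the query. See Brent Ozar's post on Scans, Seeks and Statistics.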
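Also, the WAITFOR statement in the loop may cause lock escalation on the table, because the deletes are done in an implicit transaction and are therefore only committed after the loop finishes. Try adding BEGIN TRANSACTION before the DELETE statement and COMMIT TRANSACTION after it. If at all possible, remove the WAITFOR statement, since it only delays processing.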
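One detail to watch when the DELETE is wrapped in BEGIN TRANSACTION/COMMIT TRANSACTION: COMMIT TRANSACTION resets @@ROWCOUNT to 0, so the loop's row count has to be read between the DELETE and the COMMIT; otherwise @i becomes 0 and WHILE @i > 0 stops the loop after its first pass. A minimal sketch on a hypothetical table variable @demo:

```sql
SET NOCOUNT ON;

-- Hypothetical table variable standing in for the rows one batch deletes
DECLARE @demo TABLE (Id int PRIMARY KEY);
INSERT INTO @demo (Id) VALUES (1), (2), (3);

DECLARE @before_commit int, @after_commit int;

BEGIN TRANSACTION;
DELETE FROM @demo;
SET @before_commit = @@ROWCOUNT;  -- rows deleted by the DELETE
COMMIT TRANSACTION;
SET @after_commit = @@ROWCOUNT;   -- COMMIT TRANSACTION has reset @@ROWCOUNT to 0

IF OBJECT_ID('tempdb..#rowcount_demo') IS NOT NULL DROP TABLE #rowcount_demo;
SELECT @before_commit AS before_commit, @after_commit AS after_commit
INTO #rowcount_demo;

SELECT before_commit, after_commit FROM #rowcount_demo;  -- 3, 0
```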
DECLARE @older_than datetime2(0) = '2015-10-01';
DECLARE @i int = 1;
DECLARE @j int = 0;
DECLARE @imax int = 1000;
DECLARE @jmax int = 50;
DECLARE @total int = 0;
DECLARE @t1 DATETIME2(3);
DECLARE @t2 DATETIME2(3);
DECLARE @timetook int;
WHILE @i > 0 AND @j < @jmax
BEGIN
SET @t1 = GETDATE();
BEGIN TRANSACTION
DELETE TOP (@imax) ss
OUTPUT deleted.[Id]
,deleted.[columnA]
,deleted.[columnB]
INTO [MyArchive_Data].dbo.tableB([Id]
,columnA
,columnB)
FROM [MyLive_Data].dbo.tableB ss
INNER JOIN [MyLive_Data].dbo.tableA s ON s.Id = ss.tableAID
WHERE s.Time < @older_than;
SET @i = @@rowcount; -- read before COMMIT TRANSACTION, which resets @@ROWCOUNT to 0
COMMIT TRANSACTION;
SET @j = @j + 1;
SET @total = @total + @i;
SET @t2 = GETDATE();
SET @timetook = datediff(second,@t1,@t2);
RAISERROR('LOOP %d COMPLETE [%d rows][%d sec]',10,1,@j, @total, @timetook) with nowait;
WAITFOR DELAY '00:00:03';
END
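Answer 1 (score: 0)
Below is an example of what I would expect to be more efficient for very large archiving jobs. This approach deletes in batches by Id range instead of by TOP. You can adjust the batch size to suit your performance and concurrency requirements.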
DECLARE
@older_than datetime2(0) = '2015-10-01'
, @i int = 1
, @j int = 0
, @total int = 0
, @t1 DATETIME2(3)
, @t2 DATETIME2(3)
, @timetook int
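, @MinID bigint        -- Id columns are bigint, not int
, @MaxId bigint
, @BatchFirstId bigint
, @BatchLastId bigint
, @BatchSize int = 100000;
-- Walk only the Id range that contains rows old enough to archive
SELECT @MinID = MIN(Id), @MaxId = MAX(Id)
FROM [MyLive_Data].dbo.tableA
WHERE Time < @older_than;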
SET @BatchFirstId = @MinID;
WHILE @BatchFirstId <= @MaxId
BEGIN
SET @BatchLastId = @BatchFirstId + @BatchSize - 1;
SET @t1 = GETDATE();
DELETE ss
OUTPUT deleted.[Id]
,deleted.[columnA]
,deleted.[columnB]
INTO [MyArchive_Data].dbo.tableB([Id]
,columnA
,columnB)
FROM [MyLive_Data].dbo.tableB ss
INNER JOIN [MyLive_Data].dbo.tableA s ON s.Id = ss.tableAID
WHERE s.Time < @older_than
AND s.Id BETWEEN @BatchFirstId AND @BatchLastId;
SET @i = @@ROWCOUNT;
SET @BatchFirstId += @BatchSize;
SET @j = @j + 1;
SET @total = @total + @i;
SET @t2 = GETDATE();
SET @timetook = datediff(second,@t1,@t2);
RAISERROR('LOOP %d COMPLETE [%d rows][%d sec]',10,1,@j, @total, @timetook) with nowait;
WAITFOR DELAY '00:00:03';
END;
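A possible refinement of the range approach (a sketch, not part of the answer above): with fixed-width Id ranges, a range holding no old tableA rows still costs a DELETE round trip, and ranges can hold very different numbers of tableB rows. Walking tableA.Id as a keyset instead picks, on each pass, the next @BatchSize qualifying tableA rows after the last Id processed, so every seek starts past the work already done. The temp tables #tableA/#tableB/#archive, @LastId, and the batch size of 500 are hypothetical; the demo data gives each parent two children.

```sql
SET NOCOUNT ON;

-- Hypothetical stand-ins for tableA, tableB and the archive table
IF OBJECT_ID('tempdb..#archive') IS NOT NULL DROP TABLE #archive;
IF OBJECT_ID('tempdb..#tableB') IS NOT NULL DROP TABLE #tableB;
IF OBJECT_ID('tempdb..#tableA') IS NOT NULL DROP TABLE #tableA;
CREATE TABLE #tableA (Id bigint IDENTITY(1,1) PRIMARY KEY, [Time] datetime2(0) NOT NULL);
CREATE INDEX ix_time ON #tableA ([Time]);
CREATE TABLE #tableB (Id bigint IDENTITY(1,1) PRIMARY KEY, tableAId bigint NOT NULL, columnA int NULL, columnB int NULL);
CREATE INDEX ix_tableAId ON #tableB (tableAId);
CREATE TABLE #archive (Id bigint PRIMARY KEY, columnA int NULL, columnB int NULL);

-- 5,000 parents, one per hour from 2015-06-01, two children each (made-up data)
INSERT INTO #tableA ([Time])
SELECT TOP (5000) DATEADD(HOUR, CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS int), '2015-06-01')
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;
INSERT INTO #tableB (tableAId, columnA, columnB) SELECT Id, 1, 2 FROM #tableA;
INSERT INTO #tableB (tableAId, columnA, columnB) SELECT Id, 3, 4 FROM #tableA;

DECLARE @older_than datetime2(0) = '2015-10-01';
DECLARE @BatchSize int = 500;   -- tableA rows per batch (tune as needed)
DECLARE @LastId bigint = 0;     -- highest tableA.Id already processed
DECLARE @MaxId bigint;          -- upper bound so the last pass does not scan newer rows
DECLARE @batch TABLE (Id bigint PRIMARY KEY);

SELECT @MaxId = MAX(Id) FROM #tableA WHERE [Time] < @older_than;

WHILE 1 = 1
BEGIN
    DELETE FROM @batch;

    -- Next @BatchSize old parents after the last one processed
    INSERT INTO @batch (Id)
    SELECT TOP (@BatchSize) s.Id
    FROM #tableA AS s
    WHERE s.Id > @LastId
      AND s.Id <= @MaxId
      AND s.[Time] < @older_than
    ORDER BY s.Id;

    IF @@ROWCOUNT = 0 BREAK;    -- no qualifying parents left

    SELECT @LastId = MAX(Id) FROM @batch;

    -- Archive and delete the children of this batch of parents
    DELETE ss
    OUTPUT deleted.[Id], deleted.[columnA], deleted.[columnB]
    INTO #archive ([Id], columnA, columnB)
    FROM #tableB AS ss
    INNER JOIN @batch AS b ON b.Id = ss.tableAId;
END;

SELECT COUNT(*) AS archived FROM #archive;  -- 5854 with this data (2927 old parents x 2)
```

Because @LastId only moves forward, a batch never revisits parents whose children are already gone, which is the growing cost suspected in the question; against the real tables, #tableA/#tableB/#archive would become [MyLive_Data].dbo.tableA, [MyLive_Data].dbo.tableB and [MyArchive_Data].dbo.tableB.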