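Merge on a large table fills the transaction log. Our ETL uses MERGE to import data from a hospital information system. When the record count runs into the millions, the transaction log fills up. How can we restructure the statement to minimize its impact on the transaction log? TOP()? OPENROWSET?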
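Our query is built dynamically in SSIS using variables. This is what it looks like:
--The main query concatenates Mergequery1 through Mergequery5. All are shown below.
--Main query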
begin transaction;
begin try
    declare @MergeQuery varchar(max)
    set @MergeQuery = convert(varchar(max), @MergeQuery1) + convert(varchar(max), @MergeQuery2)
        + ' ' + convert(varchar(max), @MergeQuery3)
        + ' ' + convert(varchar(max), @MergeQuery4)
        + ' ' + convert(varchar(max), @MergeQuery5);
    exec(@MergeQuery);
end try
begin catch
    declare
        @Message VARCHAR(4000)
        ,@Severity INT
        ,@State INT;
    select
        @Message = ERROR_MESSAGE()
        ,@Severity = ERROR_SEVERITY()
        ,@State = ERROR_STATE();
    if @@TRANCOUNT > 0
        rollback transaction;
    raiserror(@Message, @Severity, @State);
end catch;
if @@trancount > 0
begin
    commit transaction;
end
--Mergequery1 section modified
"begin transaction
begin try
begin
declare @ChangeSum table(change varchar(25));
declare @Inserted int = 0;
declare @Updated int = 0;
merge top (1000000) into "+ @[User::VISNDestinationDatabase] +"." + @[User::DataDestinationTable] + " prod
using " + @[User::VISNStagingDestinationTable] + " stag
on " + @[User::MergeOnResult] +
" when MATCHED then
update set " +@[User::MergeUpdTextResult]
--Mergequery2
@[User::MergeUpdTextResult2] + " when not matched by target then"
--Mergequery3
" insert
( " + @[User::MergeInsertResult] +
"
)"
--Mergequery4
"values
( "
+ @[User::MergeValueResult]+
" )"
--Mergequery5
"output $action into @ChangeSum;
set @Inserted = (select count(*) from @ChangeSum where change = 'INSERT');
set @Updated = (select count(*) from @ChangeSum where change = 'UPDATE');
select @Inserted as Inserted, @Updated as Updated;
end
end try
begin catch
rollback
end catch
if @@trancount > 0
commit transaction"
Answer 0 (score: 1)
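Assuming most of the transaction log is generated by the UPDATE part of the MERGE (are you really inserting millions of new rows, or is it mostly updates?), there may be a simple way to fix this.
Add an AND condition to the WHEN MATCHED clause so that a row is updated only when the columns have actually changed in the source. For example: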
...
WHEN MATCHED AND (stag.ColumnA <> prod.ColumnA OR stag.ColumnB <> prod.ColumnB)
THEN UPDATE SET prod.ColumnA = stag.ColumnA, prod.ColumnB = stag.ColumnB
...
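One caveat with the `<>` comparison: it evaluates to UNKNOWN when either side is NULL, so a change from NULL to a value (or the reverse) would be skipped. Below is a minimal sketch of a NULL-safe variant that uses EXISTS with EXCEPT instead. The temp tables #prod and #stag, their columns, and the sample rows are hypothetical and exist only to make the snippet runnable; the real table and column names come from the SSIS variables.

```sql
-- Hypothetical tables and data, for illustration only.
CREATE TABLE #prod (ID int PRIMARY KEY, ColumnA varchar(50) NULL, ColumnB int NULL);
CREATE TABLE #stag (ID int PRIMARY KEY, ColumnA varchar(50) NULL, ColumnB int NULL);

INSERT INTO #prod VALUES (1, 'same', 10), (2, 'old', 20), (3, NULL, 30);
INSERT INTO #stag VALUES (1, 'same', 10), (2, 'new', 20), (3, 'was null', 30), (4, 'brand new', 40);

DECLARE @ChangeSum table (change varchar(25));

MERGE INTO #prod AS prod
USING #stag AS stag
    ON prod.ID = stag.ID
-- EXCEPT treats two NULLs as equal, so the subquery returns a row
-- only when at least one compared column really differs.
WHEN MATCHED AND EXISTS (
        SELECT stag.ColumnA, stag.ColumnB
        EXCEPT
        SELECT prod.ColumnA, prod.ColumnB)
    THEN UPDATE SET prod.ColumnA = stag.ColumnA, prod.ColumnB = stag.ColumnB
WHEN NOT MATCHED BY TARGET
    THEN INSERT (ID, ColumnA, ColumnB)
         VALUES (stag.ID, stag.ColumnA, stag.ColumnB)
OUTPUT $action INTO @ChangeSum;

-- Row 1 is identical in both tables and is left untouched;
-- rows 2 and 3 differ and are updated; row 4 is inserted.
SELECT change, COUNT(*) AS Cnt FROM @ChangeSum GROUP BY change;

DROP TABLE #prod, #stag;
```

Rows that fail the extra condition are neither written nor logged as updates, which is where the log savings would come from if most staged rows are unchanged.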
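More generally, you need to find a way to break the staging table into batches. In this case, the best way to do that is by the clustered key of the staging table. For example, assuming the staging table is clustered on an integer ID column, you could select, say, the first 100000 rows ordered by ID and merge that batch into prod, noting the last ID read. The next batch would start at MaxID + 1, and so on. To determine the last ID read, add a new ID column to the @ChangeSum table variable and modify the OUTPUT clause as follows: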
output stag.ID, $action into @ChangeSum;
Then get the MaxID:
SELECT @MaxID = MAX(ID) FROM @ChangeSum;
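The batching idea can be sketched as a loop. This is only an illustration under stated assumptions: the temp tables #prod and #stag, the column ColumnA, the batch size of 2, and the variable @LastID are all hypothetical, and IDs are assumed to be positive integers. The sketch also deviates from the text above in one respect: it reads the batch's upper bound from the staging table rather than from @ChangeSum, because rows that are matched but filtered out by the WHEN MATCHED AND condition produce no OUTPUT row, so a batch with no changes would leave MAX(ID) in @ChangeSum empty.

```sql
-- Hypothetical tables and data, for illustration only.
CREATE TABLE #prod (ID int PRIMARY KEY, ColumnA varchar(50) NOT NULL);
CREATE TABLE #stag (ID int PRIMARY KEY CLUSTERED, ColumnA varchar(50) NOT NULL);

INSERT INTO #prod VALUES (1, 'a'), (2, 'b');
INSERT INTO #stag VALUES (1, 'a'), (2, 'changed'), (3, 'c'), (4, 'd'), (5, 'e');

DECLARE @BatchSize int = 2;   -- something like 100000 in practice
DECLARE @LastID    int = 0;   -- assumes all IDs are greater than 0
DECLARE @MaxID     int;
DECLARE @ChangeSum table (ID int, change varchar(25));

WHILE 1 = 1
BEGIN
    -- Highest ID of the next batch, taken in clustered-key order.
    SELECT @MaxID = MAX(ID)
    FROM (SELECT TOP (@BatchSize) ID
          FROM #stag
          WHERE ID > @LastID
          ORDER BY ID) AS batch;

    IF @MaxID IS NULL BREAK;  -- no staging rows left

    -- No explicit outer transaction: each MERGE commits on its own,
    -- so log space from finished batches can be reused.
    MERGE INTO #prod AS prod
    USING (SELECT ID, ColumnA
           FROM #stag
           WHERE ID > @LastID AND ID <= @MaxID) AS stag
        ON prod.ID = stag.ID
    WHEN MATCHED AND stag.ColumnA <> prod.ColumnA
        THEN UPDATE SET prod.ColumnA = stag.ColumnA
    WHEN NOT MATCHED BY TARGET
        THEN INSERT (ID, ColumnA) VALUES (stag.ID, stag.ColumnA)
    OUTPUT stag.ID, $action INTO @ChangeSum;

    SET @LastID = @MaxID;
END;

SELECT change, COUNT(*) AS Cnt FROM @ChangeSum GROUP BY change;

DROP TABLE #prod, #stag;
```

Batching only relieves the log if each batch commits separately; wrapping the whole loop in a single BEGIN TRANSACTION, as the main query in the question does, would keep every batch's log records active until the final commit. Under the SIMPLE recovery model the log space can be reused after a checkpoint, while under FULL or BULK_LOGGED a log backup between batches is needed for the same effect.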