Question

I'm using SQL Server 2014 and need to populate a newly added datetime column in one table. Two tables are involved (each has more than 30 million records):

Table A:

CategoryID, itemID, DateCreated, Deleted, and some other string properties.

This table contains multiple records for each item, each with a different DateCreated.

Table B:

CategoryID, itemID, LatestUpdatedDate (this is the newly added column)

CategoryID and itemID are both part of an index on this table.

To update TableB's LatestUpdatedDate from TableA on matching CategoryID and itemID, I used the following MERGE statement:

merge [dbo].[TableB] with(HOLDLOCK) as t
using 
(
    select CategoryID,itemID, max(DateCreated) as LatestUpdatedDate 
    from dbo.TableA 
    where TableA.Deleted = 0
    group by CategoryID,itemID
) as s on t.CategoryID = s.CategoryID and t.itemID = s.itemID

when matched then
    update
    set t.LatestUpdatedDate = s.LatestUpdatedDate

when not matched then
    insert (CategoryID, itemID, LatestUpdatedDate)
    values (s.CategoryID, s.itemID, s.LatestUpdatedDate);

Given that both tables hold tens of millions of records, how can I optimize this script? Or is there a better way to update the table?

Note: this is a one-time script. The database is live, and a trigger will be added to TableA to keep the date in TableB updated in the future.

Answer 1

According to Optimizing MERGE Statement Performance, the best you can do is:

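Create an index on the join columns in the source table that is unique and covering.
Create a unique clustered index on the join columns in the target table.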
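
As a rough sketch of the second recommendation, the target-side index might look like the statement below. The index name is hypothetical, and the statement assumes that (CategoryID, itemID) is unique in TableB and that the table does not already have a clustered index. If either assumption fails, the statement will raise an error, and the existing index on those columns may already be good enough.

```sql
-- Hypothetical index name; assumes TableB has no clustered index yet
-- and that each (CategoryID, itemID) pair appears at most once.
CREATE UNIQUE CLUSTERED INDEX UCX_TableB_CategoryID_itemID
    ON dbo.TableB (CategoryID, itemID);
```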

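By creating an index on TableA on (Deleted, CategoryID, itemID) INCLUDE(DateCreated), you might get a performance boost during the MERGE. However, since this is a one-time operation, the resources (time, CPU, space) needed to create the index will probably outweigh the performance gain compared with running the query as-is and relying on your existing indexes.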
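
For reference, the index described above could be created and later removed roughly as follows. This is only a sketch: the index name is made up for illustration, and whether building it pays off for a one-time run depends on your data and hardware.

```sql
-- Hypothetical index name. Deleted leads the key so the
-- "where Deleted = 0" filter can seek; CategoryID and itemID follow
-- to support the GROUP BY; DateCreated is included so max(DateCreated)
-- can be computed from the index alone.
CREATE NONCLUSTERED INDEX IX_TableA_Deleted_CategoryID_itemID
    ON dbo.TableA (Deleted, CategoryID, itemID)
    INCLUDE (DateCreated);
-- On a live database you may want WITH (ONLINE = ON) on the statement
-- above, assuming an edition that supports online index builds.

-- After the one-time MERGE has finished, the index can be dropped
-- if nothing else needs it.
DROP INDEX IX_TableA_Deleted_CategoryID_itemID ON dbo.TableA;
```

Comparing the MERGE's execution plan and run time with and without this index on a copy of the data is the most reliable way to decide whether it is worth building.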

How to optimize a SQL Server MERGE statement running over millions of records
