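I am trying to merge data from a source staging table into a target table based on a composite key made up of 3 different columns. Each table also has a last updated date that I want to take into account, meaning I should only merge in data where the source row is fresher than the target row.
One of the columns in my composite key acts as a kind of "override key" (let me know if there is a better term for it): we want to delete any row in the target that matches the source on the override key but does not match on the other parts of the composite key. I can do this without checking dates by using a CTE as the target and a WHEN NOT MATCHED BY Source delete clause, but I also need to check that the minimum last updated date for that override key in the target is less than the minimum last updated date for that override key in the source.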
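To make the date condition concrete, here is a minimal Python sketch of the per-override-key check on its own. The row layout (a tuple of Field1 to Field4 followed by an ISO date string), the function names, and the tiny data set are all hypothetical and only for illustration. Ties are treated as eligible for deletion, following the `>=` used in the MERGE attempts further down.

```python
def min_date_by_override_key(rows):
    """Map Field1 -> earliest LastUpdatedDate among rows sharing that Field1."""
    minima = {}
    for field1, _field2, _field3, _field4, last_updated in rows:
        if field1 not in minima or last_updated < minima[field1]:
            minima[field1] = last_updated
    return minima


def override_keys_allowing_delete(source_rows, target_rows):
    """Return the Field1 values whose unmatched target rows may be deleted.

    A key qualifies when it exists on both sides and the source's minimum
    date is at least the target's minimum date (assumption: ties qualify).
    """
    source_min = min_date_by_override_key(source_rows)
    target_min = min_date_by_override_key(target_rows)
    return {
        key
        for key, src_date in source_min.items()
        if key in target_min and src_date >= target_min[key]
    }


# Hypothetical rows: (Field1, Field2, Field3, Field4, LastUpdatedDate).
# ISO-formatted date strings compare in chronological order.
example_source = [
    (1, 2, 3, 8, "2000-12-31"),  # older than the target for Field1 = 1
    (2, 1, 1, 8, "2020-01-01"),  # newer than the target for Field1 = 2
]
example_target = [
    (1, 2, 3, 7, "2010-06-15"),
    (2, 1, 1, 7, "2010-06-15"),
    (2, 1, 9, 7, "2010-06-15"),
]

print(override_keys_allowing_delete(example_source, example_target))  # {2}
```

Only the target rows under Field1 = 2 that have no matching source row, here (2, 1, 9), would then be candidates for deletion.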
Example source table, where Field1 is the override key and Field1, Field2, and Field3 together form the composite key:
+--------+--------+--------+--------+-------------------------+
| Field1 | Field2 | Field3 | Field4 | LastUpdatedDate |
+--------+--------+--------+--------+-------------------------+
| 1 | 2 | 3 | 8 | 2000-12-31 00:00:00.000 |
+--------+--------+--------+--------+-------------------------+
| 1 | 2 | 4 | 8 | 2000-12-31 00:00:00.000 | --No 1, 2, 5 row
+--------+--------+--------+--------+-------------------------+
| 1 | 2 | 6 | 8 | 2000-12-31 00:00:00.000 | --Skips right to 1, 2, 6
+--------+--------+--------+--------+-------------------------+
| 2 | 1 | 1 | 8 | 2000-12-31 00:00:00.000 |
+--------+--------+--------+--------+-------------------------+
| 2 | 1 | 2 | 8 | 2000-12-31 00:00:00.000 |
+--------+--------+--------+--------+-------------------------+
| 3 | 1 | 1 | 1 | 2000-12-31 00:00:00.000 |
+--------+--------+--------+--------+-------------------------+
| 4 | 1 | 1 | 2 | 2000-12-31 00:00:00.000 |
+--------+--------+--------+--------+-------------------------+
Example target table, where Field1 is the override key and Field1, Field2, and Field3 together form the composite key:
+--------+--------+--------+--------+-------------------------+
| Field1 | Field2 | Field3 | Field4 | LastUpdatedDate |
+--------+--------+--------+--------+-------------------------+
| 1 | 2 | 3 | 7 | 2010-06-15 00:00:00.000 |
+--------+--------+--------+--------+-------------------------+
| 1 | 2 | 4 | 7 | 2010-06-15 00:00:00.000 |
+--------+--------+--------+--------+-------------------------+
| 1 | 2 | 5 | 7 | 2010-06-15 00:00:00.000 | --There is a 1, 2, 5 row
+--------+--------+--------+--------+-------------------------+
| 1 | 2 | 6 | 7 | 2010-06-15 00:00:00.000 |
+--------+--------+--------+--------+-------------------------+
| 2 | 1 | 1 | 7 | 2010-06-15 00:00:00.000 |
+--------+--------+--------+--------+-------------------------+
| 2 | 1 | 2 | 7 | 2010-06-15 00:00:00.000 |
+--------+--------+--------+--------+-------------------------+
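Desired result table. Note that although the row where Field1, Field2, and Field3 are 1, 2, and 5 respectively is not in the source table, it is still in the output, because the minimum last updated date for Field1 = 1 in the source is less than the minimum last updated date for Field1 = 1 in the target: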
+--------+--------+--------+--------+--------------------+
| Field1 | Field2 | Field3 | Field4 | LastUpdatedDate |
+--------+--------+--------+--------+--------------------+
| 1 | 2 | 3 | 7 | 2010-06-15 0:00:00 |
+--------+--------+--------+--------+--------------------+
| 1 | 2 | 4 | 7 | 2010-06-15 0:00:00 |
+--------+--------+--------+--------+--------------------+
| 1 | 2 | 5 | 7 | 2010-06-15 0:00:00 | --Still here
+--------+--------+--------+--------+--------------------+
| 1 | 2 | 6 | 7 | 2010-06-15 0:00:00 |
+--------+--------+--------+--------+--------------------+
| 2 | 1 | 1 | 7 | 2010-06-15 0:00:00 |
+--------+--------+--------+--------+--------------------+
| 2 | 1 | 2 | 7 | 2010-06-15 0:00:00 |
+--------+--------+--------+--------+--------------------+
| 3 | 1 | 1 | 1 | 2000-12-31 0:00:00 |
+--------+--------+--------+--------+--------------------+
| 4 | 1 | 1 | 2 | 2000-12-31 0:00:00 |
+--------+--------+--------+--------+--------------------+
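As a cross-check of the expected result, the rule can be sketched as three plain set-based statements run against the sample data. This is a sketch in Python's standard `sqlite3` module, not T-SQL and not a tuned solution. The column types, the `KeyMin` helper table, and the function name `merge_sample` are assumptions made for illustration, and the update touches only Field4, as in my MERGE attempts.

```python
import sqlite3

SRC_DATE = "2000-12-31 00:00:00.000"
TGT_DATE = "2010-06-15 00:00:00.000"
SOURCE_ROWS = [(1, 2, 3, 8, SRC_DATE), (1, 2, 4, 8, SRC_DATE), (1, 2, 6, 8, SRC_DATE),
               (2, 1, 1, 8, SRC_DATE), (2, 1, 2, 8, SRC_DATE),
               (3, 1, 1, 1, SRC_DATE), (4, 1, 1, 2, SRC_DATE)]
TARGET_ROWS = [(1, 2, 3, 7, TGT_DATE), (1, 2, 4, 7, TGT_DATE), (1, 2, 5, 7, TGT_DATE),
               (1, 2, 6, 7, TGT_DATE), (2, 1, 1, 7, TGT_DATE), (2, 1, 2, 7, TGT_DATE)]

# Assumed schema: integer fields, dates stored as ISO text so that
# string comparison matches chronological order.
DDL = """
CREATE TABLE {name} (
    Field1 INTEGER NOT NULL, Field2 INTEGER NOT NULL, Field3 INTEGER NOT NULL,
    Field4 INTEGER NOT NULL, LastUpdatedDate TEXT NOT NULL,
    PRIMARY KEY (Field1, Field2, Field3)
)
"""

STEPS = [
    # 1. Snapshot the per-Field1 minimum dates before the target changes.
    """CREATE TEMP TABLE KeyMin AS
       SELECT s.Field1 AS Field1, s.MinSource AS MinSource, t.MinTarget AS MinTarget
       FROM (SELECT Field1, MIN(LastUpdatedDate) AS MinSource
             FROM SourceTable GROUP BY Field1) s
       JOIN (SELECT Field1, MIN(LastUpdatedDate) AS MinTarget
             FROM TargetTable GROUP BY Field1) t ON t.Field1 = s.Field1""",
    # 2. Delete target rows missing from the source, but only for override
    #    keys whose source minimum date is at least the target minimum date.
    """DELETE FROM TargetTable
       WHERE Field1 IN (SELECT Field1 FROM KeyMin WHERE MinSource >= MinTarget)
         AND NOT EXISTS (SELECT 1 FROM SourceTable s
                         WHERE s.Field1 = TargetTable.Field1
                           AND s.Field2 = TargetTable.Field2
                           AND s.Field3 = TargetTable.Field3)""",
    # 3. Update Field4 where the matching source row is at least as fresh.
    """UPDATE TargetTable
       SET Field4 = (SELECT s.Field4 FROM SourceTable s
                     WHERE s.Field1 = TargetTable.Field1
                       AND s.Field2 = TargetTable.Field2
                       AND s.Field3 = TargetTable.Field3)
       WHERE EXISTS (SELECT 1 FROM SourceTable s
                     WHERE s.Field1 = TargetTable.Field1
                       AND s.Field2 = TargetTable.Field2
                       AND s.Field3 = TargetTable.Field3
                       AND s.LastUpdatedDate >= TargetTable.LastUpdatedDate)""",
    # 4. Insert source rows whose composite key is not in the target yet.
    """INSERT INTO TargetTable (Field1, Field2, Field3, Field4, LastUpdatedDate)
       SELECT s.Field1, s.Field2, s.Field3, s.Field4, s.LastUpdatedDate
       FROM SourceTable s
       WHERE NOT EXISTS (SELECT 1 FROM TargetTable t
                         WHERE t.Field1 = s.Field1
                           AND t.Field2 = s.Field2
                           AND t.Field3 = s.Field3)""",
]


def merge_sample(source_rows, target_rows):
    """Load both tables into a fresh in-memory database, apply the steps,
    and return the resulting target rows ordered by composite key."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, rows in (("SourceTable", source_rows), ("TargetTable", target_rows)):
            conn.execute(DDL.format(name=name))
            conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?)", rows)
        for statement in STEPS:
            conn.execute(statement)
        return conn.execute(
            "SELECT Field1, Field2, Field3, Field4, LastUpdatedDate "
            "FROM TargetTable ORDER BY Field1, Field2, Field3"
        ).fetchall()
    finally:
        conn.close()


for row in merge_sample(SOURCE_ROWS, TARGET_ROWS):
    print(row)
```

With the sample data this leaves the six existing target rows untouched, including the 1, 2, 5 row, and adds the rows for Field1 = 3 and Field1 = 4 with their source dates.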
My current merge statement, with a bit of incorrect syntax commented out in the delete clause to give you an idea of what I am trying to do:
WITH TargetQuery AS ( SELECT *
FROM TargetTable
WHERE EXISTS ( SELECT 1
FROM SourceTable
WHERE SourceTable.Field1 = TargetTable.Field1 ) )
MERGE TargetQuery AS Target
USING SourceTable AS Source
ON ( Target.Field1 = Source.Field1
AND Target.Field2 = Source.Field2
AND Target.Field3 = Source.Field3 )
WHEN NOT MATCHED BY Target THEN
INSERT ( Field1,
Field2,
Field3,
Field4,
LastUpdatedDate )
VALUES ( Field1,
Field2,
Field3,
Field4,
Source.LastUpdatedDate )
WHEN MATCHED AND Source.LastUpdatedDate >= Target.LastUpdatedDate THEN
UPDATE SET Target.Field4 = Source.Field4
WHEN NOT MATCHED BY Source --AND MIN( Source.LastUpdatedDate) OVER( PARTITION BY Source.Field1 )
-- >= MIN( Target.LastUpdatedDate ) OVER( PARTITION BY Target.Field1 )
THEN DELETE;
It looks like I can get the job done with this more complex source query, but performance is terrible on larger data sets:
WITH SourceQuery AS ( SELECT COALESCE( SourceTable.Field1, TargetTable.Field1 ) Field1,
COALESCE( SourceTable.Field2, TargetTable.Field2 ) Field2,
COALESCE( SourceTable.Field3, TargetTable.Field3 ) Field3,
COALESCE( SourceTable.Field4, TargetTable.Field4 ) Field4,
IsDeleted = CASE WHEN SourceTable.Field1 IS NULL THEN 1
ELSE 0 END,
MinSourceLastUpdateTime,
MinTargetLastUpdateTime = MIN( TargetTable.LastUpdatedDate ) OVER( PARTITION BY TargetTable.Field1 ),
SourceLastUpdateTime = SourceTable.LastUpdatedDate,
TargetLastUpdateTime = TargetTable.LastUpdatedDate
FROM dbo.TargetTable
FULL OUTER JOIN dbo.SourceTable
ON SourceTable.Field1 = TargetTable.Field1
AND SourceTable.Field2 = TargetTable.Field2
AND SourceTable.Field3 = TargetTable.Field3
LEFT OUTER JOIN ( SELECT Field1,
MIN( LastUpdatedDate ) MinSourceLastUpdateTime
FROM dbo.SourceTable
GROUP BY Field1 ) GroupedSource
ON TargetTable.Field1 = GroupedSource.Field1 )
MERGE dbo.TargetTable AS Target
USING SourceQuery AS Source
ON ( Source.Field1 = Target.Field1
AND Source.Field2 = Target.Field2
AND Source.Field3 = Target.Field3 )
WHEN MATCHED AND Source.IsDeleted = 0 AND Source.SourceLastUpdateTime >= Source.TargetLastUpdateTime
THEN UPDATE SET Target.Field4 = Source.Field4
WHEN NOT MATCHED BY Target
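THEN INSERT ( Field1,
Field2,
Field3,
Field4,
LastUpdatedDate ) -- carry the source date over, as in the desired result
VALUES ( Source.Field1,
Source.Field2,
Source.Field3,
Source.Field4,
Source.SourceLastUpdateTime )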
WHEN MATCHED AND Source.IsDeleted = 1 AND Source.MinSourceLastUpdateTime >= Source.MinTargetLastUpdateTime
THEN DELETE;