Efficiently merging on a composite key while also inferring deletes from an override key and dates

Time: 2018-09-12 19:33:54

Tags: sql tsql merge upsert composite-key

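I am trying to merge data from a source staging table into a target table based on a composite key made up of 3 different columns. Each table also has a last-updated date that I want to take into account, meaning I should only merge data when the source row is newer than the target row.

One of the columns in my composite key is a sort of "override key" (let me know if there is a better word for this), meaning that we want to delete any rows in the target that match on the override key but not on the other parts of the composite key. I can accomplish this without checking dates by using a CTE as the target and a WHEN NOT MATCHED BY Source delete clause, but I need to check that the minimum last-updated date in the target for that override key is less than the minimum last-updated date in the source for that override key.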

Example source table, where Field1 is the override key and Field1, Field2, and Field3 form the composite key:

+--------+--------+--------+--------+-------------------------+
| Field1 | Field2 | Field3 | Field4 | LastUpdatedDate         |  
+--------+--------+--------+--------+-------------------------+  
| 1      | 2      | 3      | 8      | 2000-12-31 00:00:00.000 |  
+--------+--------+--------+--------+-------------------------+  
| 1      | 2      | 4      | 8      | 2000-12-31 00:00:00.000 |  --No 1, 2, 5 row
+--------+--------+--------+--------+-------------------------+  
| 1      | 2      | 6      | 8      | 2000-12-31 00:00:00.000 |  --Skips right to 1, 2, 6
+--------+--------+--------+--------+-------------------------+  
| 2      | 1      | 1      | 8      | 2000-12-31 00:00:00.000 |  
+--------+--------+--------+--------+-------------------------+  
| 2      | 1      | 2      | 8      | 2000-12-31 00:00:00.000 |  
+--------+--------+--------+--------+-------------------------+  
| 3      | 1      | 1      | 1      | 2000-12-31 00:00:00.000 |  
+--------+--------+--------+--------+-------------------------+  
| 4      | 1      | 1      | 2      | 2000-12-31 00:00:00.000 |  
+--------+--------+--------+--------+-------------------------+  

Example target table, where Field1 is the override key and Field1, Field2, and Field3 form the composite key:

+--------+--------+--------+--------+-------------------------+  
| Field1 | Field2 | Field3 | Field4 | LastUpdatedDate         |  
+--------+--------+--------+--------+-------------------------+  
| 1      | 2      | 3      | 7      | 2010-06-15 00:00:00.000 |  
+--------+--------+--------+--------+-------------------------+  
| 1      | 2      | 4      | 7      | 2010-06-15 00:00:00.000 |  
+--------+--------+--------+--------+-------------------------+  
| 1      | 2      | 5      | 7      | 2010-06-15 00:00:00.000 |  --There is a 1, 2, 5 row
+--------+--------+--------+--------+-------------------------+  
| 1      | 2      | 6      | 7      | 2010-06-15 00:00:00.000 |  
+--------+--------+--------+--------+-------------------------+  
| 2      | 1      | 1      | 7      | 2010-06-15 00:00:00.000 |  
+--------+--------+--------+--------+-------------------------+  
| 2      | 1      | 2      | 7      | 2010-06-15 00:00:00.000 |  
+--------+--------+--------+--------+-------------------------+  
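
For anyone who wants to reproduce the two sample tables, a setup script along these lines should work. The column types and the primary-key constraints are assumptions, since the question only shows the data and not the table definitions.

```sql
-- Assumed schema: INT keys, DATETIME date, composite primary key on (Field1, Field2, Field3).
CREATE TABLE dbo.SourceTable (
    Field1          INT      NOT NULL,
    Field2          INT      NOT NULL,
    Field3          INT      NOT NULL,
    Field4          INT      NOT NULL,
    LastUpdatedDate DATETIME NOT NULL,
    CONSTRAINT PK_SourceTable PRIMARY KEY ( Field1, Field2, Field3 )
);

CREATE TABLE dbo.TargetTable (
    Field1          INT      NOT NULL,
    Field2          INT      NOT NULL,
    Field3          INT      NOT NULL,
    Field4          INT      NOT NULL,
    LastUpdatedDate DATETIME NOT NULL,
    CONSTRAINT PK_TargetTable PRIMARY KEY ( Field1, Field2, Field3 )
);

-- Source rows: there is no (1, 2, 5) row.
INSERT INTO dbo.SourceTable ( Field1, Field2, Field3, Field4, LastUpdatedDate )
  VALUES ( 1, 2, 3, 8, '20001231' ),
         ( 1, 2, 4, 8, '20001231' ),
         ( 1, 2, 6, 8, '20001231' ),
         ( 2, 1, 1, 8, '20001231' ),
         ( 2, 1, 2, 8, '20001231' ),
         ( 3, 1, 1, 1, '20001231' ),
         ( 4, 1, 1, 2, '20001231' );

-- Target rows: includes the (1, 2, 5) row that the source lacks.
INSERT INTO dbo.TargetTable ( Field1, Field2, Field3, Field4, LastUpdatedDate )
  VALUES ( 1, 2, 3, 7, '20100615' ),
         ( 1, 2, 4, 7, '20100615' ),
         ( 1, 2, 5, 7, '20100615' ),
         ( 1, 2, 6, 7, '20100615' ),
         ( 2, 1, 1, 7, '20100615' ),
         ( 2, 1, 2, 7, '20100615' );
```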

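Desired result table. Note that although the row with Field1, Field2, and Field3 equal to 1, 2, and 5 is not in the source table, it is still in the output, because the minimum last-updated date for Field1 = 1 in the source is less than the minimum last-updated date for Field1 = 1 in the target: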

+--------+--------+--------+--------+--------------------+  
| Field1 | Field2 | Field3 | Field4 | LastUpdatedDate    |  
+--------+--------+--------+--------+--------------------+  
| 1      | 2      | 3      | 7      | 2010-06-15 0:00:00 |  
+--------+--------+--------+--------+--------------------+  
| 1      | 2      | 4      | 7      | 2010-06-15 0:00:00 |  
+--------+--------+--------+--------+--------------------+  
| 1      | 2      | 5      | 7      | 2010-06-15 0:00:00 |  --Still here
+--------+--------+--------+--------+--------------------+  
| 1      | 2      | 6      | 7      | 2010-06-15 0:00:00 |  
+--------+--------+--------+--------+--------------------+  
| 2      | 1      | 1      | 7      | 2010-06-15 0:00:00 |  
+--------+--------+--------+--------+--------------------+  
| 2      | 1      | 2      | 7      | 2010-06-15 0:00:00 |  
+--------+--------+--------+--------+--------------------+  
| 3      | 1      | 1      | 1      | 2000-12-31 0:00:00 |  
+--------+--------+--------+--------+--------------------+  
| 4      | 1      | 1      | 2      | 2000-12-31 0:00:00 |  
+--------+--------+--------+--------+--------------------+  
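
The per-key comparison that decides whether deletes are allowed can be checked on its own with a query like the following. It is only a diagnostic sketch: the names SourceMin, TargetMin, and DeletesAllowed are illustrative, and it uses >= as in the commented-out condition in the merge statement, whereas the prose describes a strict "less than" on the target side.

```sql
WITH SourceMin AS ( SELECT Field1,
                           MIN( LastUpdatedDate ) AS MinSourceDate
                      FROM dbo.SourceTable
                      GROUP BY Field1 ),
     TargetMin AS ( SELECT Field1,
                           MIN( LastUpdatedDate ) AS MinTargetDate
                      FROM dbo.TargetTable
                      GROUP BY Field1 )
SELECT SourceMin.Field1,
       SourceMin.MinSourceDate,
       TargetMin.MinTargetDate,
       -- 1 when the oldest source row for this override key is at least as new
       -- as the oldest target row, i.e. target-only rows may be deleted
       DeletesAllowed = CASE WHEN SourceMin.MinSourceDate >= TargetMin.MinTargetDate THEN 1
                             ELSE 0 END
  FROM SourceMin
    INNER JOIN TargetMin
      ON TargetMin.Field1 = SourceMin.Field1
  ORDER BY SourceMin.Field1;
```

With the sample rows, Field1 = 1 and Field1 = 2 both come back with DeletesAllowed = 0, because 2000-12-31 is earlier than 2010-06-15. That is why the (1, 2, 5) row stays in the desired result.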

Current merge statement, with a bit of incorrect syntax commented out in the delete clause to give you an idea of what I am trying to do:

WITH TargetQuery AS ( SELECT *
                        FROM TargetTable
                        WHERE EXISTS ( SELECT 1
                                         FROM SourceTable
                                         WHERE SourceTable.Field1 = TargetTable.Field1 ) )

MERGE TargetQuery AS Target
  USING SourceTable AS Source
    ON ( Target.Field1 = Source.Field1 
      AND Target.Field2 = Source.Field2
      AND Target.Field3 = Source.Field3 )
  WHEN NOT MATCHED BY Target THEN
    INSERT ( Field1,
             Field2,
             Field3,
             Field4,
             LastUpdatedDate )
      VALUES ( Field1,
               Field2,
               Field3,
               Field4,
               Source.LastUpdatedDate )
  WHEN MATCHED AND Source.LastUpdatedDate >= Target.LastUpdatedDate THEN
    UPDATE SET Target.Field4 = Source.Field4
  WHEN NOT MATCHED BY Source --AND  MIN( Source.LastUpdatedDate) OVER( PARTITION BY Source.Field1 ) 
                             --     >= MIN( Target.LastUpdatedDate ) OVER( PARTITION BY Target.Field1 )
    THEN DELETE;
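
The commented-out condition cannot compile as written for two reasons: SQL Server only allows windowed functions in the SELECT and ORDER BY clauses, and a WHEN NOT MATCHED BY SOURCE clause can reference target columns only. One possible workaround, sketched here under the assumption that it is acceptable to run the inferred deletes as their own statement, is a plain DELETE driven by the per-key minimum dates. PrunableKeys is an illustrative name, and this has not been measured against large data sets.

```sql
-- Override keys whose oldest source row is at least as new as the oldest target row.
WITH PrunableKeys AS ( SELECT sm.Field1
                         FROM ( SELECT Field1,
                                       MIN( LastUpdatedDate ) AS MinSourceDate
                                  FROM dbo.SourceTable
                                  GROUP BY Field1 ) AS sm
                           INNER JOIN ( SELECT Field1,
                                               MIN( LastUpdatedDate ) AS MinTargetDate
                                          FROM dbo.TargetTable
                                          GROUP BY Field1 ) AS tm
                             ON tm.Field1 = sm.Field1
                         WHERE sm.MinSourceDate >= tm.MinTargetDate )

-- Remove target rows that share a prunable override key
-- but have no source row with the same full composite key.
DELETE t
  FROM dbo.TargetTable AS t
  WHERE EXISTS ( SELECT 1
                   FROM PrunableKeys AS k
                   WHERE k.Field1 = t.Field1 )
    AND NOT EXISTS ( SELECT 1
                       FROM dbo.SourceTable AS s
                       WHERE s.Field1 = t.Field1
                         AND s.Field2 = t.Field2
                         AND s.Field3 = t.Field3 );
```

Against the sample data this deletes nothing, since neither override key passes the date check, so the (1, 2, 5) row remains.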

It looks like I can get the job done with this complex source query, but performance is terrible on larger data sets:

WITH SourceQuery AS ( SELECT COALESCE( SourceTable.Field1, TargetTable.Field1 ) Field1,
                             COALESCE( SourceTable.Field2, TargetTable.Field2 ) Field2,
                             COALESCE( SourceTable.Field3, TargetTable.Field3 ) Field3,
                             COALESCE( SourceTable.Field4, TargetTable.Field4 ) Field4,
                             IsDeleted = CASE WHEN SourceTable.Field1 IS NULL THEN 1
                                              ELSE 0 END,
                             MinSourceLastUpdateTime,
                             MinTargetLastUpdateTime = MIN( TargetTable.LastUpdatedDate ) OVER( PARTITION BY TargetTable.Field1 ),
                             SourceLastUpdateTime = SourceTable.LastUpdatedDate,
                             TargetLastUpdateTime = TargetTable.LastUpdatedDate
                        FROM dbo.TargetTable
                          FULL OUTER JOIN dbo.SourceTable
                            ON SourceTable.Field1 = TargetTable.Field1
                              AND SourceTable.Field2 = TargetTable.Field2
                              AND SourceTable.Field3 = TargetTable.Field3
                          LEFT OUTER JOIN ( SELECT Field1,
                                                   MIN( LastUpdatedDate ) MinSourceLastUpdateTime
                                              FROM dbo.SourceTable
                                              GROUP BY Field1 ) GroupedSource
                            ON  TargetTable.Field1 = GroupedSource.Field1 )

MERGE dbo.TargetTable AS Target
  USING SourceQuery AS Source
    ON ( Source.Field1 = Target.Field1
          AND Source.Field2 = Target.Field2
          AND Source.Field3 = Target.Field3 )
  WHEN MATCHED AND Source.IsDeleted = 0 AND Source.SourceLastUpdateTime >=  Source.TargetLastUpdateTime 
    THEN UPDATE SET Target.Field4 = Source.Field4
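  WHEN NOT MATCHED BY Target 
    THEN INSERT ( Field1,
                  Field2,
                  Field3,
                  Field4,
                  LastUpdatedDate )
     VALUES ( Source.Field1,
              Source.Field2,
              Source.Field3,
              Source.Field4,
              Source.SourceLastUpdateTime ) -- carry the source date so inserted rows are not left without one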
  WHEN MATCHED AND Source.IsDeleted = 1 AND Source.MinSourceLastUpdateTime >= Source.MinTargetLastUpdateTime
    THEN DELETE;
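
In the query above, the FULL OUTER JOIN is there only to surface deletion candidates. If the inferred deletes are handled in a separate statement, the insert and update part could presumably use the staging table directly as the merge source. The following is a sketch of that upsert-only variant. It keeps the question's behaviour of updating Field4 only, and whether it actually performs better depends on indexing and data volume, which has not been tested here.

```sql
-- Upsert only: no FULL OUTER JOIN and no delete clause.
MERGE dbo.TargetTable AS Target
  USING dbo.SourceTable AS Source
    ON ( Target.Field1 = Source.Field1
      AND Target.Field2 = Source.Field2
      AND Target.Field3 = Source.Field3 )
  -- Update only when the source row is at least as new as the target row.
  WHEN MATCHED AND Source.LastUpdatedDate >= Target.LastUpdatedDate THEN
    UPDATE SET Target.Field4 = Source.Field4
  -- Insert rows whose composite key is not in the target yet.
  WHEN NOT MATCHED BY TARGET THEN
    INSERT ( Field1,
             Field2,
             Field3,
             Field4,
             LastUpdatedDate )
      VALUES ( Source.Field1,
               Source.Field2,
               Source.Field3,
               Source.Field4,
               Source.LastUpdatedDate );
```

On the sample data this inserts the Field1 = 3 and Field1 = 4 rows and updates nothing, because every matching source row is older than its target row.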

0 answers:

No answers