Question

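The situation is as follows:

I am collecting data about companies from several sources; for the sake of argument, let's call them customers.

For the same customer I can get multiple rows, either on the same day or on different days. I want to use SCD2 to keep the history, but some sources do not give me data for every field; those fields arrive as 'N/A' or NULL. What I want is:

a) If two or more rows are identical except for the date, they should produce a single row with the earliest date.
b) If one or more fields have changed, create a new SCD2 row with the change date as its start date.
c) If one or more fields in the new row from b) have changed from a valid value to 'N/A', the row should carry the latest valid value for those fields (from the previous row).

I am using SQL Server and T-SQL.

I hope I have explained this clearly :-)

Thanks again

Edit (from the comments):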

CustomerHistoryId |CustomerNum |CustomerName     |Planet    |ChangeDate
-----------------------------------------------------------------------------
1                 |101         |Anakin Skywalker |Tatooine  |14.03.2015 15:41
2                 |102         |Yoda             |Coruscant |14.03.2015 15:41
3                 |103         |Obi-Wan Kenobi   |Coruscant |24.03.2015 15:41
4                 |102         |Yoda             |Coruscant |29.03.2015 15:41
5                 |102         |Yoda             |NULL      |03.04.2015 15:41
6                 |102         |Yoda             |NULL      |04.04.2015
7                 |103         |Obi-Wan Kenobi   |Degobah   |08.04.2015 15:41
8                 |102         |Master Yoda      |Tatooine  |09.04.2015 15:41
9                 |102         |NULL             |Tatooine  |10.04.2015 15:41
10                |102         |Master Yoda      |Tatooine  |11.04.2015 15:41

Final result:

CustomerHistoryId |CustomerNum |CustomerName     |Planet    |ChangeDate
1                 |101         |Anakin Skywalker |Tatooine  |14.03.2015 15:41
2                 |102         |Yoda             |Coruscant |14.03.2015 15:41
3                 |103         |Obi-Wan Kenobi   |Coruscant |24.03.2015 15:41
7                 |103         |Obi-Wan Kenobi   |Degobah   |08.04.2015 15:41
8                 |102         |Master Yoda      |Tatooine  |09.04.2015 15:41

Answer 1

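As far as I can tell, you want to ignore rows that contain NULL values and then collapse duplicates, disregarding the date. Assuming the ids and the dates are assigned in the same order, you can do this with aggregation: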

select min(CustomerHistoryId) as CustomerHistoryId,
       CustomerNum, CustomerName, Planet, min(ChangeDate) as ChangeDate
from t
where CustomerName is not null and Planet is not null
group by CustomerNum, CustomerName, Planet;
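
If the ids and dates are not guaranteed to be assigned in the same order, the two independent MIN() calls could return an id and a date that come from different rows. A minimal sketch of one way around that, assuming the same table t and column names as the query above, is to rank the rows in each group with ROW_NUMBER() and keep the earliest one. The CTE name ranked and the column rn are made up for illustration.

-- Sketch: same filter and grouping columns as the aggregate query,
-- but the id and the date are taken from the same physical row.
WITH ranked AS (
    SELECT CustomerHistoryId, CustomerNum, CustomerName, Planet, ChangeDate,
           ROW_NUMBER() OVER (
               PARTITION BY CustomerNum, CustomerName, Planet
               ORDER BY ChangeDate, CustomerHistoryId   -- id only breaks ties
           ) AS rn
    FROM t
    WHERE CustomerName IS NOT NULL
      AND Planet IS NOT NULL
)
SELECT CustomerHistoryId, CustomerNum, CustomerName, Planet, ChangeDate
FROM ranked
WHERE rn = 1;   -- earliest row per (CustomerNum, CustomerName, Planet)

When ids and dates do increase together, this should return the same rows as the aggregate version.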

Answer 2

| customerhistoryid | customernum |     customername |    planet |              changedate |
|-------------------|-------------|------------------|-----------|-------------------------|
|                 1 |         101 | Anakin Skywalker |  Tatooine | March, 16 2015 22:18:34 |
|                 2 |         102 |             Yoda | Coruscant | March, 16 2015 00:42:34 |
|                 3 |         103 |   Obi-Wan Kenobi | Coruscant | March, 26 2015 22:18:34 |
|                 4 |         102 |             Yoda | Coruscant | March, 16 2015 03:06:34 |
|                 5 |         102 |             Yoda |    (null) | March, 16 2015 05:30:34 |
|                 6 |         102 |             Yoda |     Basic | March, 16 2015 07:54:34 |
|                 7 |         103 |   Obi-Wan Kenobi |   Degobah | April, 10 2015 22:18:34 |
|                 8 |         102 |      Master Yoda |  Tatooine | April, 11 2015 00:42:34 |
|                 9 |         102 |           (null) |  Tatooine | April, 11 2015 03:06:34 |
|                10 |         102 |           (null) | Tatooine2 | April, 11 2015 07:54:34 |
|                11 |         102 |      Master Yoda |  Degobah2 | April, 13 2015 22:18:34 |

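The data is not "clean": for example, "Yoda" and "Master Yoda" both share the same customernum value. So there really should be a separate table holding one unique row per customernum with the correct name, but no such table exists.

So here is one approach (there are other possibilities):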

select
   MIN(CustomerHistoryId), CustomerNum, CustomerName, Planet, MIN(ChangeDate)
from (
      select
        t.CustomerHistoryId
      , t.CustomerNum
      , COALESCE(t.CustomerName,
                   ( select top (1)
                        t2.CustomerName
                     from t t2
                     where t.CustomerName IS NULL 
                     and t2.CustomerName IS NOT NULL 
                     and t2.CustomerNum = t.CustomerNum
                     and t2.ChangeDate < t.ChangeDate
                     order by t2.ChangeDate DESC
                    )
                 ) AS CustomerName
      , COALESCE(t.Planet,
                   ( select top (1)
                        t2.Planet
                     from t t2
                     where t.Planet IS NULL 
                     and t2.Planet IS NOT NULL 
                     and t2.CustomerNum = t.CustomerNum  -- only look at the same customer's earlier rows
                     and t2.ChangeDate < t.ChangeDate
                     order by t2.ChangeDate DESC
                    )
                 ) AS Planet
      , t.ChangeDate
      from t
   ) dt
group by
   CustomerNum, CustomerName, Planet
order by
   CustomerNum, MIN(CustomerHistoryId)
;

This is a fairly generic approach; in SQL Server you could use OUTER APPLY instead of the correlated subqueries.

From that query I got this result:

| min | customernum |     customername |    planet |                     min |
|-----|-------------|------------------|-----------|-------------------------|
|   1 |         101 | Anakin Skywalker |  Tatooine | March, 16 2015 22:18:34 |
|   2 |         102 |             Yoda | Coruscant | March, 16 2015 00:42:34 |
|   6 |         102 |             Yoda |     Basic | March, 16 2015 07:54:34 |
|   8 |         102 |      Master Yoda |  Tatooine | April, 11 2015 00:42:34 |
|  10 |         102 |      Master Yoda | Tatooine2 | April, 11 2015 07:54:34 |
|  11 |         102 |      Master Yoda |  Degobah2 | April, 13 2015 22:18:34 |
|   3 |         103 |   Obi-Wan Kenobi | Coruscant | March, 26 2015 22:18:34 |
|   7 |         103 |   Obi-Wan Kenobi |   Degobah | April, 10 2015 22:18:34 |

I used this sqlfiddle (on Postgres, because MSSQL was not working there at the time).
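
For reference, the OUTER APPLY variant mentioned above could be sketched roughly as follows. This is an untested rewrite for SQL Server, not something that was run on the fiddle. It assumes the same table t and columns as the query above; the aliases pn and pp are made up for illustration, and both lookups are restricted to earlier rows of the same CustomerNum.

-- Sketch: fill NULL name/planet from the customer's most recent earlier
-- non-NULL value, then collapse rows that are identical except for the date.
SELECT
    MIN(dt.CustomerHistoryId) AS CustomerHistoryId,
    dt.CustomerNum,
    dt.CustomerName,
    dt.Planet,
    MIN(dt.ChangeDate) AS ChangeDate
FROM (
    SELECT
        t.CustomerHistoryId,
        t.CustomerNum,
        COALESCE(t.CustomerName, pn.CustomerName) AS CustomerName,
        COALESCE(t.Planet, pp.Planet) AS Planet,
        t.ChangeDate
    FROM t
    OUTER APPLY (          -- previous non-NULL name for this customer
        SELECT TOP (1) t2.CustomerName
        FROM t AS t2
        WHERE t2.CustomerNum = t.CustomerNum
          AND t2.CustomerName IS NOT NULL
          AND t2.ChangeDate < t.ChangeDate
        ORDER BY t2.ChangeDate DESC
    ) AS pn
    OUTER APPLY (          -- previous non-NULL planet for this customer
        SELECT TOP (1) t2.Planet
        FROM t AS t2
        WHERE t2.CustomerNum = t.CustomerNum
          AND t2.Planet IS NOT NULL
          AND t2.ChangeDate < t.ChangeDate
        ORDER BY t2.ChangeDate DESC
    ) AS pp
) AS dt
GROUP BY dt.CustomerNum, dt.CustomerName, dt.Planet
ORDER BY dt.CustomerNum, MIN(dt.CustomerHistoryId);

OUTER APPLY keeps rows of t even when no earlier value exists, so a customer whose first row has a NULL field still appears, with that field left NULL.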
