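The situation is as follows:
I'm collecting data about companies from several sources, but for the sake of the example let's call them customers.
For the same customer I can get several rows, on the same day or on different days. I want to use SCD2 (slowly changing dimension, type 2) to keep the history, but some sources don't give me data for all fields; those fields can be stored as 'N/A' or NULL. I want to do the following: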
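a) If two or more rows are identical except for the date, they should produce a single row with the earliest date.
b) If one or more fields have changed, create a new SCD2 row, using the change date as its start date.
c) If one or more fields in the new row from b) have changed from a legitimate value to 'N/A', those fields should keep their latest legitimate value (from the previous row).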
I'm using SQL Server and T-SQL.
I hope I've explained this clearly :-)
Thanks again
Edit (from comments):
| CustomerHistoryId | CustomerNum | CustomerName     | Planet    | ChangeDate       |
|-------------------|-------------|------------------|-----------|------------------|
| 1                 | 101         | Anakin Skywalker | Tatooine  | 14.03.2015 15:41 |
| 2                 | 102         | Yoda             | Coruscant | 14.03.2015 15:41 |
| 3                 | 103         | Obi-Wan Kenobi   | Coruscant | 24.03.2015 15:41 |
| 4                 | 102         | Yoda             | Coruscant | 29.03.2015 15:41 |
| 5                 | 102         | Yoda             | NULL      | 03.04.2015 15:41 |
| 6                 | 102         | Yoda             | NULL      | 04.04.2015       |
| 7                 | 103         | Obi-Wan Kenobi   | Degobah   | 08.04.2015 15:41 |
| 8                 | 102         | Master Yoda      | Tatooine  | 09.04.2015 15:41 |
| 9                 | 102         | NULL             | Tatooine  | 10.04.2015 15:41 |
| 10                | 102         | Master Yoda      | Tatooine  | 11.04.2015 15:41 |
Expected final result:
| CustomerHistoryId | CustomerNum | CustomerName     | Planet    | ChangeDate       |
|-------------------|-------------|------------------|-----------|------------------|
| 1                 | 101         | Anakin Skywalker | Tatooine  | 14.03.2015 15:41 |
| 2                 | 102         | Yoda             | Coruscant | 14.03.2015 15:41 |
| 3                 | 103         | Obi-Wan Kenobi   | Coruscant | 24.03.2015 15:41 |
| 7                 | 103         | Obi-Wan Kenobi   | Degobah   | 08.04.2015 15:41 |
| 8                 | 102         | Master Yoda      | Tatooine  | 09.04.2015 15:41 |
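The following is a minimal sketch of rules a) to c). It is written in Python rather than T-SQL so that the rules can be checked in isolation against the expected result above.

It rests on these assumptions, none of which come from the question:

- Missing values arrive as `None` (NULL) or `'N/A'`.
- Rows are processed per customer in `ChangeDate` order, and each row is compared with that customer's *previous* version.
- If a customer's very first row already has a missing value, there is nothing to carry forward, so it stays missing.

The function name `scd2_collapse` and the tuple layout are hypothetical.

```python
from datetime import datetime

# Values treated as missing (assumption: sources send either NULL/None or 'N/A').
MISSING = (None, "N/A")

# Hypothetical row layout: (CustomerHistoryId, CustomerNum, CustomerName, Planet, ChangeDate)
SAMPLE = [
    (1, 101, "Anakin Skywalker", "Tatooine", datetime(2015, 3, 14, 15, 41)),
    (2, 102, "Yoda", "Coruscant", datetime(2015, 3, 14, 15, 41)),
    (3, 103, "Obi-Wan Kenobi", "Coruscant", datetime(2015, 3, 24, 15, 41)),
    (4, 102, "Yoda", "Coruscant", datetime(2015, 3, 29, 15, 41)),
    (5, 102, "Yoda", None, datetime(2015, 4, 3, 15, 41)),
    (6, 102, "Yoda", None, datetime(2015, 4, 4)),  # no time given in the question
    (7, 103, "Obi-Wan Kenobi", "Degobah", datetime(2015, 4, 8, 15, 41)),
    (8, 102, "Master Yoda", "Tatooine", datetime(2015, 4, 9, 15, 41)),
    (9, 102, None, "Tatooine", datetime(2015, 4, 10, 15, 41)),
    (10, 102, "Master Yoda", "Tatooine", datetime(2015, 4, 11, 15, 41)),
]


def scd2_collapse(rows):
    """Apply rules a) to c) and return the rows that start a new SCD2 version."""
    current = {}  # CustomerNum -> (CustomerName, Planet) of the latest version
    versions = []
    # Process rows in ChangeDate order; ties are broken by CustomerHistoryId.
    for hist_id, num, name, planet, changed in sorted(rows, key=lambda r: (r[4], r[0])):
        prev = current.get(num)
        if prev is not None:
            # Rule c): a missing value inherits the latest legitimate value.
            if name in MISSING:
                name = prev[0]
            if planet in MISSING:
                planet = prev[1]
        if (name, planet) == prev:
            # Rule a): nothing changed, so the earlier row (earliest date) is kept.
            continue
        # Rule b): at least one field changed, so a new version starts here.
        current[num] = (name, planet)
        versions.append((hist_id, num, name, planet, changed))
    return sorted(versions, key=lambda r: r[0])


for version in scd2_collapse(SAMPLE):
    print(version)
```

Unlike a plain GROUP BY over the columns, this sketch compares each row only with the customer's previous version. As a result, a value that changes and later changes back (A, then B, then A again) yields three versions. Whether that is wanted depends on how the history will be queried.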
Answer 0 (score: 0)
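As far as I can tell, you want to ignore rows that have NULL values and then remove duplicates, ignoring the date. Assuming ids and dates are assigned in the same order, you can do this with aggregation: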
select min(CustomerHistoryId) as CustomerHistoryId,
CustomerNum, CustomerName, Planet, min(ChangeDate) as ChangeDate
from t
where CustomerName is not null and Planet is not null
group by CustomerNum, CustomerName, Planet;
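Here is a sketch for trying this aggregation query against the question's sample data without a SQL Server instance. It uses Python's built-in `sqlite3` module.

It makes two assumptions:

- The table is called `t`, as in the answer.
- Dates are stored as ISO-8601 text, so that `min()` and comparisons order them chronologically. The dd.mm.yyyy strings from the question would not sort correctly as text.

SQLite is not T-SQL, but this particular query uses only standard syntax. The helper name `run_aggregate_query` is hypothetical.

```python
import sqlite3

# The question's sample rows, with dates rewritten as ISO-8601 text (assumption).
ROWS = [
    (1, 101, "Anakin Skywalker", "Tatooine", "2015-03-14 15:41"),
    (2, 102, "Yoda", "Coruscant", "2015-03-14 15:41"),
    (3, 103, "Obi-Wan Kenobi", "Coruscant", "2015-03-24 15:41"),
    (4, 102, "Yoda", "Coruscant", "2015-03-29 15:41"),
    (5, 102, "Yoda", None, "2015-04-03 15:41"),
    (6, 102, "Yoda", None, "2015-04-04 00:00"),  # no time in the question; midnight assumed
    (7, 103, "Obi-Wan Kenobi", "Degobah", "2015-04-08 15:41"),
    (8, 102, "Master Yoda", "Tatooine", "2015-04-09 15:41"),
    (9, 102, None, "Tatooine", "2015-04-10 15:41"),
    (10, 102, "Master Yoda", "Tatooine", "2015-04-11 15:41"),
]

# The answer's query; "order by 1" is added only to make the output order stable.
QUERY = """
select min(CustomerHistoryId) as CustomerHistoryId,
       CustomerNum, CustomerName, Planet, min(ChangeDate) as ChangeDate
from t
where CustomerName is not null and Planet is not null
group by CustomerNum, CustomerName, Planet
order by 1
"""


def run_aggregate_query(rows):
    """Load rows into an in-memory table t and run the aggregation query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table t (CustomerHistoryId int, CustomerNum int,"
        " CustomerName text, Planet text, ChangeDate text)"
    )
    con.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in run_aggregate_query(ROWS):
    print(row)
```

On this data the query returns rows 1, 2, 3, 7 and 8, matching the expected result. However, rows containing a NULL are filtered out rather than filled in. A row such as `(11, 102, NULL, 'Degobah', '2015-04-12 15:41')` would therefore be dropped together with its planet change, which is the case rule c) in the question is meant to cover.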
Answer 1 (score: 0)
Sample data used for the query below:
| customerhistoryid | customernum | customername | planet | changedate |
|-------------------|-------------|------------------|-----------|-------------------------|
| 1 | 101 | Anakin Skywalker | Tatooine | March, 16 2015 22:18:34 |
| 2 | 102 | Yoda | Coruscant | March, 16 2015 00:42:34 |
| 3 | 103 | Obi-Wan Kenobi | Coruscant | March, 26 2015 22:18:34 |
| 4 | 102 | Yoda | Coruscant | March, 16 2015 03:06:34 |
| 5 | 102 | Yoda | (null) | March, 16 2015 05:30:34 |
| 6 | 102 | Yoda | Basic | March, 16 2015 07:54:34 |
| 7 | 103 | Obi-Wan Kenobi | Degobah | April, 10 2015 22:18:34 |
| 8 | 102 | Master Yoda | Tatooine | April, 11 2015 00:42:34 |
| 9 | 102 | (null) | Tatooine | April, 11 2015 03:06:34 |
| 10 | 102 | (null) | Tatooine2 | April, 11 2015 07:54:34 |
| 11 | 102 | Master Yoda | Degobah2 | April, 13 2015 22:18:34 |
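The data isn't "clean": for example, you have both "Yoda" and "Master Yoda" with the same customernum value. So there really should be a separate table with exactly one row per customernum holding the correct name, but that doesn't exist.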
So here is one approach (there are other possibilities):
select
    MIN(CustomerHistoryId) AS CustomerHistoryId
  , CustomerNum
  , CustomerName
  , Planet
  , MIN(ChangeDate) AS ChangeDate
from (
    select
        t.CustomerHistoryId
      , t.CustomerNum
        -- a NULL name takes this customer's latest earlier non-NULL name
      , COALESCE(t.CustomerName,
            ( select top (1)
                  t2.CustomerName
              from t t2
              where t.CustomerName IS NULL
                and t2.CustomerName IS NOT NULL
                and t2.CustomerNum = t.CustomerNum
                and t2.ChangeDate < t.ChangeDate
              order by t2.ChangeDate DESC
            )
        ) AS CustomerName
        -- same for Planet; the CustomerNum condition keeps the lookup
        -- within the same customer
      , COALESCE(t.Planet,
            ( select top (1)
                  t2.Planet
              from t t2
              where t.Planet IS NULL
                and t2.Planet IS NOT NULL
                and t2.CustomerNum = t.CustomerNum
                and t2.ChangeDate < t.ChangeDate
              order by t2.ChangeDate DESC
            )
        ) AS Planet
      , t.ChangeDate
    from t
) dt
group by
    CustomerNum, CustomerName, Planet
order by
    CustomerNum, MIN(CustomerHistoryId)
;
This is a fairly generic approach; you could also use OUTER APPLY instead of the correlated subqueries.
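Here is a sketch that checks the fill-forward query against the question's own sample data, again using Python's built-in `sqlite3`.

It assumes the following:

- Dates are stored as ISO-8601 text.
- SQL Server's `top (1)` is written as SQLite's `limit 1`.
- Both lookups are restricted to the same customer with `t2.CustomerNum = t.CustomerNum`.

That last condition matters. Without it, the extra row 11 used in the check would pick up `Tatooine` from customer 102, because row 8 is the latest earlier non-NULL planet. With it, row 11 gets `Degobah`. The helper name `run_fill_forward_query` is hypothetical.

```python
import sqlite3

# The question's sample rows, with dates rewritten as ISO-8601 text (assumption).
ROWS = [
    (1, 101, "Anakin Skywalker", "Tatooine", "2015-03-14 15:41"),
    (2, 102, "Yoda", "Coruscant", "2015-03-14 15:41"),
    (3, 103, "Obi-Wan Kenobi", "Coruscant", "2015-03-24 15:41"),
    (4, 102, "Yoda", "Coruscant", "2015-03-29 15:41"),
    (5, 102, "Yoda", None, "2015-04-03 15:41"),
    (6, 102, "Yoda", None, "2015-04-04 00:00"),  # no time in the question; midnight assumed
    (7, 103, "Obi-Wan Kenobi", "Degobah", "2015-04-08 15:41"),
    (8, 102, "Master Yoda", "Tatooine", "2015-04-09 15:41"),
    (9, 102, None, "Tatooine", "2015-04-10 15:41"),
    (10, 102, "Master Yoda", "Tatooine", "2015-04-11 15:41"),
]

# The answer's query adapted to SQLite: top (1) becomes limit 1.
QUERY = """
select min(CustomerHistoryId) as CustomerHistoryId, CustomerNum, CustomerName, Planet,
       min(ChangeDate) as ChangeDate
from (
    select t.CustomerHistoryId, t.CustomerNum,
           coalesce(t.CustomerName,
                    (select t2.CustomerName from t t2
                     where t2.CustomerName is not null
                       and t2.CustomerNum = t.CustomerNum
                       and t2.ChangeDate < t.ChangeDate
                     order by t2.ChangeDate desc limit 1)) as CustomerName,
           coalesce(t.Planet,
                    (select t2.Planet from t t2
                     where t2.Planet is not null
                       and t2.CustomerNum = t.CustomerNum
                       and t2.ChangeDate < t.ChangeDate
                     order by t2.ChangeDate desc limit 1)) as Planet,
           t.ChangeDate
    from t
) dt
group by CustomerNum, CustomerName, Planet
order by 1
"""


def run_fill_forward_query(rows):
    """Load rows into an in-memory table t and run the fill-forward query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table t (CustomerHistoryId int, CustomerNum int,"
        " CustomerName text, Planet text, ChangeDate text)"
    )
    con.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in run_fill_forward_query(ROWS):
    print(row)
```

On the question's data this also returns rows 1, 2, 3, 7 and 8. Because the NULLs are filled before grouping, rows 5, 6 and 9 are absorbed into existing versions instead of being discarded.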
That query gave me this result:
| customerhistoryid | customernum | customername | planet | changedate |
|-----|-------------|------------------|-----------|-------------------------|
| 1 | 101 | Anakin Skywalker | Tatooine | March, 16 2015 22:18:34 |
| 2 | 102 | Yoda | Coruscant | March, 16 2015 00:42:34 |
| 6 | 102 | Yoda | Basic | March, 16 2015 07:54:34 |
| 8 | 102 | Master Yoda | Tatooine | April, 11 2015 00:42:34 |
| 10 | 102 | Master Yoda | Tatooine2 | April, 11 2015 07:54:34 |
| 11 | 102 | Master Yoda | Degobah2 | April, 13 2015 22:18:34 |
| 3 | 103 | Obi-Wan Kenobi | Coruscant | March, 26 2015 22:18:34 |
| 7 | 103 | Obi-Wan Kenobi | Degobah | April, 10 2015 22:18:34 |
I used this sqlfiddle (in Postgres, because the MSSQL option wasn't working at the time).