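I have a table of customers that come from different data sources. The rows carry an SSN, a license number, and a system-specific unique ID, but not every source provides the same identifiers. I want to compare records on the ID columns (SSN, License, SystemID) and assign a mapping ID whenever two records belong to the same person.
I assume I could use a CTE, but I don't know where to start. I'm still learning my way around SQL. Any help would be greatly appreciated. Thanks.
The table looks like this: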
Source|RowID|SSN |License|SystemID
A |1 |SSN1|Lic111 |
A |2 | | |Sys666
B |3 |SSN2| |Sys777
C |4 |SSN1| |
D |5 | |Lic999 |
D |6 | |Lic333 |Sys666
E |7 | | |Sys777
Desired result (with MapCustomerID added):
Source|RowID|SSN |License|SystemID|MapCustomerID
A |1 |SSN1|Lic111 | |1
A |2 | | |Sys666 |2
B |3 |SSN2| |Sys777 |3
C |4 |SSN1| | |1
D |5 | |Lic999 | |4
D |6 | |Lic333 |Sys666 |2
E |7 | | |Sys777 |3
Answer 0 (score: 1)
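This may be a "good enough" approach to the problem.
Along each of the three dimensions, find the minimum RowId for that dimension (with special handling for NULL). The overall customer identifier is then the least of these three IDs. To make the identifiers sequential with no gaps, use dense_rank().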
with ids as (
    -- For each identifier, the smallest RowId among rows sharing that value.
    -- The case expressions keep NULL identifiers from being grouped together.
    select t.*,
           (case when SSN is not null
                 then min(RowId) over (partition by SSN)
            end) as SSN_id,
           (case when License is not null
                 then min(RowId) over (partition by License)
            end) as License_id,
           (case when SystemId is not null
                 then min(RowId) over (partition by SystemId)
            end) as SystemId_id
    from t
),
leastid as (
    -- Smallest non-NULL value among the three per-identifier minimums.
    select ids.*,
           (case when SSN_Id <= coalesce(License_Id, SSN_Id) and
                      SSN_Id <= coalesce(SystemId_id, SSN_Id)
                 then SSN_Id
                 when License_Id <= coalesce(SystemId_id, License_Id)
                 then License_Id
                 else SystemId_id
            end) as LeastId
    from ids
)
-- dense_rank() takes no arguments; it renumbers the groups without gaps.
select Source, RowID, SSN, License, SystemID,
       dense_rank() over (order by LeastId) as MapCustomerId
from leastid;
This is not a complete solution, but it works for your data. It does not work in the following case:
A |1 |SSN1|Lic111 | |1
A |2 |SSN1| |Sys666 |2
A |3 | | |Sys666 |2
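because this requires two "hops": row 3 is linked to row 1 only through row 2 (Sys666, then SSN1), and the query above follows a single link per row.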
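When I have run into this in the past, I created extra columns in the table and repeatedly ran update statements to get the minimum ID along the different dimensions. This kind of iteration connects the different pieces quickly. It is probably possible to write a recursive CTE that does the same thing. However, the simpler solution above can solve your problem.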
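The iterative-update idea described above could look roughly like the following. This is only a sketch under stated assumptions: it uses Python's standard `sqlite3` module with an in-memory SQLite table rather than your actual database, and the extra column `MinId` and the function name `assign_map_ids` are made up for illustration. Each pass lowers a row's `MinId` to the smallest `MinId` among rows sharing any identifier, and the loop stops once a pass changes nothing.

```python
import sqlite3

# One pass: pull each row's MinId down to the smallest MinId among rows that
# share an SSN, License, or SystemId with it (NULL never equals anything).
UPDATE_SQL = """
    update t
    set MinId = (select min(t2.MinId) from t as t2
                 where t2.SSN = t.SSN
                    or t2.License = t.License
                    or t2.SystemId = t.SystemId)
    where MinId > (select min(t2.MinId) from t as t2
                   where t2.SSN = t.SSN
                      or t2.License = t.License
                      or t2.SystemId = t.SystemId)
"""


def assign_map_ids(rows):
    """rows: (source, row_id, ssn, license, system_id) tuples; None = missing.

    Returns {row_id: map_customer_id}. Hypothetical helper for illustration.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table t (Source text, RowId integer primary key, "
        "SSN text, License text, SystemId text, MinId integer)"
    )
    # MinId (the assumed extra column) starts out as the row's own RowId.
    con.executemany(
        "insert into t (Source, RowId, SSN, License, SystemId, MinId) "
        "values (?, ?, ?, ?, ?, ?)",
        [(src, rid, ssn, lic, sid, rid) for (src, rid, ssn, lic, sid) in rows],
    )
    # Repeat the update until a pass modifies no rows.
    while con.execute(UPDATE_SQL).rowcount > 0:
        pass
    pairs = con.execute("select RowId, MinId from t order by RowId").fetchall()
    con.close()
    # Gap-free numbering of the distinct MinId values (what dense_rank() does).
    ranks = {m: i + 1 for i, m in enumerate(sorted({m for _, m in pairs}))}
    return {rid: ranks[m] for rid, m in pairs}


# The two-hop case that the single-pass query cannot handle.
two_hop = [
    ("A", 1, "SSN1", "Lic111", None),
    ("A", 2, "SSN1", None, "Sys666"),
    ("A", 3, None, None, "Sys666"),
]
print(assign_map_ids(two_hop))  # {1: 1, 2: 1, 3: 1}
```

The number of passes grows with the longest chain of links, so on a real table you would likely want indexes on the three identifier columns, and the update syntax may need adapting to your database.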
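EDIT:
Because I have run into this problem before, I wanted to come up with a single-query solution (rather than iterative updates). This is possible with a recursive CTE. The following code seems to work: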
with t as (
    -- Sample data in which rows 1 to 5 are linked through several hops.
    select 'A' as source, 1 as RowId, 'SSN1' as SSN, 'Lic111' as License, 'ABC' as SystemId union all
    select 'A', 2, 'SSN1', NULL, 'Sys666' union all
    select 'A', 3, NULL, NULL, 'Sys666' union all
    select 'A', 4, NULL, 'Lic222', 'Sys666' union all
    select 'A', 5, NULL, 'Lic222', NULL union all
    select 'A', 6, NULL, 'Lic444', NULL
),
first as (
    -- One hop: the smallest RowId among rows sharing any identifier with this row.
    select t.*,
           (select min(RowId)
            from t t2
            where t2.SSN = t.SSN or
                  t2.License = t.License or
                  t2.SystemId = t.SystemId
           ) as minrowid
    from t
),
cte as (
    select rowid, minrowid
    from first
    union all
    -- Further hops: keep following minrowid while it leads to a smaller id.
    select cte.rowid, first.minrowid
    from cte join
         first
         on cte.minrowid = first.rowid and
            cte.minrowid > first.minrowid
),
lookup as (
    -- Smallest id reached for each row, renumbered without gaps.
    select rowid, min(minrowid) as minrowid,
           dense_rank() over (order by min(minrowid)) as MapCustomerId
    from cte
    group by rowid
)
select t.*, lookup.MapCustomerId
from t join
     lookup
     on t.rowid = lookup.rowid;