我有一个简单的表格,如下所示:
ClientID ItemID
1 1
1 2
1 3
2 1
2 2
3 3
4 3
5 1
5 2
5 4
5 5
其中两列组合成为主键。我现在的任务是识别分配给ClientID的所有唯一ItemID集。所以在我的例子中,集合将是:
ItemIDs 1,2,3 (used by ClientID 1)
ItemIDs 1,2 (used by ClientID 2)
ItemIDs 3 (used by ClientIDs 3 and 4)
ItemIDs 1,2,4,5 (used by ClientID 5)
理想情况下,输出将是两个表:
SetID ItemID
1 1
1 2
1 3
2 1
2 2
3 3
4 1
4 2
4 4
4 5
ClientID SetID
1 1
2 2
3 3
4 3
5 4
其中SetID将是一个在其他地方使用的新字段。
目前我识别唯一集合的方法包括使用游标为每个ClientID构建一个有序ItemID的字符串,然后比较输出以获取唯一字符串,最后解析回来。它写得很快但感觉很糟糕。
我确信必须有比这更好的方法。有什么想法吗?
答案 0 :(得分:1)
-- Table to hold test data
declare @T table
(
ClientID int,
ItemID int
)
insert into @T values
(1, 1),(1, 2),(1, 3),
(2, 1),(2, 2),
(3, 3),(4, 3),
(5, 1),(5, 2),(5, 4),(5, 5)
-- Temp table that will hold the generated set's
declare @Tmp table
(
ClientID int,
ItemIDSet varchar(max),
SetID int
)
-- Query the sets using rank() over a comma separated ItemIDSet
insert into @Tmp
select ClientID,
ItemIDSet,
rank() over(order by ItemIDSet) as SetID
from (
select T1.ClientID,
stuff((select ','+cast(T2.ItemID as varchar(10))
from @T as T2
where T1.ClientID = T2.ClientID
order by T2.ItemID
for xml path('')), 1, 1, '') as ItemIDSet
from @T as T1
group by T1.ClientID
) as T
-- Get ClientID and SetID from @Tmp
select ClientID,
SetID
from @Tmp
order by ClientID
-- Get SetID and ItemID from @Tmp
select SetID,
T3.N.value('.', 'int') as ItemID
from (
select distinct
SetID,
'<i>'+replace(ItemIDSet, ',', '</i><i>')+'</i>' as ItemIDSet
from @Tmp
) as T1
cross apply
(
select cast(T1.ItemIDSet as xml) as ItemIDSet
) as T2
cross apply T2.ItemIDSet.nodes('i') as T3(N)
结果:
ClientID SetID
----------- -----------
1 2
2 1
3 4
4 4
5 3
SetID ItemID
----------- -----------
1 1
1 2
2 1
2 2
2 3
3 1
3 2
3 4
3 5
4 3
SetID的值与您提供的输出中的值并不完全相同,但我认为这不是一个大问题。 SetID是由ItemIDSet排序的等级函数rank() over(order by ItemIDSet)
生成的。
把它换成spin。