I am trying to group duplicate entries in a data table and give each group a new number.
Here is a sample data set (runnable copy):
declare @tmpTable table
(ID Varchar(1),
First varchar(4),
Last varchar(5),
Phone varchar(13),
NonKeyField varchar(4))
insert into @tmpTable select 'A', 'John', 'Smith', '(555)555-1234', 'ASDF'
insert into @tmpTable select 'B', 'John', 'Smith', '(555)555-1234', 'GHJK'
insert into @tmpTable select 'C', 'Jane', 'Smith', '(555)555-1234', 'QWER'
insert into @tmpTable select 'D', 'John', 'Smith', '(555)555-1234', 'RTYU'
insert into @tmpTable select 'E', 'Bill', 'Blake', '(555)555-0000', 'BVNM'
insert into @tmpTable select 'F', 'Bill', 'Blake', '(555)555-0000', '%^&*'
insert into @tmpTable select 'G', 'John', 'Smith', '(555)555-1234', '!#RF'
select row_number() over (partition by First, Last, Phone order by ID) NewIDNum, *
from @tmpTable order by ID
Right now it gives me this result:
NewIDNum ID First Last Phone NonKeyField
-------------------- ---- ----- ----- ------------- -----------
1 A John Smith (555)555-1234 ASDF
2 B John Smith (555)555-1234 GHJK
1 C Jane Smith (555)555-1234 QWER
3 D John Smith (555)555-1234 RTYU
1 E Bill Blake (555)555-0000 BVNM
2 F Bill Blake (555)555-0000 %^&*
4 G John Smith (555)555-1234 !#RF
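However, this is the opposite of what I want: NewIDNum resets its counter whenever it finds a new combination of the key columns. I want all rows with the same combination to share the same ID. So if it behaved the way I want, I would get the following result: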
NewIDNum ID First Last Phone NonKeyField
-------------------- ---- ----- ----- ------------- -----------
1 A John Smith (555)555-1234 ASDF
1 B John Smith (555)555-1234 GHJK
2 C Jane Smith (555)555-1234 QWER
1 D John Smith (555)555-1234 RTYU
3 E Bill Blake (555)555-0000 BVNM
3 F Bill Blake (555)555-0000 %^&*
1 G John Smith (555)555-1234 !#RF
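What is the correct way to get the result I want?
I did not include this requirement in the original post: if more rows are added, I need NewIDNum to produce the same numbers for the existing rows on subsequent runs of this query (assume that all new rows will have a higher ID "value" when ordered by the ID column).
So if the following were run at a later date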
insert into @tmpTable select 'H', 'John', 'Smith', '(555)555-1234', '4321'
insert into @tmpTable select 'I', 'Jake', 'Jons', '(555)555-1234', '1234'
insert into @tmpTable select 'J', 'John', 'Smith', '(555)555-1234', '2345'
running the correct query again would give
NewIDNum ID First Last Phone NonKeyField
-------------------- ---- ----- ----- ------------- -----------
1 A John Smith (555)555-1234 ASDF
1 B John Smith (555)555-1234 GHJK
2 C Jane Smith (555)555-1234 QWER
1 D John Smith (555)555-1234 RTYU
3 E Bill Blake (555)555-0000 BVNM
3 F Bill Blake (555)555-0000 %^&*
1 G John Smith (555)555-1234 !#RF
1 H John Smith (555)555-1234 4321
4 I Jake Jons (555)555-1234 1234
1 J John Smith (555)555-1234 2345
Answer 0 (score: 6)
You can use dense_rank():
dense_rank() over (order by First, Last, Phone) as NewIDNum
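One thing worth checking before relying on this form: ordering by the key columns numbers the groups in sort order of (First, Last, Phone), not in order of first appearance, so a new combination that sorts ahead of existing ones can shift their numbers. The following is a minimal sketch of that effect, assuming a cut-down table variable; the name @demo and its reduced column list are illustrative and not part of the question.

```sql
-- Hypothetical cut-down table: one row per key combination is enough to show the ordering.
declare @demo table (ID varchar(1), First varchar(4), Last varchar(5));
insert into @demo select 'A', 'John', 'Smith';
insert into @demo select 'C', 'Jane', 'Smith';
insert into @demo select 'E', 'Bill', 'Blake';

-- Groups are numbered in sort order of (First, Last): Bill, then Jane, then John.
select dense_rank() over (order by First, Last) as NewIDNum, *
from @demo
order by ID;

-- A later row whose key sorts before existing keys ('Jake' sorts before 'Jane') ...
insert into @demo select 'I', 'Jake', 'Jons';

-- ... takes number 2 and pushes the Jane and John groups up by one on the next run.
select dense_rank() over (order by First, Last) as NewIDNum, *
from @demo
order by ID;
```

With a typical collation, the first query should number Bill, Jane, and John as 1, 2, and 3, and the second should renumber Jane and John to 3 and 4, which conflicts with the requirement that existing rows keep their numbers.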
In reply to your comment: you could order by the minimum of the old Id column within each group of rows that share the same (First, Last, Phone) combination:
select *
from (
select dense_rank() over (order by min_id) as new_id
, *
from (
select min(id) over (
partition by First, Last, Phone) as min_id
, *
from @tmpTable
) as sub1
) as sub3
order by
new_id
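The nested query above can be tried end to end with the sketch below. It is written under stated assumptions: @demo is a hypothetical cut-down version of the question's table (the Phone and NonKeyField columns are dropped for brevity), and it relies on the question's own assumption that rows added later always have higher ID values.

```sql
-- Hypothetical cut-down table; only the columns needed to show the numbering.
declare @demo table (ID varchar(1), First varchar(4), Last varchar(5));
insert into @demo select 'A', 'John', 'Smith';
insert into @demo select 'B', 'John', 'Smith';
insert into @demo select 'C', 'Jane', 'Smith';
insert into @demo select 'E', 'Bill', 'Blake';

-- First run: each group is ranked by the smallest ID it contains.
select dense_rank() over (order by min_id) as NewIDNum, ID, First, Last
from (
    select min(ID) over (partition by First, Last) as min_id, *
    from @demo
) as keyed
order by ID;

-- Rows added later, all with higher IDs than the existing ones.
insert into @demo select 'I', 'Jake', 'Jons';
insert into @demo select 'J', 'John', 'Smith';

-- Second run: existing rows keep their numbers, the new Jake group gets the next one,
-- and row J joins the existing John Smith group.
select dense_rank() over (order by min_id) as NewIDNum, ID, First, Last
from (
    select min(ID) over (partition by First, Last) as min_id, *
    from @demo
) as keyed
order by ID;
```

The numbers stay stable only as long as no group loses its lowest-ID row: deleting every row of a group, or the row that carries a group's minimum ID, can change the ranking on the next run.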
Answer 1 (score: 1)
Based on @Andomar's original answer, this will work for your updated requirement (although it is unlikely to scale well):
select
DENSE_RANK() over (ORDER BY IdRank, First, Last, Phone) AS NewIDNum,
ID,
First,
Last,
Phone,
NonKeyField
from
(
select
MIN(ID) OVER (PARTITION BY First, Last, Phone) as IdRank,
*
from
@tmpTable
) as x
order by
ID;
Answer 2 (score: 0)
Using Andomar's answer as a jumping-off point, I solved it myself:
select sub1.rn, tt.*
from @tmpTable tt
inner join (
select row_number() over (order by min(ID)) as rn, first, last, phone
from @tmpTable
group by first, last, phone
) as sub1 on tt.first = sub1.first and tt.last = sub1.last and tt.phone = sub1.phone
This produces
rn ID First Last Phone NonKeyField
-------------------- ---- ----- ----- ------------- -----------
1 A John Smith (555)555-1234 ASDF
1 B John Smith (555)555-1234 GHJK
1 D John Smith (555)555-1234 RTYU
1 G John Smith (555)555-1234 !#RF
1 H John Smith (555)555-1234 4321
1 J John Smith (555)555-1234 2345
2 C Jane Smith (555)555-1234 QWER
3 E Bill Blake (555)555-0000 BVNM
3 F Bill Blake (555)555-0000 %^&*
4 I Jake Jons (555)555-1234 1234
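Looking at the SQL execution plans, Andomar's answer will run faster than mine on larger data sets (53% execution time vs. 47% execution time when the two are run next to each other with "Include Actual Execution Plan" checked).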
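As far as I know, the percentages shown on a plan in Management Studio are relative cost estimates rather than measured run times, so it may be worth confirming the comparison with measured timings. The sketch below uses SET STATISTICS TIME and SET STATISTICS IO for that. The table variable @demo is a hypothetical stand-in, and with only a handful of rows both queries will report close to zero milliseconds, so a realistically sized table is needed for meaningful numbers.

```sql
-- Hypothetical stand-in table; replace with a realistically sized table for real measurements.
declare @demo table (ID varchar(1), First varchar(4), Last varchar(5), Phone varchar(13));
insert into @demo select 'A', 'John', 'Smith', '(555)555-1234';
insert into @demo select 'B', 'John', 'Smith', '(555)555-1234';
insert into @demo select 'C', 'Jane', 'Smith', '(555)555-1234';
insert into @demo select 'E', 'Bill', 'Blake', '(555)555-0000';

-- Report CPU/elapsed time and logical reads per statement in the Messages tab.
set statistics time on;
set statistics io on;

-- Approach 1: windowed min(ID) per group, then dense_rank over that minimum.
select dense_rank() over (order by min_id) as new_id, ID, First, Last, Phone
from (
    select min(ID) over (partition by First, Last, Phone) as min_id, *
    from @demo
) as sub1;

-- Approach 2: number the groups with row_number over a GROUP BY, then join back.
select sub1.rn, tt.*
from @demo tt
inner join (
    select row_number() over (order by min(ID)) as rn, First, Last, Phone
    from @demo
    group by First, Last, Phone
) as sub1 on tt.First = sub1.First and tt.Last = sub1.Last and tt.Phone = sub1.Phone;

set statistics time off;
set statistics io off;
```

Running each query several times and comparing the reported CPU time and logical reads gives a more direct comparison than the plan percentages alone.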
Answer 3 (score: -1)
This should work:
select dense_rank() over (order by First, Last, Phone) NewIDNum, *
from @tmpTable order by ID