I have an ugly table whose codes and names come from a source I can't control, like this one (OriginalTable):
Code | Name
--------------------
001-001 | Name1_a
001-002 | Name1_a
001-002 | Name1_b
001-003 | Name1_a
002-001 | Name2_a
002-001 | Name2_b
002-002 | Name2_a
003-001 | Name3
...
The problem is that I need to assign a single name to the first 3 digits of each code (SmallCode), as in the following table:
Id | Code | Name
--------------------
1 | 001 | NameX
2 | 002 | NameY
3 | 003 | NameZ
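The criterion I want to use to pick the name is that it should be either the most frequent name or the first name within each SmallCode. For example, NameX is the most frequent name among all codes starting with 001, or the first one (Name1_a in both cases). The same goes for NameY for 002 and NameZ for 003.
Right now I'm using this query: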
select Substring(Code,1,3) as SmallCode, Code, Name
into #tmpCode
from OriginalTable;

select SmallCode, Min(Code) as Code
into #tmpReducedCode
from #tmpCode
group by SmallCode;

insert into ResultTable (Code, Name)
select a.SmallCode, a.Name
from #tmpCode a
inner join #tmpReducedCode b
    on a.Code = b.Code;
But this is my result, which is wrong because code 002-001 has two different names (Name2_a, Name2_b):
1 | 001 | Name1_a
2 | 002 | Name2_a
3 | 002 | Name2_b
4 | 003 | Name3
So the question is: how can I split OriginalTable into two tables, picking the most frequent or first-occurring name for each SmallCode?
Answer 0 (score: 2)
The first table:
-- Step 1: derive the 3-character prefix for every row.
select Substring(Code,1,3) as SmallCode, Code, Name
into #tmpCode
from OriginalTable;

-- Step 2: count each name per prefix and keep the most frequent one.
select SmallCode, Name
into #tmpReducedCode
from (
    select SmallCode, Name,
           row_number() over (partition by SmallCode order by Total desc) rn
    from (
        select SmallCode, Name, count(*) Total
        from #tmpCode
        group by SmallCode, Name
    ) x
) y
where rn = 1;

-- Step 3: return one row per prefix with its chosen name.
select distinct a.SmallCode, b.Name
from #tmpCode a
inner join #tmpReducedCode b
    on left(a.Code,3) = b.SmallCode;
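One thing the ranking above leaves open is ties: when two names occur equally often within a SmallCode, `order by Total desc` alone lets SQL Server return either one. Below is a minimal sketch of a tie-break. It assumes that "first name" in the question means the name attached to the lowest Code, which is an interpretation rather than something the question states. `FirstCode` is a hypothetical helper column introduced only for this illustration, and #tmpCode is the temp table created above.

```sql
-- Assumes #tmpCode (SmallCode, Code, Name) already exists, as created above.
-- FirstCode is a hypothetical helper: the lowest Code on which each name appears.
select SmallCode, Name
from (
    select SmallCode, Name,
           row_number() over (
               partition by SmallCode
               order by Total desc,     -- most frequent name wins
                        FirstCode asc,  -- tie: name seen on the lowest Code
                        Name asc        -- last resort: alphabetical, so the result is deterministic
           ) as rn
    from (
        select SmallCode, Name, count(*) as Total, min(Code) as FirstCode
        from #tmpCode
        group by SmallCode, Name
    ) x
) y
where rn = 1;
```

On the eight sample rows shown in the question no tie occurs, so this should pick Name1_a, Name2_a and Name3, the same as the query without the extra sort keys.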
Answer 1 (score: 1)
Run a subquery for each code:
select distinct substring(Code,1,3) as "Code",
       (select top 1 Name
        from OriginalTable tab2
        where substring(tab2.Code,1,3) = substring(tab1.Code,1,3)
        group by substring(Code,1,3), Name
        order by count(Name) desc) as "Name"
from OriginalTable tab1;
Answer 2 (score: 1)
I think the best approach is to use window functions:
select cast(smallcode as int) as id,
       smallcode as code,
       name
from (select cn.*, ROW_NUMBER() over (partition by smallcode order by cnt desc) as seqnum
      from (select LEFT(code, 3) as smallcode, name, COUNT(*) as cnt
            from OriginalTable ot
            group by LEFT(code, 3), name
           ) cn
     ) cn
where seqnum = 1
This assumes you are using SQL Server 2005 or newer.
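To try the window-function approach against the sample rows from the question without touching real tables, something like the following sketch may help. `@OriginalTable` is a stand-in table variable, the column sizes are guesses, and the `Name asc` tie-break is an arbitrary assumption. The multi-row `values` list needs SQL Server 2008 or newer; on 2005 the rows would have to be inserted one statement at a time.

```sql
-- Stand-in for OriginalTable, filled with the eight rows shown in the question.
declare @OriginalTable table (Code varchar(7), Name varchar(20));

insert into @OriginalTable (Code, Name) values
    ('001-001', 'Name1_a'),
    ('001-002', 'Name1_a'),
    ('001-002', 'Name1_b'),
    ('001-003', 'Name1_a'),
    ('002-001', 'Name2_a'),
    ('002-001', 'Name2_b'),
    ('002-002', 'Name2_a'),
    ('003-001', 'Name3');

-- Count each name per 3-character prefix, rank by frequency, keep the top one.
with counted as (
    select left(Code, 3) as SmallCode, Name, count(*) as Total
    from @OriginalTable
    group by left(Code, 3), Name
),
ranked as (
    select SmallCode, Name,
           row_number() over (partition by SmallCode
                              order by Total desc, Name asc) as rn  -- Name asc: assumed tie-break
    from counted
)
select row_number() over (order by SmallCode) as Id,  -- sequential Id, as in the desired table
       SmallCode as Code,
       Name
from ranked
where rn = 1
order by SmallCode;
```

For these rows the expected result is (1, 001, Name1_a), (2, 002, Name2_a), (3, 003, Name3), since Name1_a appears three times under 001 and Name2_a twice under 002.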