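Get the first or most repeated value in a group from SQL Server

Date: 2012-11-29 22:05:58

Tags: sql sql-server

I have an ugly table of codes and names that comes from a source I can't control, like this (OriginalTable):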

Code    | Name
--------------------
001-001 | Name1_a
001-002 | Name1_a
001-002 | Name1_b
001-003 | Name1_a
002-001 | Name2_a
002-001 | Name2_b
002-002 | Name2_a
003-001 | Name3
...

The problem is that I need a single, unique name for each value of the first 3 digits of the code (SmallCode), as in the following table:

Id  | Code  | Name
--------------------
1   | 001   | NameX
2   | 002   | NameY
3   | 003   | NameZ

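The criterion I want to use for picking the name is that it should be the most repeated name, or else the first name, within each SmallCode. For example, NameX is the most repeated name among all codes starting with 001, or the first one (Name1_a in both cases). The same applies to NameY for 002 and NameZ for 003.

Right now I'm using this query: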

select Substring(Code,1,3) as SmallCode, Code, Name
into #tmpCode
from OriginalTable

select SmallCode, Min(Code) as Code
into #tmpReducedCode
from #tmpCode
group by SmallCode

insert into ResultTable (Code, Name)
select a.SmallCode, a.Name
from #tmpCode a
    inner join #tmpReducedCode b
        on a.Code = b.Code

But this is my result, which is wrong because code 002-001 has two different names (Name2_a, Name2_b):

1   | 001   | Name1_a
2   | 002   | Name2_a
3   | 002   | Name2_b
4   | 003   | Name3
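
The extra row comes from joining on a Code that carries more than one Name. A quick check for such codes, sketched on the assumption that OriginalTable has just the Code and Name columns shown above (the NameCount alias is only illustrative):

```sql
-- List every full Code that appears with more than one distinct Name.
-- Any Code returned here would produce duplicate rows when joined on Code.
select Code, count(distinct Name) as NameCount
from OriginalTable
group by Code
having count(distinct Name) > 1;
```

For the rows listed in the question this would flag 001-002 and 002-001. Only 002-001 causes trouble in the query above, because it is also the Min(Code) for its SmallCode.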

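So the question is: how can I split OriginalTable into two tables, picking the most repeated or first-occurring name for each SmallCode?

3 Answers:

Answer 0 (score: 2)

The first table: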

-- Extract the 3-character prefix once so the later steps can reuse it.
select Substring(Code,1,3) as SmallCode, Code, Name
into #tmpCode
from OriginalTable

-- For each SmallCode, keep the name with the highest occurrence count.
select SmallCode, Name
into #tmpReducedCode
from (
    select SmallCode, Name, row_number() over (partition by SmallCode order by Total desc) rn
    from (
        select SmallCode, Name, count(*) Total
        from #tmpCode
        group by SmallCode, Name) x) y
where rn=1;

-- One row per SmallCode together with its chosen name.
select distinct a.SmallCode, b.Name
from #tmpCode a
    inner join #tmpReducedCode b
        on left(a.Code,3) = b.SmallCode
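
One thing the query above leaves open is what happens when two names tie on count: row_number() then picks one of them arbitrarily. A possible variant, sketched here without the temp tables, breaks ties by the lowest Code a name appears under, which is one reading of the question's "first name" rule. The FirstCode alias and the tie-break itself are assumptions, not part of the answer.

```sql
-- Most repeated name per 3-character prefix, with a deterministic tie-break.
select SmallCode, Name
from (
    select SmallCode, Name,
           row_number() over (partition by SmallCode
                              -- highest count first; on a tie, the name seen
                              -- under the lowest Code, then alphabetical order
                              order by Total desc, FirstCode asc, Name asc) as rn
    from (
        select left(Code, 3) as SmallCode,
               Name,
               count(*)  as Total,      -- how often the name occurs in the prefix
               min(Code) as FirstCode   -- assumed meaning of "first": lowest Code
        from OriginalTable
        group by left(Code, 3), Name
    ) x
) y
where rn = 1;
```

If "first" should instead mean arrival order, the table would need some column that records that order, since rows in a table have no inherent sequence.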

Answer 1 (score: 1)

Run a subquery for each code:

select distinct substring(Code,1,3) as "Code",
    (select top 1 Name
    from OriginalTable tab2
    where substring(tab2.Code,1,3)=substring(tab1.Code,1,3)
    group by substring(Code,1,3), Name
    order by count(Name) desc) as "Name"
from OriginalTable tab1;
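
To load the outcome into ResultTable as the question does, the same correlated-subquery idea could be wrapped in an insert. This is only a sketch: it assumes ResultTable.Id is an identity column (the question's own insert omits Id, which suggests so), and it adds a min(Code) tie-break that the answer does not have.

```sql
-- Insert one row per 3-character prefix with its most repeated name.
insert into ResultTable (Code, Name)
select p.SmallCode,
       (select top 1 t.Name
        from OriginalTable t
        where substring(t.Code, 1, 3) = p.SmallCode
        group by t.Name
        -- most frequent name first; ties go to the name under the lowest Code
        order by count(*) desc, min(t.Code) asc) as Name
from (select distinct substring(Code, 1, 3) as SmallCode
      from OriginalTable) p
-- with an identity Id, this ordering is meant to hand out Ids in prefix order
order by p.SmallCode;
```

Computing the distinct prefixes first means the subquery runs once per prefix rather than once per source row.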

Answer 2 (score: 1)

I think the best way is to use window functions:

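-- Count and rank names per 3-character prefix (not per full code),
-- so that exactly one row per SmallCode is returned.
select cast(SmallCode as int) as id,
       SmallCode as code,
       name
from (select cn.*, ROW_NUMBER() over (partition by SmallCode order by cnt desc) as seqnum
      from (select LEFT(code, 3) as SmallCode, name, COUNT(*) as cnt
            from OriginalTable ot
            group by LEFT(code, 3), name
           ) cn
     ) cn
where seqnum = 1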

This assumes you are using SQL Server 2005 or newer.
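
For anyone who wants to try the window-function approach against the question's sample rows, here is a self-contained sketch. The table variable @OriginalTable is a hypothetical stand-in for the real table, the column sizes are guesses, and the alphabetical tie-break on Name is an assumption added to keep the result deterministic.

```sql
-- Hypothetical stand-in for OriginalTable, filled with the rows shown in the question.
declare @OriginalTable table (Code varchar(7) not null, Name varchar(20) not null);

insert into @OriginalTable (Code, Name)
select '001-001', 'Name1_a' union all
select '001-002', 'Name1_a' union all
select '001-002', 'Name1_b' union all
select '001-003', 'Name1_a' union all
select '002-001', 'Name2_a' union all
select '002-001', 'Name2_b' union all
select '002-002', 'Name2_a' union all
select '003-001', 'Name3';

-- Count each name per 3-character prefix, rank the names, keep the top one.
with counted as (
    select left(Code, 3) as SmallCode, Name, count(*) as cnt
    from @OriginalTable
    group by left(Code, 3), Name
),
ranked as (
    select SmallCode, Name,
           row_number() over (partition by SmallCode
                              order by cnt desc, Name) as seqnum  -- Name is the assumed tie-break
    from counted
)
select row_number() over (order by SmallCode) as Id,
       SmallCode as Code,
       Name
from ranked
where seqnum = 1
order by SmallCode;
```

With these eight rows the query returns (1, 001, Name1_a), (2, 002, Name2_a) and (3, 003, Name3). Both common table expressions and ROW_NUMBER() are available from SQL Server 2005 onward.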