Question

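I have a table with duplicates. I want to partition by ID and select only the row with the most information (the row where the most fields contain a value).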

+----+------+------+-------+---------+-------+
| ID | Name | City |  Zip  | Address | Phone |
+----+------+------+-------+---------+-------+
|  1 | Joe  |      |       |         |       |
|  1 | Joe  | DC   | 11111 |         |       |
|  2 | Pete | NY   |       |         |       |
|  2 | Pete | NY   | 10000 |         | 202-  |
|  3 | Max  |      |       |         |       |
|  3 | Max  |      |       |         |       |
|  4 | Sean | MIA  |       |         |       |
|  4 | Sean | MIA  |       | 1 blvd  |       |
|  4 | Sean |      | 12345 |         | 305-  |
+----+------+------+-------+---------+-------+

This is my goal:

+----+------+------+-------+---------+-------+---------+
| ID | Name | City |  Zip  | Address | Phone | Row_num |
+----+------+------+-------+---------+-------+---------+
|  1 | Joe  | DC   | 11111 |         |       |       1 |
|  2 | Pete | NY   | 10000 |         | 202-  |       1 |
|  3 | Max  |      |       |         |       |       1 |
|  4 | Sean | MIA  |       | 1 blvd  |       |       1 |
+----+------+------+-------+---------+-------+---------+

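For Joe, I obviously want the second row, which has the city and zip.

For Pete, I also want the second record, because it contains more information.

For Max, it doesn't matter which row I pick, because both records have the same values.

For Sean, I could take either the second or the third row: the second record has three populated fields (Name, City, Address), and the third also has three (Name, Zip, Phone). So it doesn't matter which of Sean's records I select.

How can I partition the table and select the row that contains the most information for each person?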

Answer 1

If the columns are all strings, you can simplify the logic using apply:

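select t.*
from (select t.*,
             -- rank the rows within each id, most populated columns first
             row_number() over (partition by t.id order by v.cnt desc) as seqnum
      from t cross apply
           -- turn the columns into rows and count the non-null ones
           (select count(*)
            from (values (name), (city), (zip), (address), (phone)) v(col)
            where col is not null
           ) v(cnt)
     ) t
where seqnum = 1;  -- keep the top-ranked row per id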

If you also want to treat empty strings as missing, change the where to where col is not null and col <> ''.
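
For reference, here is a minimal sketch of the full query with that adjustment applied. The table variable @t and its column sizes are assumptions made up for illustration, and the sample rows are the question's Joe and Pete records with the blanks stored as empty strings:

```sql
-- Hypothetical sample data modeled on the question's table;
-- blanks are stored as empty strings rather than NULL.
declare @t table (id int, name varchar(10), city varchar(10),
                  zip varchar(10), address varchar(20), phone varchar(10));
insert into @t values
    (1, 'Joe',  '',   '',      '', '')
  , (1, 'Joe',  'DC', '11111', '', '')
  , (2, 'Pete', 'NY', '',      '', '')
  , (2, 'Pete', 'NY', '10000', '', '202-');

select t.*
from (select t.*,
             row_number() over (partition by t.id order by v.cnt desc) as seqnum
      from @t t cross apply
           -- count only the columns that are neither NULL nor empty
           (select count(*)
            from (values (t.name), (t.city), (t.zip), (t.address), (t.phone)) v(col)
            where col is not null and col <> ''
           ) v(cnt)
     ) t
where seqnum = 1;
```

With this data the query should keep the (1, Joe, DC, 11111) row and the (2, Pete, NY, 10000, 202-) row. Without the col <> '' condition every row would count five non-null columns, so the choice within each id would be arbitrary.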

Answer 2

I think this will do it:

declare @t table (id int, name varchar(10), city varchar(10));
insert into @t values
    (1, 'Joe', null)  
  , (1, 'Joe', 'DC') 
  , (2, 'Pete', 'NY')
  , (2, null, 'NY')  
  , (3, null, 'TX') 
  , (5, 'Harry', null) 
  , (4, null, null);
select * 
from ( select * 
            -- number of non-null columns in this row
            , case when name is null then 0 else 1 end  
            + case when city is null then 0 else 1 end
              as rowCnt 
            -- rank the rows within each id, most non-null columns first
            , row_number() over (partition by id order by case when name is null then 0 else 1 end  
                                                        + case when city is null then 0 else 1 end desc) as rn
       from @t
     ) tt   
where tt.rn = 1 
order by tt.id

id          name       city       rowCnt      rn
----------- ---------- ---------- ----------- --------------------
1           Joe        DC         2           1
2           Pete       NY         2           1
3           NULL       TX         1           1
4           NULL       NULL       0           1
5           Harry      NULL       1           1
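
The same idea can be extended to all five columns from the question. The following is only a sketch under a few assumptions: the table variable @t, its column sizes, and the names scored and filled are made up for illustration, and the blanks in the question's data are assumed to be stored as NULL. Computing the count once in a CTE avoids repeating the case expression inside row_number():

```sql
-- Hypothetical sample data: the question's rows for IDs 3 and 4,
-- with blanks stored as NULL.
declare @t table (id int, name varchar(10), city varchar(10),
                  zip varchar(10), address varchar(20), phone varchar(10));
insert into @t values
    (3, 'Max',  null,  null,    null,     null)
  , (3, 'Max',  null,  null,    null,     null)
  , (4, 'Sean', 'MIA', null,    null,     null)
  , (4, 'Sean', 'MIA', null,    '1 blvd', null)
  , (4, 'Sean', null,  '12345', null,     '305-');

with scored as (
    select *
           -- one point per non-null column
         , case when name    is null then 0 else 1 end
         + case when city    is null then 0 else 1 end
         + case when zip     is null then 0 else 1 end
         + case when address is null then 0 else 1 end
         + case when phone   is null then 0 else 1 end as filled
    from @t
)
select *
from ( select *
            , row_number() over (partition by id order by filled desc) as rn
       from scored
     ) tt
where tt.rn = 1
order by tt.id;
```

This should return one row for Max with filled = 1 and one of Sean's two three-field rows with filled = 3. Which of the tied Sean rows comes back is not deterministic, which matches the question's statement that either is acceptable; add a second sort key to the order by if you need a stable choice.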

Select the row with the most information using PARTITION BY