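I have a table with duplicates. I want to partition it by ID and select only the row with the most information (the row in which the most fields contain a value).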
+----+------+------+-------+---------+-------+
| ID | Name | City | Zip   | Address | Phone |
+----+------+------+-------+---------+-------+
| 1  | Joe  |      |       |         |       |
| 1  | Joe  | DC   | 11111 |         |       |
| 2  | Pete | NY   |       |         |       |
| 2  | Pete | NY   | 10000 |         | 202-  |
| 3  | Max  |      |       |         |       |
| 3  | Max  |      |       |         |       |
| 4  | Sean | MIA  |       |         |       |
| 4  | Sean | MIA  |       | 1 blvd  |       |
| 4  | Sean |      | 12345 |         | 305-  |
|    |      |      |       |         |       |
+----+------+------+-------+---------+-------+
This is my goal:
+----+------+------+-------+---------+-------+---------+
| ID | Name | City | Zip   | Address | Phone | Row_num |
+----+------+------+-------+---------+-------+---------+
| 1  | Joe  | DC   | 11111 |         |       | 1       |
| 2  | Pete | NY   | 10000 |         | 202-  | 1       |
| 3  | Max  |      |       |         |       | 1       |
| 4  | Sean | MIA  |       | 1 blvd  |       | 1       |
|    |      |      |       |         |       |         |
+----+------+------+-------+---------+-------+---------+
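For Joe, I obviously want the second row, which has the city and zip.
For Pete, I also want the second record, because it contains more information.
For Max, it doesn't matter which row I pick, because both records have the same values.
For Sean, I could take either the second or the third row: the second record has three populated fields (Name, City, Address), and the third record also has three populated fields (Name, Zip, Phone). So it doesn't matter which of those two records I select for Sean.
How can I partition the table and select the row with the most information for each person?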
Answer 0: (score: 3)
If the columns are all strings, you can use apply to simplify the logic:
select t.*
from (select t.*,
             row_number() over (partition by t.id order by v.cnt desc) as seqnum
      from t cross apply
           (select count(*)
            from (values (name), (city), (zip), (address), (phone)) v(col)
            where col is not null
           ) v(cnt)
     ) t
where seqnum = 1;
If you also want to treat empty strings as missing, change the where clause to where col is not null and col <> ''.
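For reference, the adjusted query written out in full might look like the sketch below. It assumes, as the query above does, that the table is named t and that all five columns are strings, so they can share one values list.

```sql
-- Sketch only: table name t and string-typed columns are assumptions
-- carried over from the query above.
select t.*
from (select t.*,
             -- Rank the rows within each id, most populated fields first.
             row_number() over (partition by t.id order by v.cnt desc) as seqnum
      from t cross apply
           -- Unpivot the five columns into one column and count the
           -- entries that are neither NULL nor an empty string.
           (select count(*)
            from (values (t.name), (t.city), (t.zip), (t.address), (t.phone)) v(col)
            where col is not null and col <> ''
           ) v(cnt)
     ) t
where seqnum = 1;
```

When two rows for the same id have the same count, row_number() picks one of them arbitrarily, which matches the question's statement that either row is acceptable for Max and Sean.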
Answer 1: (score: 0)
I think this would do it:
declare @t table (id int, name varchar(10), city varchar(10));
insert into @t values
       (1, 'Joe', null)
     , (1, 'Joe', 'DC')
     , (2, 'Pete', 'NY')
     , (2, null, 'NY')
     , (3, null, 'TX')
     , (5, 'Harry', null)
     , (4, null, null);
select *
from ( select *
            , case when name is null then 0 else 1 end
            + case when city is null then 0 else 1 end
              as rowCnt
            , row_number() over (partition by id
                                 order by case when name is null then 0 else 1 end
                                        + case when city is null then 0 else 1 end desc) as rn
       from @t
     ) tt
where tt.rn = 1
order by tt.id;
id          name       city       rowCnt      rn
----------- ---------- ---------- ----------- --------------------
1           Joe        DC         2           1
2           Pete       NY         2           1
3           NULL       TX         1           1
4           NULL       NULL       0           1
5           Harry      NULL       1           1
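The same counting idea can be extended to the five columns from the question. The following is a minimal sketch, not a tested solution: the table name dbo.People and the names scored, ranked, and filledCnt are hypothetical, and the column names are taken from the question. Computing the count once in a common table expression avoids repeating the case expressions inside the order by.

```sql
-- Hypothetical table dbo.People with the question's columns.
with scored as (
    select p.*,
           -- One point for every column that holds a value.
           case when p.Name    is null then 0 else 1 end
         + case when p.City    is null then 0 else 1 end
         + case when p.Zip     is null then 0 else 1 end
         + case when p.Address is null then 0 else 1 end
         + case when p.Phone   is null then 0 else 1 end as filledCnt
    from dbo.People as p
),
ranked as (
    select s.*,
           -- Number the rows per ID, fullest row first.
           row_number() over (partition by s.ID order by s.filledCnt desc) as rn
    from scored as s
)
select *
from ranked
where rn = 1
order by ID;
```

Unlike the values-based approach, this version does not require the columns to share a data type, because each column is only tested for null.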