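I have a client table in SQL Server. I am trying to find duplicates in the email_address column, but I only need to compare part of the column data, that is, a substring. Specifically, I need to find records that share the same domain name.
I use the following query to find exact duplicates (on the whole field), but how can I modify it to consider a substring?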
SELECT a.email_address, b.dupeCount, a.client_id
FROM tblClient a
INNER JOIN (
SELECT email_address, COUNT(*) AS dupeCount
FROM tblClient
GROUP BY email_address
HAVING COUNT(*) > 1
) b ON a.email_address = b.email_address
Many thanks!
Answer 0 (score: 1)
Try this:
declare @contact table (
[client_id] [int] identity(1, 1)
, [email] [sysname]
);
insert into @contact
([email])
values (N'joe@billy_bobs.com'),
(N'sally@beauty.com'),
(N'george@billy_bobs.com');
with [stripper]
as (select [client_id]
, [email]
, substring([email]
, charindex(N'@', [email], 0) + 1
, len([email])) as [domain_name]
from @contact),
[duplicate_finder]
as (select [client_id]
, [domain_name]
, row_number()
over (
partition by [domain_name]
order by [domain_name]) as [sequence]
from [stripper])
select [client_id], [domain_name], [sequence] from [duplicate_finder]
where [sequence] > 1;
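Note that filtering on [sequence] > 1 returns only the second and later rows of each domain, so the first row of every duplicate group is left out. If every row in a shared domain is wanted, one possible variation is to count rows per domain with a windowed COUNT(*) instead. The sketch below is not part of the original answer: it reuses the same sample data, the [counted] CTE and [dupeCount] column are names chosen here for illustration, and nvarchar(256) is an assumed column type.

```sql
-- Hypothetical sample data, for illustration only.
declare @contact table (
    [client_id] int identity(1, 1),
    [email]     nvarchar(256)
);

insert into @contact ([email])
values (N'joe@billy_bobs.com'),
       (N'sally@beauty.com'),
       (N'george@billy_bobs.com');

with [stripper] as (
    select [client_id],
           [email],
           -- Everything after the first '@'. If there is no '@',
           -- CHARINDEX returns 0 and the whole value is kept.
           substring([email], charindex(N'@', [email]) + 1, len([email])) as [domain_name]
    from @contact
),
[counted] as (
    select [client_id],
           [email],
           [domain_name],
           -- Number of rows sharing this domain.
           count(*) over (partition by [domain_name]) as [dupeCount]
    from [stripper]
)
select [client_id], [email], [domain_name], [dupeCount]
from [counted]
where [dupeCount] > 1
order by [domain_name], [client_id];
```

With the sample rows above, this should list both billy_bobs.com addresses and omit the single beauty.com address.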
Answer 1 (score: 0)
Try:
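-- Group on a substring of the address. In SQL Server the expression must be
-- repeated in GROUP BY; the offsets 1, 2 are placeholders for the part you need.
SELECT SUBSTRING(email_address, 1, 2) AS email_part, COUNT(*) AS dupeCount
FROM tblClient
GROUP BY SUBSTRING(email_address, 1, 2)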
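To tie this back to the query in the question, the same grouping idea can be applied to the domain part of the address. The following is a rough sketch rather than a tested solution: it assumes the asker's tblClient table with client_id and email_address columns, and it treats everything after the first '@' as the domain. The domain_name alias is introduced here for illustration.

```sql
-- Assumes the asker's tblClient(client_id, email_address) table.
SELECT a.email_address, b.domain_name, b.dupeCount, a.client_id
FROM tblClient a
INNER JOIN (
    -- One row per domain that appears more than once.
    SELECT SUBSTRING(email_address, CHARINDEX('@', email_address) + 1, LEN(email_address)) AS domain_name,
           COUNT(*) AS dupeCount
    FROM tblClient
    GROUP BY SUBSTRING(email_address, CHARINDEX('@', email_address) + 1, LEN(email_address))
    HAVING COUNT(*) > 1
) b ON SUBSTRING(a.email_address, CHARINDEX('@', a.email_address) + 1, LEN(a.email_address)) = b.domain_name;
```

Addresses without an '@' are compared on their full value here, because CHARINDEX returns 0 when the character is not found. Rows with a NULL email_address drop out of the join.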