SQL Server: finding duplicate substrings in a column

Time: 2014-09-09 15:03:08

Tags: sql sql-server

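I have a client table in SQL Server. I'm trying to find duplicates in the email_address column, but I only need to compare part of the column's data, i.e. a substring. Specifically, I need to find records that share the same domain name.

I'm using the following query to find exact duplicates (on the whole field). How can I modify it to compare a substring instead?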

SELECT a.email_address, b.dupeCount, a.client_id
FROM tblClient a
INNER JOIN (
    SELECT email_address, COUNT(*) AS dupeCount
    FROM tblClient
    GROUP BY email_address
    HAVING COUNT(*) > 1
) b ON a.email_address = b.email_address

Many thanks!

2 answers:

Answer 0 (score: 1)

Try this:

declare @contact table (
  [client_id] [int] identity(1, 1)
  , [email]   [sysname]
  );
insert into @contact
        ([email])
values      (N'joe@billy_bobs.com'),
        (N'sally@beauty.com'),
        (N'george@billy_bobs.com');
-- Split off everything after the '@' as the domain name.
with [stripper]
 as (select [client_id]
            , [email]
            , substring([email]
                        , charindex(N'@', [email], 0) + 1
                        , len([email])) as [domain_name]
     from   @contact),
 -- Number the rows within each domain; any row numbered 2 or higher is a duplicate.
 [duplicate_finder]
 as (select [client_id]
            , [domain_name]
            , row_number()
                over (
                  partition by [domain_name]
                  order by [client_id]) as [sequence]
     from   [stripper])
select [client_id]
       , [domain_name]
       , [sequence]
from   [duplicate_finder]
where  [sequence] > 1;
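The `ROW_NUMBER()` version above returns only the second and later rows for each domain. If you also want the first occurrence, one option is to keep the shape of the query from the question and group on the domain instead of the whole address. This is a sketch, not a tested production query: it assumes the `tblClient(client_id, email_address)` columns from the question, and the table variables `@tblClient` and `@dupes` plus the sample rows are hypothetical, added only so the snippet runs on its own.

```sql
-- Hypothetical sample data standing in for tblClient(client_id, email_address).
DECLARE @tblClient TABLE (
    client_id     int IDENTITY(1, 1),
    email_address nvarchar(256)
);
INSERT INTO @tblClient (email_address)
VALUES (N'joe@billy_bobs.com'),
       (N'sally@beauty.com'),
       (N'george@billy_bobs.com');

-- Hypothetical holder for the result so it can be inspected afterwards.
DECLARE @dupes TABLE (email_address nvarchar(256), dupeCount int, client_id int);

-- Compute the domain once: everything after the '@'.
-- If an address has no '@', CHARINDEX returns 0 and the whole address is used.
WITH domains AS (
    SELECT client_id,
           email_address,
           SUBSTRING(email_address, CHARINDEX(N'@', email_address) + 1, LEN(email_address)) AS domain_name
    FROM @tblClient
)
INSERT INTO @dupes (email_address, dupeCount, client_id)
SELECT a.email_address, b.dupeCount, a.client_id
FROM domains a
INNER JOIN (
    -- Same idea as the original query, but grouped on the domain instead of the full address.
    SELECT domain_name, COUNT(*) AS dupeCount
    FROM domains
    GROUP BY domain_name
    HAVING COUNT(*) > 1
) b ON a.domain_name = b.domain_name;

SELECT email_address, dupeCount, client_id
FROM @dupes
ORDER BY client_id;
```

With the sample rows, both `billy_bobs.com` addresses come back with a `dupeCount` of 2, and `sally@beauty.com` is filtered out.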

Answer 1 (score: 0)

Try something like:

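SELECT SUBSTRING(email_address, CHARINDEX('@', email_address) + 1, LEN(email_address)) AS domain_name, COUNT(*) AS dupeCount
FROM tblClient
-- SQL Server has no SUBSTR and does not allow GROUP BY by column position, so repeat the expression
GROUP BY SUBSTRING(email_address, CHARINDEX('@', email_address) + 1, LEN(email_address))
HAVING COUNT(*) > 1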
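The self-join in the question can also be replaced by a windowed aggregate: `COUNT(*) OVER (PARTITION BY ...)` attaches the per-domain count to every row, so each duplicate row comes back with its count in a single pass. The sketch below assumes the same `tblClient(client_id, email_address)` columns; `@tblClient`, `@dupes` and the sample rows are hypothetical and only there to make it self-contained.

```sql
-- Hypothetical sample data standing in for tblClient(client_id, email_address).
DECLARE @tblClient TABLE (
    client_id     int IDENTITY(1, 1),
    email_address nvarchar(256)
);
INSERT INTO @tblClient (email_address)
VALUES (N'joe@billy_bobs.com'),
       (N'sally@beauty.com'),
       (N'george@billy_bobs.com');

DECLARE @dupes TABLE (email_address nvarchar(256), domain_name nvarchar(256), dupeCount int, client_id int);

WITH domains AS (
    -- Everything after the '@' is treated as the domain.
    SELECT client_id,
           email_address,
           SUBSTRING(email_address, CHARINDEX(N'@', email_address) + 1, LEN(email_address)) AS domain_name
    FROM @tblClient
),
counted AS (
    -- Attach the per-domain row count to every row; no self-join needed.
    SELECT client_id, email_address, domain_name,
           COUNT(*) OVER (PARTITION BY domain_name) AS dupeCount
    FROM domains
)
INSERT INTO @dupes (email_address, domain_name, dupeCount, client_id)
SELECT email_address, domain_name, dupeCount, client_id
FROM counted
-- Window functions can't be used in WHERE directly, hence the extra CTE.
WHERE dupeCount > 1;

SELECT email_address, domain_name, dupeCount, client_id
FROM @dupes
ORDER BY domain_name, client_id;
```

Whether this beats the JOIN version depends on the data and indexes; it mainly avoids writing the domain expression and the grouping twice.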