T-SQL: Parsing names to ignore spaces and middle initials

Time: 2012-10-29 13:54:22

Tags: sql sql-server-2008 tsql parsing

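My poorly maintained database contains employee information. HR has asked for a report listing the cases where the employee name associated with an insurance coverage record does not match the name on the insurance policy.

The name format is not consistent between the two tables. It is always last name first, then first name, but for a fictional employee named Steven J. Smith you might see any of the following in either table:

  1. Smith, Steven
  2. Smith,Steven
  3. Smith, Steven J.
  4. Smith,Steven J.

I need to run a query that finds instances where EMPLOYEE.EMP_NAME <> INSURANCE.SUBSCRIBER_NAME while allowing for the differences in name format shown above (i.e. recognizing that "Smith, Steven J." and "Smith,Steven" are (probably) the same person and ignoring them).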

    SELECT 
      EMPLOYEE.EMP_NO
    , EMPLOYEE.EMP_NAME
    , INSURANCE.SUBSCRIBER_NAME
    , INSURANCE.PAYOR_NAME
    
    FROM EMPLOYEE
         INNER JOIN INSURANCE ON EMPLOYEE.EMP_NO = INSURANCE.EMP_NO
    
    WHERE EMPLOYEE.EMP_NAME <> INSURANCE.SUBSCRIBER_NAME
    

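I know I'll want to use a substring to ignore the middle initial, but how do I account for ignoring whether or not there is a space after the comma?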

4 Answers:

Answer 0: (score: 0)

You could simply replace the commas:

 WHERE replace (EMPLOYEE.EMP_NAME,',','') <> replace (INSURANCE.SUBSCRIBER_NAME,',','')

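To find most of the mismatches:

-- Strip commas, spaces and periods so only the letters of the name remain.
;WITH ce AS
(
    SELECT
        EMP_NO,
        REPLACE(REPLACE(REPLACE(EMP_NAME, ',', ''), ' ', ''), '.', '') AS namekey
    FROM EMPLOYEE
),
ci AS
(
    SELECT
        EMP_NO,
        REPLACE(REPLACE(REPLACE(SUBSCRIBER_NAME, ',', ''), ' ', ''), '.', '') AS namekey
    FROM INSURANCE
)
SELECT *
FROM ce
    INNER JOIN ci ON ce.EMP_NO = ci.EMP_NO
WHERE
    -- Not a mismatch when the shorter key is a prefix of the longer one
    -- (e.g. 'SmithSteven' vs. 'SmithStevenJ').
    NOT
    (
        (LEN(ce.namekey) < LEN(ci.namekey) AND ci.namekey LIKE ce.namekey + '%')
        OR
        (LEN(ce.namekey) >= LEN(ci.namekey) AND ce.namekey LIKE ci.namekey + '%')
    )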
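To see how this prefix comparison behaves, here is a minimal sketch that runs the same logic against a few made-up rows. The table variables and the sample names are assumptions for illustration; only the column names come from the question.

```sql
-- Hypothetical sample data; only the column names come from the question.
DECLARE @EMPLOYEE TABLE (EMP_NO int, EMP_NAME varchar(100));
DECLARE @INSURANCE TABLE (EMP_NO int, SUBSCRIBER_NAME varchar(100));

INSERT INTO @EMPLOYEE VALUES
    (1, 'Smith, Steven'),
    (2, 'Smith,Steven J.'),
    (3, 'Jones, Mary');

INSERT INTO @INSURANCE VALUES
    (1, 'Smith,Steven J.'),
    (2, 'Smith, Steven'),
    (3, 'Jones, Martha');

-- Same key-building and prefix test as in the answer, pointed at the sample tables.
WITH ce AS
(
    SELECT EMP_NO, EMP_NAME,
           REPLACE(REPLACE(REPLACE(EMP_NAME, ',', ''), ' ', ''), '.', '') AS namekey
    FROM @EMPLOYEE
),
ci AS
(
    SELECT EMP_NO, SUBSCRIBER_NAME,
           REPLACE(REPLACE(REPLACE(SUBSCRIBER_NAME, ',', ''), ' ', ''), '.', '') AS namekey
    FROM @INSURANCE
)
SELECT ce.EMP_NO, ce.EMP_NAME, ci.SUBSCRIBER_NAME
FROM ce
    INNER JOIN ci ON ce.EMP_NO = ci.EMP_NO
WHERE
    NOT
    (
        (LEN(ce.namekey) < LEN(ci.namekey) AND ci.namekey LIKE ce.namekey + '%')
        OR
        (LEN(ce.namekey) >= LEN(ci.namekey) AND ce.namekey LIKE ci.namekey + '%')
    );
```

With this data only EMP_NO 3 should come back, since 'JonesMary' and 'JonesMartha' are not prefixes of each other. Keep in mind that a prefix test also treats 'Smith, Steve' and 'Smith, Steven' as the same person, so it can hide some real differences.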

Answer 1: (score: 0)

Why not just use REPLACE to remove all the commas and spaces?

WHERE REPLACE(REPLACE(EMPLOYEE.EMP_NAME,' ',''),',','') <> REPLACE(REPLACE(INSURANCE.SUBSCRIBER_NAME,' ',''),',','')
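As a quick check of what this comparison does and does not cover, the sketch below applies the same nested REPLACE to the four spellings from the question. The VALUES list and the aliases v and Stripped are illustrative only.

```sql
-- Illustrative sample data: the four spellings listed in the question.
SELECT
    v.Name,
    REPLACE(REPLACE(v.Name, ' ', ''), ',', '') AS Stripped
FROM (VALUES
    ('Smith, Steven'),
    ('Smith,Steven'),
    ('Smith, Steven J.'),
    ('Smith,Steven J.')
) AS v(Name);
-- The first two rows become 'SmithSteven'; the last two become 'SmithStevenJ.'.
```

The spacing difference disappears, but the middle initial and its period are still part of the result, so rows that differ only by "J." would still be reported as mismatches.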

Answer 2: (score: 0)

You can remove the space after the comma, and then cut off the initial:

declare @Temp table (Name nvarchar(128))

insert into @Temp
select 'Smith, Steven' union all
select 'Smith,Steven' union all
select 'Smith, Steven J.' union all
select 'Smith,Steven J.'

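select
    case
        -- keep everything before the first remaining space (drops the middle initial)
        when N1.Name like '% %' then left(N1.Name, charindex(' ', N1.Name) - 1)
        else N1.Name
    end as Name_New,
    T.Name
from @Temp as T
    -- collapse ', ' to ',' so any space left over precedes the initial
    outer apply (select replace(T.Name, ', ', ',') as Name) as N1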

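Answer 3: (score: 0)

Thanks, your answers helped a lot. I ended up trimming the names down to [lastname][firstname] with no spaces, cutting off the middle initial if it is there. This is what ended up eliminating the vast majority of the identical-name matches: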

((CASE
    WHEN CHARINDEX(' ', REPLACE(REPLACE(EMPLOYEE.EMP_NAME, ', ', ''), ',', '')) = 0
    THEN UPPER(REPLACE(REPLACE(EMPLOYEE.EMP_NAME, ', ', ''), ',', ''))
    ELSE UPPER(LEFT(REPLACE(REPLACE(EMPLOYEE.EMP_NAME, ', ', ''), ',', ''),
                    CHARINDEX(' ', REPLACE(REPLACE(EMPLOYEE.EMP_NAME, ', ', ''), ',', ''))))
END) <>
(CASE
    WHEN CHARINDEX(' ', REPLACE(REPLACE(INSURANCE.SUBSCRIBER_NAME, ', ', ''), ',', '')) = 0
    THEN UPPER(REPLACE(REPLACE(INSURANCE.SUBSCRIBER_NAME, ', ', ''), ',', ''))
    ELSE UPPER(LEFT(REPLACE(REPLACE(INSURANCE.SUBSCRIBER_NAME, ', ', ''), ',', ''),
                    CHARINDEX(' ', REPLACE(REPLACE(INSURANCE.SUBSCRIBER_NAME, ', ', ''), ',', ''))))
END))
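
The expression above repeats the same nested REPLACE four times per side. As a sketch of one possible tidy-up (not the code that was actually run), the stripped name can be computed once per table in a CROSS APPLY and the CASE written against that alias. The table variables, the sample rows (including the payor name), and the aliases E, I, EK, IK and Stripped are all made up for this example; the column names are those from the question.

```sql
-- Hypothetical sample data; table and column names follow the question.
DECLARE @EMPLOYEE TABLE (EMP_NO int, EMP_NAME varchar(100));
DECLARE @INSURANCE TABLE (EMP_NO int, SUBSCRIBER_NAME varchar(100), PAYOR_NAME varchar(100));

INSERT INTO @EMPLOYEE VALUES
    (1, 'Smith, Steven'),
    (2, 'Jones,Mary');

INSERT INTO @INSURANCE VALUES
    (1, 'SMITH,Steven J.', 'Acme Health'),
    (2, 'Jones, Martha K.', 'Acme Health');

SELECT
    E.EMP_NO,
    E.EMP_NAME,
    I.SUBSCRIBER_NAME,
    I.PAYOR_NAME
FROM @EMPLOYEE AS E
    INNER JOIN @INSURANCE AS I ON E.EMP_NO = I.EMP_NO
    -- Build the upper-cased "LASTNAMEFIRSTNAME [initial]" string once per side.
    CROSS APPLY (SELECT UPPER(REPLACE(REPLACE(E.EMP_NAME, ', ', ''), ',', '')) AS Stripped) AS EK
    CROSS APPLY (SELECT UPPER(REPLACE(REPLACE(I.SUBSCRIBER_NAME, ', ', ''), ',', '')) AS Stripped) AS IK
WHERE
    -- Keep everything before the first remaining space, i.e. drop the middle initial.
    CASE WHEN CHARINDEX(' ', EK.Stripped) = 0 THEN EK.Stripped
         ELSE LEFT(EK.Stripped, CHARINDEX(' ', EK.Stripped) - 1) END
    <>
    CASE WHEN CHARINDEX(' ', IK.Stripped) = 0 THEN IK.Stripped
         ELSE LEFT(IK.Stripped, CHARINDEX(' ', IK.Stripped) - 1) END;
```

With the sample rows, EMP_NO 1 is treated as a match and only EMP_NO 2 should be returned. The sketch cuts at CHARINDEX - 1 so the key carries no trailing space; SQL Server ignores trailing spaces in `<>` comparisons anyway, so the result should be the same as with the original expression.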