Split Full Name With Format: {Last, First Middle} Comprehensive Cases

Time: 2014-08-05 15:30:23

Tags: sql sql-server string data-manipulation data-quality

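My client sends me name data as a single name string that contains the last, first, and middle names in one entry. I need to split it into LastName, FirstName, and MiddleName. I found some scripts online, but they do not serve my purpose because they either (1) use a different format or (2) do not handle edge cases well. See the examples below:

  1. Nightingale, Florence -> Florence Nightingale
  2. Bond, James Bond -> James Bond Bond
  3. Abbott, Edwin A. -> Edwin A. Abbott

Can someone help me write a SQL Server script that splits a string into the parts I am looking for?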

5 answers:

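Answer 0: (score: 3)

Note the following:

  1. Always request normalized data to ensure the highest data quality. I tried to enumerate every possible combination of last, first, and middle names, but I am sure I did not catch them all.
  2. My script expects the format LastName@DELIMITER1@DELIMITER2FirstName@DELIMITER2MiddleName (for example 'Zzz, Aaaa B' with @DELIMITER1 = ',' and @DELIMITER2 = ' '), but it can easily be changed for other formats.
  3. This script does not separate titles such as Dr., nor does it handle suffixes.
  4. Thanks to MemKills for the idea behind my expanded test data set.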

    DECLARE @DELIMITER1 varchar(1), @DELIMITER2 varchar(1), @MAX_LENGTH int
    SET @DELIMITER1 = ','
    SET @DELIMITER2 = ' '
    SET @MAX_LENGTH = 50
    
    SELECT  [Name],
        SUBSTRING(Name,1,CHARINDEX(@DELIMITER1,Name) -1) AS LastName,                   -- Less one char for @DELIMITER1
        SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH) AS FirstAndMiddle,   -- Plus two for @DELIMITER1 and @DELIMITER2
        CASE 
            -- Middle name follows two-name first names like Mary Ann 
            WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH), @DELIMITER2, '')) > 0
                THEN SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+2, @MAX_LENGTH)
            ELSE NULL
        END AS MiddleName,
    
        CASE 
            -- Count the number of @DELIMITER2. Choose the string between the @DELIMITER1 and the final @DELIMITER2. 
            WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH), @DELIMITER2, '')) > 0
                -- RTRIM removes the trailing @DELIMITER2 that the length arithmetic leaves on the first name
                THEN RTRIM(SUBSTRING(Name, CHARINDEX(@DELIMITER1,Name)+ 2, 
                     (LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH))
                     - LEN(SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+2, @MAX_LENGTH)))))
            ELSE SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 2,@MAX_LENGTH)
        END AS FirstName
    FROM 
    (
        SELECT  [Name] = 'Zzz, A' UNION ALL
        SELECT  'de Zzz, Aaa' UNION ALL
        SELECT  'Zzz, Aaaa' UNION ALL
        SELECT  'Zzz, A B' UNION ALL
        SELECT  'Zzz, Aaaa Bbbb' UNION ALL
        SELECT  'de Zzz, Aaaa' UNION ALL
        SELECT  'de Zzz, Aaaa B' UNION ALL
        SELECT  'van Zzz, Aaaa B' UNION ALL
        SELECT  'Yyy-Zzz, Aaaa B' UNION ALL
        SELECT  'd''Zzz, Aaaa B' UNION ALL
        SELECT  'Zzz, Aaaa Bbbb C' UNION ALL
        SELECT  'Zzz, Aaaa Bbbb Cccc'
    ) AS X
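
Note 1 above concedes that not every case is covered. One gap is a row with no comma at all, where CHARINDEX returns 0 and the SUBSTRING length becomes negative. The following is a minimal sketch, not the answer's own method, that keeps the same rule (the last space-separated token after the comma is the middle name) but computes each position once with CROSS APPLY and returns NULLs when the comma is missing. The aliases P, R, S, CommaPos, Rest, and SpaceFromEnd are made up for illustration, the delimiters are hard-coded as ',' and ' ', and the inline VALUES list assumes SQL Server 2008 or later.

```sql
SELECT  X.[Name],
        -- NULL when the row has no comma, instead of raising an error
        CASE WHEN P.CommaPos > 0 THEN LEFT(X.[Name], P.CommaPos - 1) END AS LastName,
        CASE WHEN S.SpaceFromEnd > 0
             THEN RTRIM(LEFT(R.Rest, LEN(R.Rest) - S.SpaceFromEnd))
             ELSE R.Rest
        END AS FirstName,
        CASE WHEN S.SpaceFromEnd > 0
             THEN RIGHT(R.Rest, S.SpaceFromEnd - 1)
        END AS MiddleName
FROM (VALUES ('Zzz, Aaaa B'),
             ('de Zzz, Aaaa Bbbb C'),
             ('Zzz, Aaaa'),
             ('Zzz')) AS X([Name])
-- Position of the comma, 0 if absent
CROSS APPLY (SELECT CHARINDEX(',', X.[Name]) AS CommaPos) AS P
-- Everything after the comma, trimmed on both sides
CROSS APPLY (SELECT CASE WHEN P.CommaPos > 0
                         THEN LTRIM(RTRIM(SUBSTRING(X.[Name], P.CommaPos + 1, LEN(X.[Name]))))
                    END AS Rest) AS R
-- Distance of the last space from the end of Rest, 0 if Rest is a single token
CROSS APPLY (SELECT CHARINDEX(' ', REVERSE(R.Rest)) AS SpaceFromEnd) AS S;
-- 'Zzz, Aaaa B'         -> Zzz    | Aaaa      | B
-- 'de Zzz, Aaaa Bbbb C' -> de Zzz | Aaaa Bbbb | C
-- 'Zzz, Aaaa'           -> Zzz    | Aaaa      | NULL
-- 'Zzz'                 -> NULL   | NULL      | NULL
```

Returning NULLs for a comma-less row is a design choice made here so that bad rows are easy to find; putting the whole string into LastName would be an equally reasonable convention.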
    

Answer 1: (score: 1)

Try this code. I find it more efficient. Feel free to modify or improve it. Thanks.


    DECLARE @FullName VARCHAR(60),
            @FirstName VARCHAR(30),
            @LastName VARCHAR(30),

            @MiddleInitialPrep VARCHAR(60) = null,
            @MiddleInitial VARCHAR(1) = null

    SET @FullName = 'Dr. John Edward Doe III'

    -- NAME CLEAN UP TO REMOVE PREFIXES AND SUFFIXES
    SET @FullName = REPLACE(@FullName, 'Mr. ', '')
    SET @FullName = REPLACE(@FullName, 'Mr ', '')
    SET @FullName = REPLACE(@FullName, 'Mrs. ', '')
    SET @FullName = REPLACE(@FullName, 'Mrs ', '')
    SET @FullName = REPLACE(@FullName, 'Ms. ', '')
    SET @FullName = REPLACE(@FullName, 'Ms ', '')
    SET @FullName = REPLACE(@FullName, 'Miss ', '')
    SET @FullName = REPLACE(@FullName, 'Dr. ', '')
    SET @FullName = REPLACE(@FullName, 'Dr ', '')
    SET @FullName = REPLACE(@FullName, ' Jr.', '')
    SET @FullName = REPLACE(@FullName, ' Jr', '')
    SET @FullName = REPLACE(@FullName, ' Sr.', '')
    SET @FullName = REPLACE(@FullName, ' Sr', '')
    SET @FullName = REPLACE(@FullName, ' III', '')
    SET @FullName = REPLACE(@FullName, ' II', '')

    -- RETRIEVE FIRST AND LAST NAMES
    SET @FirstName = LEFT(@FullName, NULLIF(CHARINDEX(' ', @FullName) - 1, -1))
    SET @LastName = RIGHT(@FullName, ISNULL(NULLIF(CHARINDEX(' ', REVERSE(@FullName)) - 1, -1), LEN(@FullName)))

    -- ISOLATE MIDDLE INITIAL
    SET @MiddleInitialPrep = REPLACE(@FullName, @FirstName, '')
    SET @MiddleInitialPrep = REPLACE(@MiddleInitialPrep, @LastName, '')
    SET @MiddleInitial = REPLACE(@MiddleInitialPrep, ' ', '')

    SELECT @FirstName First_Name, @MiddleInitial Middle_Initial, @LastName Last_Name
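
One caveat with a REPLACE-based cleanup: REPLACE matches anywhere in the string, so under a case-insensitive collation a name such as 'Alexandr Doe' would lose its 'dr ' as well. A minimal sketch, assuming titles appear only at the start and suffixes only at the end, anchors each match with LIKE instead. Only Dr. and II/III are shown; the variable name follows the answer, and the CleanedName alias is made up for illustration.

```sql
DECLARE @FullName VARCHAR(60) = 'Dr. John Edward Doe III';

-- Remove a title only when the string starts with it
SET @FullName =
    CASE
        WHEN @FullName LIKE 'Dr. %' THEN STUFF(@FullName, 1, 4, '')
        WHEN @FullName LIKE 'Dr %'  THEN STUFF(@FullName, 1, 3, '')
        ELSE @FullName
    END;

-- Remove a suffix only when the string ends with it
SET @FullName =
    CASE
        WHEN @FullName LIKE '% III' THEN LEFT(@FullName, LEN(@FullName) - 4)
        WHEN @FullName LIKE '% II'  THEN LEFT(@FullName, LEN(@FullName) - 3)
        ELSE @FullName
    END;

SELECT @FullName AS CleanedName;  -- John Edward Doe
```

The remaining titles and suffixes from the answer would each need their own WHEN branch, with the longer variant listed before the shorter one.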

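Answer 2: (score: 0)

The following code works for "Last, First M" name strings. Replace "Name" with the name of your name-string column. Since you have a period as the last character when there is a middle initial, replace 2 with 3 on lines 2, 6, and 8 of the code, and change "RIGHT(Name, 1)" to "RIGHT(Name, 2)" on line 8.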

    SELECT  SUBSTRING(Name, 1, CHARINDEX(',', Name) - 1) LastName ,
        CASE WHEN LEFT(RIGHT(Name, 2), 1) <> ' '
             THEN LTRIM(SUBSTRING(Name, CHARINDEX(',', Name) + 1, 99))
             ELSE LEFT(LTRIM(SUBSTRING(Name, CHARINDEX(',', Name) + 1, 99)),
                       LEN(LTRIM(SUBSTRING(Name, CHARINDEX(',', Name) + 1, 99)))
                       - 2)
        END FirstName ,
        CASE WHEN LEFT(RIGHT(Name, 2), 1) = ' ' THEN RIGHT(Name, 1)
             ELSE NULL
        END MiddleName
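
The adjustment for a trailing period after the middle initial (3 in place of 2, and RIGHT(Name, 2) in place of RIGHT(Name, 1)) can be sketched as follows. The inline VALUES list and the alias X are sample scaffolding added for illustration, using two names from the question; the VALUES syntax assumes SQL Server 2008 or later.

```sql
SELECT  SUBSTRING(Name, 1, CHARINDEX(',', Name) - 1) AS LastName,
        -- A space three characters from the end signals a "M." middle initial
        CASE WHEN LEFT(RIGHT(Name, 3), 1) <> ' '
             THEN LTRIM(SUBSTRING(Name, CHARINDEX(',', Name) + 1, 99))
             ELSE LEFT(LTRIM(SUBSTRING(Name, CHARINDEX(',', Name) + 1, 99)),
                       LEN(LTRIM(SUBSTRING(Name, CHARINDEX(',', Name) + 1, 99)))
                       - 3)
        END AS FirstName,
        CASE WHEN LEFT(RIGHT(Name, 3), 1) = ' ' THEN RIGHT(Name, 2)
             ELSE NULL
        END AS MiddleName
FROM (VALUES ('Abbott, Edwin A.'),
             ('Nightingale, Florence')) AS X(Name);
-- Abbott      | Edwin    | A.
-- Nightingale | Florence | NULL
```

This check looks only at the position of the last space, so a two-letter first name with no middle initial (for example 'Zzz, Jo') would be misread as a middle initial. Treat it as a sketch for data that reliably follows the "Last, First M." pattern.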

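Answer 3: (score: 0)

Great solution. I modified it slightly for my situation, where the delimiter is a space and the middle name is only a middle initial (which is sometimes absent). The solution below even parses names with multiple spaces, such as "Jo Ann Taylor Haynes", that have no middle initial.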

    DECLARE @DELIMITER1 varchar(1), @DELIMITER2 varchar(1), @MAX_LENGTH int
    SET @DELIMITER1 = ' '
    SET @DELIMITER2 = ' '
    SET @MAX_LENGTH = 50

    SELECT  [Name],
        SUBSTRING(Name,1,CHARINDEX(@DELIMITER1,Name) -1) AS LastName,                  

        SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 1,@MAX_LENGTH) AS FirstAndMiddle,   
        CASE 
            -- Exactly one @DELIMITER2 after the last name means a middle initial is present
            WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 1,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 1,@MAX_LENGTH), @DELIMITER2, '')) = 1
                -- LTRIM drops the leading @DELIMITER2 that the +1 offset would otherwise keep
                THEN LTRIM(SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+1, @MAX_LENGTH))
            ELSE NULL
        END AS MiddleName,

        CASE 

            WHEN LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 1,@MAX_LENGTH)) - LEN(REPLACE(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 1,@MAX_LENGTH), @DELIMITER2, '')) = 1
                THEN SUBSTRING(Name, CHARINDEX(@DELIMITER1,Name)+ 1, 
                     (LEN(SUBSTRING(NAME, CHARINDEX(@DELIMITER1,Name)+ 1,@MAX_LENGTH))
                     - LEN(SUBSTRING(Name, LEN(Name) - CHARINDEX(@DELIMITER2, REVERSE(Name))+1, @MAX_LENGTH))))
            ELSE SUBSTRING(Name,CHARINDEX(@DELIMITER1,Name)+ 1,@MAX_LENGTH)
        END AS FirstName
    -- Add a FROM clause here that names the table or derived table holding Name.

Answer 4: (score: -4)

    -- Oracle syntax (SUBSTR, INSTR, DUAL); it does not run on SQL Server as written.
    -- Splits a space-separated "first middle last" string at its first and second spaces.
    select substr('santhosh kumar kota', 1,
                  instr('santhosh kumar kota', ' ', 1, 1)) as fname,
           substr('santhosh kumar kota',
                  instr('santhosh kumar kota', ' ', 1, 1),
                  instr('santhosh kumar kota', ' ', 1, 2)
                  - instr('santhosh kumar kota', ' ', 1, 1)) as mname,
           substr('santhosh kumar kota',
                  instr('santhosh kumar kota', ' ', 1, 2),
                  (length('santhosh kumar kota') + 1)
                  - instr('santhosh kumar kota', ' ', 1, 2)) as lname
    from dual
    /
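
Since the question asks for SQL Server, here is a rough T-SQL counterpart of the same idea, offered as a sketch rather than a drop-in replacement. It assumes the input has exactly three space-separated parts in "first middle last" order, which is not the "Last, First Middle" format from the question. The variables @s, @p1, and @p2 are made up for illustration. Unlike the query above, it excludes the spaces from each part.

```sql
DECLARE @s  VARCHAR(50) = 'santhosh kumar kota';
DECLARE @p1 INT = CHARINDEX(' ', @s);            -- position of the first space
DECLARE @p2 INT = CHARINDEX(' ', @s, @p1 + 1);   -- position of the second space

SELECT  LEFT(@s, @p1 - 1)                   AS fname,   -- santhosh
        SUBSTRING(@s, @p1 + 1, @p2 - @p1 - 1) AS mname, -- kumar
        SUBSTRING(@s, @p2 + 1, LEN(@s) - @p2) AS lname; -- kota
```

CHARINDEX takes an optional third argument for the start position, which stands in for the occurrence argument of Oracle's INSTR here.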