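SQL Server 2014: how to extract the unit value from the end of a string

Time: 2018-06-19 20:09:54

Tags: sql-server string text-extraction

I need to compare partial addresses. I am breaking each address string into smaller parts: house number, street direction, street name, unit number, and street type. I have most of the parts working, but I am having trouble extracting the unit number or letter at the end of the string.

I have found the following cases: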

3015 NESSLING ST            < no unit value, no need to do anything
6941 CHESTER DR # H         < alpha value with space after the # sign need the H
6941 CHESTER DR #B          < alpha value with no space after the # sign, need the B
7203 MID TOWN RD # 209      < numeric value after the # sign and space in between, need the 209
3100 LAKE MENDOTA DR #802   < numeric value after the # sign and no space in between, need the 802
6949 CHESTER DR UNIT C      < non numeric value after word UNIT, need the C
6949 CHESTER DR UNITC       < alpha value after word UNIT with no space in between, need the C
7203 MID TOWN RD UNIT 207   < numeric value after the word UNIT with space in between, need the 207
7203 MID TOWN RD UNIT207    < numeric value after the word UNIT no space in between, need the 207

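From looking at the records whose addresses need correcting, I believe these are all the cases that occur.

Is it possible to retrieve the values indicated above using SQL?

I tried the following: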

DECLARE @textval NVARCHAR(30)

SET @textval = '7203 MID TOWN RD UNIT207'

SELECT SUBSTRING(@textval,PATINDEX('% [0-9]%',@textval)+1,PATINDEX('%[0-9],%',@textval+ ',')-PATINDEX('% [0-9]%',@textval))


3015 NESSLING ST            - works, returns blank
6941 CHESTER DR # H         - does not work, returns blank
6941 CHESTER DR #B          - does not work, returns blank
7203 MID TOWN RD # 209      - works, returns 209
3100 LAKE MENDOTA DR #802   - does not work, returns 3100 LAKE MENDOTA DR #802
6949 CHESTER DR UNIT C      - does not work, returns blank, should return C
6949 CHESTER DR UNITC       - does not work, returns blank, should return C
7203 MID TOWN RD UNIT 207   - works, returns 207
7203 MID TOWN RD UNIT207    - does not work, returns 7203 MID TOWN RD UNIT207

Any help would be great.

Thank you very much.

Update, using the solution proposed by Ryan:

DECLARE @textval NVARCHAR(30)

SET @textval = '7203 MID TOWN RD UNIT207'


SELECT
    CASE
        WHEN CHARINDEX('#', @textval) > -1 THEN LTRIM(SUBSTRING(@textval, CHARINDEX('#', @textval) + 1, LEN(@textval)-(CHARINDEX('#', @textval)+1)))
        WHEN CHARINDEX('UNIT', @textval) > -1 THEN LTRIM(SUBSTRING(@textval, CHARINDEX('UNIT', @textval) + 1, LEN(@textval)-(CHARINDEX('UNIT', @textval)+1)))
        ELSE ''
    END AS [UnitValue];

/*
3015 NESSLING ST            - does not work, returns 3015 NESSLING S
6941 CHESTER DR # H         - does not work, returns blank
6941 CHESTER DR #B          - does not work, returns blank
7203 MID TOWN RD # 209      - does not work, returns 20
3100 LAKE MENDOTA DR #802   - does not work, returns 80
6949 CHESTER DR UNIT C      - does not work, returns 6949 CHESTER DR UNIT
6949 CHESTER DR UNITC       - does not work, returns 6949 CHESTER DR UNIT
7203 MID TOWN RD UNIT 207   - does not work, returns 7203 MID TOWN RD UNIT 20
7203 MID TOWN RD UNIT207    - does not work, returns 7203 MID TOWN RD UNIT20
*/

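3 Answers:

Answer 0 (score: 2)

Based on the examples, the rules can be reduced to the following:

  1. Look for # or the word UNIT in the string
  2. If neither is found, do nothing
  3. If one is found, take the first word after it and extract that

This can be achieved by combining string operations: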

-- Inner query: text after '#' or 'UNIT', left-trimmed. Outer query: its first word.
SELECT IIF(CHARINDEX(' ', a.TrimmedAddress) > 0,
           LEFT(a.TrimmedAddress, CHARINDEX(' ', a.TrimmedAddress) - 1),
           a.TrimmedAddress) AS UnitNumber
FROM (
    SELECT IIF(CHARINDEX('#', address) > 0,
               LTRIM(SUBSTRING(address, CHARINDEX('#', address) + 1, LEN(address))),
               IIF(CHARINDEX('UNIT', address) > 0,
                   LTRIM(SUBSTRING(address, CHARINDEX('UNIT', address) + 4, LEN(address))),
                   '')) AS TrimmedAddress
    FROM address
) a
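
One detail that is easy to get wrong here is the CHARINDEX call itself: the text to find comes first, the string to search comes second, and the result is 0 (not -1) when nothing is found. A small sketch, using one of the sample addresses from the question and the question's @textval variable name:

```sql
DECLARE @textval NVARCHAR(30) = '6941 CHESTER DR # H';

SELECT CHARINDEX('#', @textval)    AS HashPosition,   -- 17
       CHARINDEX('UNIT', @textval) AS UnitPosition;   -- 0, because 'UNIT' is not present
```

This is why the presence test is written as > 0: a comparison such as > -1 is true for every row, whether or not the marker exists.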

Answer 1 (score: 1)

I think this should do what you want:

-- If the address contains # or UNIT, take the substring after the marker
-- and left-trim it to remove any leading whitespace; otherwise return ''.
SELECT CASE
           WHEN CHARINDEX('#', [Address]) > 0
               THEN LTRIM(SUBSTRING([Address], CHARINDEX('#', [Address]) + 1,
                                    LEN([Address]) - CHARINDEX('#', [Address])))
           WHEN CHARINDEX('UNIT', [Address]) > 0
               THEN LTRIM(SUBSTRING([Address], CHARINDEX('UNIT', [Address]) + 4,
                                    LEN([Address]) - CHARINDEX('UNIT', [Address])))
           ELSE ''
       END AS [UnitValue]
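
To try the expression on a single value, as the question does, a minimal sketch could substitute a variable for the column. The variable name @textval is borrowed from the question; passing LEN(@textval) as the length is a simplification that relies on SUBSTRING stopping at the end of the string.

```sql
DECLARE @textval NVARCHAR(30) = '7203 MID TOWN RD UNIT207';

SELECT CASE
           WHEN CHARINDEX('#', @textval) > 0
               -- + 1 skips the '#' character itself
               THEN LTRIM(SUBSTRING(@textval, CHARINDEX('#', @textval) + 1, LEN(@textval)))
           WHEN CHARINDEX('UNIT', @textval) > 0
               -- + 4 skips the four characters of 'UNIT'
               THEN LTRIM(SUBSTRING(@textval, CHARINDEX('UNIT', @textval) + 4, LEN(@textval)))
           ELSE ''
       END AS [UnitValue];   -- 207
```

Changing the SET value to the other sample addresses is a quick way to check each case before running the expression against a table.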

Answer 2 (score: 1)

This should work, assuming your data has no major variations:

;WITH CTE (Column1) AS (
    SELECT * FROM (
        VALUES
            ('3015 NESSLING ST'), 
            ('6941 CHESTER DR # H'), 
            ('6941 CHESTER DR #B'), 
            ('7203 MID TOWN RD # 209'), 
            ('3100 LAKE MENDOTA DR #802'), 
            ('6949 CHESTER DR UNIT C'), 
            ('6949 CHESTER DR UNITC'), 
            ('7203 MID TOWN RD UNIT 207'), 
            ('7203 MID TOWN RD UNIT207')
    ) AS A (Column1)
)

SELECT CASE 
    WHEN PATINDEX('%#%', Column1) > 0
        THEN LTRIM(SUBSTRING(Column1, CHARINDEX('#', Column1) + 1, LEN(Column1) - (CHARINDEX('#', Column1))))
    WHEN PATINDEX('%UNIT%', Column1) > 0
        THEN LTRIM(SUBSTRING(Column1, CHARINDEX('UNIT', Column1) + 4, LEN(Column1) - (CHARINDEX('UNIT', Column1))))
    ELSE 
        Column1
    END AS Result
FROM CTE
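
If rows without a unit should come back blank rather than as the full address, and only the first word after the marker is wanted, one possible variation is sketched below. It is not part of the answer above: the names Samples-style alias s, Addr, Tail and UnitValue are placeholders, and searching for ' UNIT' with a leading space is an assumption meant to avoid matching the letters inside a longer word.

```sql
SELECT s.Addr,
       -- Keep only the first word of whatever follows the marker.
       -- Appending ' ' guarantees CHARINDEX finds a space, even for ''.
       LEFT(t.Tail, CHARINDEX(' ', t.Tail + ' ') - 1) AS UnitValue
FROM (VALUES
        ('3015 NESSLING ST'),
        ('6941 CHESTER DR # H'),
        ('6941 CHESTER DR #B'),
        ('7203 MID TOWN RD # 209'),
        ('3100 LAKE MENDOTA DR #802'),
        ('6949 CHESTER DR UNIT C'),
        ('6949 CHESTER DR UNITC'),
        ('7203 MID TOWN RD UNIT 207'),
        ('7203 MID TOWN RD UNIT207')
     ) AS s (Addr)
CROSS APPLY (
    -- Text after '#' or ' UNIT', left-trimmed; '' when neither marker exists.
    SELECT CASE
               WHEN CHARINDEX('#', s.Addr) > 0
                   THEN LTRIM(SUBSTRING(s.Addr, CHARINDEX('#', s.Addr) + 1, LEN(s.Addr)))
               WHEN CHARINDEX(' UNIT', s.Addr) > 0
                   -- + 5 skips the leading space plus the four characters of 'UNIT'
                   THEN LTRIM(SUBSTRING(s.Addr, CHARINDEX(' UNIT', s.Addr) + 5, LEN(s.Addr)))
               ELSE ''
           END AS Tail
) AS t;
```

For the nine sample rows this should give, in order: blank, H, B, 209, 802, C, C, 207, 207. A street name that itself begins with UNIT (for example a hypothetical "UNITY ST") would still be picked up, so real data may need an extra check.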