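Keep only the required characters and separate them with semicolons in T-SQL

Time: 2016-07-12 11:15:54

Tags: sql-server database string

Question:

I am importing text data into a database, and it contains many unwanted characters. From each imported text string I need to keep only the four-letter uppercase codes. For example: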

1447;#MIBD (This is a nice name);#2056;#LKRE (Very nice name indeed)

This could be the value of a single column in a single row of my table. What I need to extract from the string is:

MIBD and LKRE

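Ideally the result would be the required strings separated by semicolons.

It should be applied to the whole column, and I do not know how many of these four-letter uppercase codes may appear in a single row.

I have gone through various functions such as PATINDEX, but I really do not know how to approach this. Thanks for your help!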

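2 answers:

Answer 0: (score: 0)

Try this. It assumes that the four-character codes are always preceded by ;#. PATINDEX follows the collation of its input, and under a case-insensitive collation [A-Z] also matches lowercase letters, so I added extra checks to verify that all four characters are uppercase.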

DECLARE @MyTable Table( ID INT, MyString VARCHAR(8000))

INSERT INTO @MyTable
VALUES
     (1, '1447;#MIBD (This is a nice name);#2056;#LKRE (Very nice name indeed)')
    ,(2, ';#DBCC (This is a nice name);#2056;#LLC (Very nice name indeed) ;#ABCD')
    ,(3, ';#AaaA;#OPQR;1234 (and) ;#WXYZ')
    ,(4, ';#abc this empty string without any code')

;WITH CTE AS 
(
    -- Anchor member: take the first ';#XXXX' match from each row.
    SELECT ID 
        ,SUBSTRING(MyString, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+2, 4) AS NewString   -- the 4 characters after ';#'
        ,STUFF(MyString, 1, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+5, '') AS MyString    -- cut through the code's last character, so a ';#' that follows immediately stays intact
    FROM @MyTable m
    WHERE PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString) > 0

    UNION ALL 
    -- Recursive member: repeat on the remainder until no match is left.
    SELECT ID 
        ,SUBSTRING(MyString, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+2, 4) AS NewString
        ,STUFF(MyString, 1, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+5, '') AS MyString
    FROM CTE c
    WHERE PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString) > 0     
) 

SELECT c.ID,
    STUFF(( SELECT '; ' + NewString
            FROM CTE c1
            WHERE c1.ID = c.ID
                AND ASCII(SUBSTRING(NewString, 1, 1)) BETWEEN ASCII('A') AND ASCII('Z')  -- first char
                AND ASCII(SUBSTRING(NewString, 2, 1)) BETWEEN ASCII('A') AND ASCII('Z')  -- second char 
                AND ASCII(SUBSTRING(NewString, 3, 1)) BETWEEN ASCII('A') AND ASCII('Z')  -- third char 
                AND ASCII(SUBSTRING(NewString, 4, 1)) BETWEEN ASCII('A') AND ASCII('Z')  -- fourth char 
            FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')      -- use the value clause to handle XML-escaped characters such as &, ", >, <
        ,1,2,'') AS CodeList                                        -- strip the leading '; ' separator (2 characters)
FROM CTE c
GROUP BY ID
OPTION (MAXRECURSION 0);
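
As a possible alternative to the four ASCII checks, the pattern match itself can be made case-sensitive by applying a binary collation to the searched expression. The following is only a minimal sketch: it assumes the Latin1_General_BIN collation is acceptable for your data, and the sample string and column aliases are made up for illustration.

```sql
-- Sketch: compare PATINDEX under the database default collation and under a binary collation.
-- Assumption: Latin1_General_BIN suits the data; the literal below is illustrative only.
SELECT
    -- Under a case-insensitive default collation, [A-Z] also matches 'a',
    -- so the mixed-case ';#AaaA' at position 1 is expected to match.
    PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%', ';#AaaA;#OPQR') AS DefaultCollationPos,
    -- Under a binary collation, [A-Z] covers only the code points 'A' through 'Z',
    -- so the first match is ';#OPQR', which starts at position 7.
    PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%', ';#AaaA;#OPQR' COLLATE Latin1_General_BIN) AS BinaryCollationPos;
```

If this behaves the same way on your server, adding the same COLLATE clause to every PATINDEX call in the CTE should make the ASCII filter in the final SELECT redundant.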

Answer 1: (score: -1)

So far, I have come up with something like this:

ALTER FUNCTION CleanData
(
    -- Parameters here
    @Text AS VARCHAR(4000)
)
RETURNS VARCHAR(4000)

AS
BEGIN

WHILE PATINDEX('%[0-9#;()]%', @Text) > 0
BEGIN
    SET @Text = STUFF(@Text, PATINDEX('%[0-9#;()]%', @Text), 1, '') 
END
RETURN @Text

END

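But what I get back is the abbreviations together with the characters from inside the parentheses, because this only strips digits and punctuation and PATINDEX cannot tell uppercase from lowercase here. Maybe it will be helpful to someone else.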
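
One way the loop idea might be taken further is to extract the codes rather than strip the unwanted characters. This is a rough sketch, not a tested solution: it assumes every code is preceded by ;# as in the sample row, it relies on a binary collation (Latin1_General_BIN) to make [A-Z] uppercase-only, and dbo.ExtractCodes is a hypothetical name.

```sql
-- Sketch only: dbo.ExtractCodes is a hypothetical function name.
-- Assumption: each wanted code is exactly four uppercase letters preceded by ';#'.
CREATE FUNCTION dbo.ExtractCodes
(
    @Text AS VARCHAR(4000)
)
RETURNS VARCHAR(4000)
AS
BEGIN
    DECLARE @Result VARCHAR(4000) = '';
    -- The binary collation makes the [A-Z] ranges match uppercase letters only.
    DECLARE @Pos INT = PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%', @Text COLLATE Latin1_General_BIN);

    WHILE @Pos > 0
    BEGIN
        -- Append the four letters that follow ';#', with ';' between codes.
        SET @Result = @Result
                    + CASE WHEN @Result = '' THEN '' ELSE ';' END
                    + SUBSTRING(@Text, @Pos + 2, 4);

        -- Drop everything up to and including the code just taken.
        SET @Text = STUFF(@Text, 1, @Pos + 5, '');

        -- Look for the next code in what is left.
        SET @Pos = PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%', @Text COLLATE Latin1_General_BIN);
    END

    RETURN @Result;
END
GO  -- batch separator for SSMS/sqlcmd; CREATE FUNCTION must be alone in its batch

-- Usage with the sample row from the question.
SELECT dbo.ExtractCodes('1447;#MIBD (This is a nice name);#2056;#LKRE (Very nice name indeed)') AS CodeList;
```

For the sample row this should give MIBD;LKRE. A scalar function called per row can be slow on a large table, so a set-based query may still be preferable for a one-off cleanup of the whole column.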