SQL: parsing multiple substrings

Time: 2016-02-29 19:19:41

Tags: sql-server parsing select find charindex

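I have a very long, complex string containing line breaks, and I am having trouble parsing it. I need to be able to create a select query with a column for each of the fields below.

Ideally I would find each line break and, for every line, return the following: everything before the colon should be the name of the column, and everything between the : and the line break should be the data in that field.

All of the data comes back as strings, so I am just building a select statement for each of the lines below. I'm not sure whether this is possible.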

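The second option is to hard-code it: use something like CHARINDEX ( 'Home Phone:' ,notes, 0) to find the Home Phone string, and then pull out everything between the : after that string and the next line break.

In that case, each select item in my query would say: find the string "Home Phone" and pull out what follows the colon, or find the string "School Name", and so on.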
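As a rough illustration of that hard-coded option, here is a minimal sketch for a single field. The variable names (@notes, @label, @start, @end) and the shortened sample value are made up for the example, and it assumes the label is actually present in the string; a missing label is not handled.

```sql
-- Hypothetical sample value; in the real query this would be the notes column.
DECLARE @notes NVARCHAR(4000) =
    N'Home Phone: 1234567890' + CHAR(13) + CHAR(10) +
    N'School Name: James Jones High School' + CHAR(13) + CHAR(10) +
    N'School Zip: 32112';

-- The hard-coded label to look for (assumed to exist in @notes).
DECLARE @label NVARCHAR(100) = N'School Name:';

-- Position just past the label, and position of the next line feed after it.
DECLARE @start INT = CHARINDEX(@label, @notes) + LEN(@label);
DECLARE @end   INT = CHARINDEX(CHAR(10), @notes, @start);

SELECT LTRIM(RTRIM(REPLACE(
           SUBSTRING(@notes, @start,
                     CASE WHEN @end > 0 THEN @end - @start  -- stop at the line break
                          ELSE LEN(@notes) END),             -- last line: take the rest
           CHAR(13), ''))) AS SchoolName;                    -- drop the carriage return
```

Every field needs its own copy of this expression with a different label, which is exactly the repetition a dynamic approach would avoid.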

This is what the data looks like (in a single string column named notes). Each field below should end up in its own column, including the long answer to each question:

Home Phone: 1234567890  
Cell Phone: 1234567890  
Date of Birth: 01/01/1971 
School Name: James Jones High  School 
Address:123 Main Street 
School City: Queens  
School State: PA  
School Zip: 32112 
Years Teaching: 12  
Grade Levels: Middle School  
Total Students: 120  
Subject: Music:   
How did they hear:  Other, provide more info: Former partner teacher in the Middle School 
Type: Public/Charter   
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa  aaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaa aaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaa aaaaaaa aaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaa aaaaa aaaaaa aaaaaa aaaaaaaaaaaa aaaaaaaaaaaa aaa aaaa aaaaa aaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaaa aaaaaaaaaaa aaaaaaaaa aaaaaaaaaaaa.   
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbb bbbbbbbbb bbbbbbb bbbbbb bbbbbb bbbbbbb  bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbb 
Question 3: ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccccccccccccccccc cccccccccccccccccc ccccccccccc ccccccccccccc ccccccccccccccccc cccccccc

I'm not sure whether this makes sense, but any and all suggestions are very much appreciated.

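I have code that pulls out a substring up to the newline character, but it is hard-coded, and I can't work out how to do this dynamically. The output I am after would look like this: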

Home Phone  Cell Phone  Date of Birth  …  Type            Question 1               Question 2      Question 3
1234567890  1234567890  1/1/1971       …  Public/Charter  aaaaaaaa aaaaaaaaaaaaa.  bbb bbbbbbbbbb  ccccccccccccccccccccccc

Thanks!

3 answers:

Answer 0 (score: 1)

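A lot of the credit for this (90%) should go to Alex K, who provided an in-depth answer on finding the nth occurrence of a character:

SQL Server - find nth occurrence in a string

I took that answer, adapted it to your question, and then applied a PIVOT to split the result into the desired rows and columns. As long as the data always follows the same logic (each question/answer pair separated by a line break), this approach should be able to produce the desired output for as many question sets as you need.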

--Creates temporary table for testing, ID column and second set of data
--used to ensure query works for each unique set of questions
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
    DROP TABLE #Results

CREATE TABLE #Results 
    (ID INT IDENTITY(1,1) NOT NULL,
    Notes NVARCHAR(4000) NOT NULL)
INSERT INTO #Results
    (Notes)
VALUES
    ('Home Phone: 1234567890  
    Cell Phone: 1234567890  
    Date of Birth: 01/01/1971 
    School Name: James Jones High  School 
    Address:123 Main Street 
    School City: Queens  
    School State: PA  
    School Zip: 32112 
    Years Teaching: 12  
    Grade Levels: Middle School  
    Total Students: 120  
    Subject: Music:   
    How did they hear:  Other, provide more info: Former partner teacher in the Middle School 
    Type: Public/Charter '),
    ('Home Phone: test  
    Cell Phone: test 
    Date of Birth: test
    School Name: test
    Address:test 
    School City: test 
    School State: test  
    School Zip: test 
    Years Teaching: test 
    Grade Levels: test 
    Total Students: test
    Subject: test   
    How did they hear:  test 
    Type: test ');

--Recursive CTE to determine the position of each successive line break
--Used CHARINDEX to search CHAR(13) and CHAR(10) and find line breaks and carriage returns
WITH cte
AS

    (SELECT ID, Notes, 1 AS Starts, CHARINDEX(CHAR(13)+CHAR(10),Notes) AS Pos
    FROM #Results
    UNION ALL
    SELECT ID, Notes, Pos +1, CHARINDEX(CHAR(13)+CHAR(10),Notes,Pos+1) AS Pos
    FROM cte
    WHERE
        pos >0),

--2nd CTE breaks each question set into its own row
cte2
AS
    (SELECT ID, Notes,Starts, Pos,
        SUBSTRING(Notes, Starts,
            CASE
                WHEN pos > 0 THEN (pos - starts)
                ELSE LEN(notes)
            END) AS Token
    FROM cte),

--3rd CTE cleans up the data, separating the Questions/Answers into separate columns
--REPLACE is used to remove Line Break (CHAR(10)), output was then showing a TAB so used
--double REPLACE and removed CHAR(9) (tab)
--LTRIM removes leading space
cte3
AS
    (SELECT ID, 
        LTRIM(REPLACE(REPLACE(SUBSTRING(Token,CHARINDEX(CHAR(13)+CHAR(10),Token),CHARINDEX(':',Token)),CHAR(10),''),CHAR(9),'')) AS Question, 
        LTRIM(SUBSTRING(Token,CHARINDEX(':',Token)+1,4000)) AS Answer
    FROM cte2)

--Pivot separates each Question/Answer row into its own column
SELECT *
FROM
    (SELECT ID, Question, Answer
    FROM cte3) AS a
PIVOT
    (MAX(Answer)
    FOR [Question] IN([Address],[Cell Phone],[Date of Birth],[Grade Levels],[Home Phone],[How did they hear],
                        [School City],[School Name],[School State],[School Zip],[Subject],[Total Students],[Type],[Years Teaching])) AS pvt

I commented each section in the hope of explaining my logic, but let me know if you have any questions.
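To see the line-splitting step in isolation, here is a minimal sketch of the recursive CTE run against a single variable instead of a table. The sample value is made up, and it differs from the query above in one respect: it steps Pos + 2 past each CR+LF pair, so the extracted lines do not start with a stray line feed. A recursive CTE stops with an error after 100 recursion levels by default, so the sketch also shows where a MAXRECURSION hint would go if a note could hold more lines than that.

```sql
DECLARE @notes NVARCHAR(4000) =
    N'Home Phone: 1234567890' + CHAR(13) + CHAR(10) +
    N'Cell Phone: 3344556677' + CHAR(13) + CHAR(10) +
    N'School Zip: 32112';

WITH cte AS
(
    -- Anchor: the first line starts at 1 and ends at the first CR+LF (0 if none).
    SELECT 1 AS Starts,
           CHARINDEX(CHAR(13) + CHAR(10), @notes) AS Pos
    UNION ALL
    -- Recursive step: the next line starts just past the previous CR+LF pair.
    SELECT Pos + 2,
           CHARINDEX(CHAR(13) + CHAR(10), @notes, Pos + 2)
    FROM cte
    WHERE Pos > 0
)
SELECT Starts, Pos,
       SUBSTRING(@notes, Starts,
                 CASE WHEN Pos > 0 THEN Pos - Starts   -- up to the line break
                      ELSE LEN(@notes) END) AS Line     -- last line: take the rest
FROM cte
ORDER BY Starts
OPTION (MAXRECURSION 200);  -- raises the default limit of 100 recursion levels
```

With this sample the query returns one row per line of the note, in order.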

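Edit: dynamic pivot

Dynamic SQL can be used to build a PIVOT that automatically picks up all of the "Question" columns and adjusts accordingly. I don't believe it can be done in one step, since I had to use multiple CTEs. What I would do is take the steps above that create CTE, CTE2 and CTE3 (basically everything before the PIVOT query), create a view from those steps, and then use that view to do the following (in my example the view is called "questionaire"):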

DECLARE @columns AS NVARCHAR(MAX)
DECLARE @query AS NVARCHAR(MAX)

SET @columns =  STUFF((SELECT DISTINCT ',' + QUOTENAME(q.question)
        FROM questionaire AS q
        FOR XML PATH(''), TYPE
        ).value('.','NVARCHAR(MAX)')
        ,1,1,'')

SET @query =    'SELECT ID, '+ @columns +' FROM
        (
            SELECT ID, Answer, Question
            FROM questionaire
        ) AS a
        PIVOT
        (
            MAX(Answer)
            FOR Question IN(' +@columns+')
        ) AS p'
EXECUTE(@query)
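The view that the dynamic pivot reads from is not shown above. A minimal sketch of it might look like the following, reusing the three CTEs from the first query unchanged. It assumes the notes live in a permanent table, here hypothetically named dbo.Results with the same ID and Notes NVARCHAR(4000) columns, because a view cannot be defined over a temporary table such as #Results.

```sql
-- Assumption: dbo.Results (ID INT, Notes NVARCHAR(4000)) is a permanent table.
-- CREATE VIEW has to be the first statement in its batch.
CREATE VIEW dbo.questionaire
AS
WITH cte AS
    -- Position of each successive CR+LF within every note
    (SELECT ID, Notes, 1 AS Starts, CHARINDEX(CHAR(13)+CHAR(10), Notes) AS Pos
     FROM dbo.Results
     UNION ALL
     SELECT ID, Notes, Pos + 1, CHARINDEX(CHAR(13)+CHAR(10), Notes, Pos + 1)
     FROM cte
     WHERE Pos > 0),
cte2 AS
    -- One row per line of each note
    (SELECT ID,
            SUBSTRING(Notes, Starts,
                CASE WHEN Pos > 0 THEN Pos - Starts ELSE LEN(Notes) END) AS Token
     FROM cte),
cte3 AS
    -- Split each line at the first colon into Question and Answer
    (SELECT ID,
            LTRIM(REPLACE(REPLACE(SUBSTRING(Token, CHARINDEX(CHAR(13)+CHAR(10), Token), CHARINDEX(':', Token)), CHAR(10), ''), CHAR(9), '')) AS Question,
            LTRIM(SUBSTRING(Token, CHARINDEX(':', Token) + 1, 4000)) AS Answer
     FROM cte2)
SELECT ID, Question, Answer
FROM cte3;
```

A query hint such as OPTION (MAXRECURSION n) cannot be placed inside a view definition, so if a note can contain more than 100 lines the hint would have to go on the query that selects from the view.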

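Answer 1 (score: 0)

I know a lot of people around here don't like this splitter, but it is the one I like. It only handles input values up to 8000 characters, and the delimiter can only be a single character. It does have some nice features that other splitters lack, though, and unless you have huge inputs it is plenty for nearly everything. You can find the code here: http://www.sqlservercentral.com/articles/Tally+Table/72993/ The comments (login required) run to many pages and contain a very lengthy discussion of this splitter.

And while others prefer PIVOT for this kind of thing, I prefer a cross tab (also known as conditional aggregation) because I find the syntax far less obtuse.

I took the liberty of slightly modifying your sample data. I changed the cell phone value so that it is not the same as the home phone. I also shortened the answers to the questions, since they don't need to be hundreds of characters long to demonstrate the technique.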

declare @SomeValue varchar(8000)

set @SomeValue = 'Home Phone: 1234567890  
Cell Phone: 3344556677
Date of Birth: 01/01/1971 
School Name: James Jones High  School 
Address:123 Main Street 
School City: Queens  
School State: PA  
School Zip: 32112 
Years Teaching: 12  
Grade Levels: Middle School  
Total Students: 120  
Subject: Music:   
How did they hear:  Other, provide more info: Former partner teacher in the Middle School 
Type: Public/Charter   
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa.
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb
Question 3: ccccccccccccccccccccccc cccccccc';

select 
    MAX(case when s.ItemNumber = 1 then x.Item end) as HomePhone
    , MAX(case when s.ItemNumber = 2 then x.Item end) as CellPhone
    , MAX(case when s.ItemNumber = 3 then x.Item end) as DOB
    , MAX(case when s.ItemNumber = 4 then x.Item end) as SchoolName
    , MAX(case when s.ItemNumber = 5 then x.Item end) as SchoolAddress
    , MAX(case when s.ItemNumber = 6 then x.Item end) as SchoolCity
    , MAX(case when s.ItemNumber = 7 then x.Item end) as SchoolState
    , MAX(case when s.ItemNumber = 8 then x.Item end) as SchoolZip
    , MAX(case when s.ItemNumber = 9 then x.Item end) as YearsTeaching
    , MAX(case when s.ItemNumber = 10 then x.Item end) as GradeLevels
    , MAX(case when s.ItemNumber = 11 then x.Item end) as TotalStudents
    , MAX(case when s.ItemNumber = 12 then x.Item end) as Subject
    , MAX(case when s.ItemNumber = 13 then x.Item end) as HowHeard
    , MAX(case when s.ItemNumber = 14 then x.Item end) as SchoolType
    , MAX(case when s.ItemNumber = 15 then x.Item end) as Question1
    , MAX(case when s.ItemNumber = 16 then x.Item end) as Question2
    , MAX(case when s.ItemNumber = 17 then x.Item end) as Question3
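from dbo.DelimitedSplit8K(@SomeValue, CHAR(10)) s
cross apply dbo.DelimitedSplit8K(s.Item, ':') x
--item 1 of each line is the field name and item 2 is the text after the first colon;
--without this filter MAX() would also compare the field names
--(a value that itself contains ':' is cut off at that colon)
where x.ItemNumber = 2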
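The cross tab above depends on dbo.DelimitedSplit8K, which has to be installed from the linked article first. To look at the conditional-aggregation pattern on its own, here is a small sketch over a table variable that already holds one row per field. The table variable @Fields, its columns and the sample rows are all made up for the illustration, and it matches on field name rather than on line number.

```sql
-- Hypothetical, already-split data: one row per (record, field, value).
DECLARE @Fields TABLE
(
    RecordID   INT            NOT NULL,
    FieldName  NVARCHAR(100)  NOT NULL,
    FieldValue NVARCHAR(4000) NULL
);

INSERT INTO @Fields (RecordID, FieldName, FieldValue)
VALUES (1, N'Home Phone', N'1234567890'),
       (1, N'Cell Phone', N'3344556677'),
       (1, N'School Zip', N'32112'),
       (2, N'Home Phone', N'5550001111'),
       (2, N'School Zip', N'10001');

-- Cross tab: each CASE picks out one field, and MAX collapses the rows
-- of a record into a single output row.
SELECT RecordID,
       MAX(CASE WHEN FieldName = N'Home Phone' THEN FieldValue END) AS HomePhone,
       MAX(CASE WHEN FieldName = N'Cell Phone' THEN FieldValue END) AS CellPhone,
       MAX(CASE WHEN FieldName = N'School Zip' THEN FieldValue END) AS SchoolZip
FROM @Fields
GROUP BY RecordID
ORDER BY RecordID;
```

A field that is missing for a record (Cell Phone for record 2 here) simply comes back as NULL.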

Answer 2 (score: 0)

You can try XML like this, but I removed the extra : after Music and after provide more info

DECLARE @string nvarchar(max) = '
Home Phone: 1234567890  
Cell Phone: 1234567890  
Date of Birth: 01/01/1971 
School Name: James Jones High  School 
Address:123 Main Street 
School City: Queens  
School State: PA  
School Zip: 32112 
Years Teaching: 12  
Grade Levels: Middle School  
Total Students: 120  
Subject: Music   
How did they hear:  Other, provide more info, Former partner teacher in the Middle School 
Type: Public/Charter   
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa  aaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaa aaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaa aaaaaaa aaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaa aaaaa aaaaaa aaaaaa aaaaaaaaaaaa aaaaaaaaaaaa aaa aaaa aaaaa aaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaaa aaaaaaaaaaa aaaaaaaaa aaaaaaaaaaaa.   
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbb bbbbbbbbb bbbbbbb bbbbbb bbbbbb bbbbbbb  bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbb 
Question 3: ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccccccccccccccccc cccccccccccccccccc ccccccccccc ccccccccccccc ccccccccccccccccc cccccccc'
,@xml as xml

SELECT @xml = REPLACE ('<mystring><fieldname id="'+REPLACE(REPLACE(right(@string,LEN(@string)-2),':','" >'),CHAR(10),'</fieldname><fieldname id="')+'</fieldname></mystring>' ,CHAR(13),'')

SELECT
    n.v.value('(fieldname[@id="Home Phone"])[1]','NVARCHAR(11)') AS 'Home Phone',
    n.v.value('(fieldname[@id="Cell Phone"])[1]','NVARCHAR(11)') AS 'Cell Phone',
    n.v.value('(fieldname[@id="Date of Birth"])[1]','NVARCHAR(12)') AS 'Date of Birth',
    n.v.value('(fieldname[@id="School Name"])[1]','NVARCHAR(30)') AS 'School Name',
    n.v.value('(fieldname[@id="Address"])[1]','NVARCHAR(30)') AS 'Address',
    n.v.value('(fieldname[@id="School City"])[1]','NVARCHAR(15)') AS 'School City',
    n.v.value('(fieldname[@id="School State"])[1]','NVARCHAR(10)') AS 'School State',
    n.v.value('(fieldname[@id="School Zip"])[1]','NVARCHAR(6)') AS 'School Zip',
    n.v.value('(fieldname[@id="Years Teaching"])[1]','NVARCHAR(5)') AS 'Years Teaching',
    n.v.value('(fieldname[@id="Grade Levels"])[1]','NVARCHAR(15)') AS 'Grade Levels',
    n.v.value('(fieldname[@id="Total Students"])[1]','NVARCHAR(5)') AS 'Total Students',
    n.v.value('(fieldname[@id="How did they hear"])[1]','NVARCHAR(100)') AS 'How did they hear',
    n.v.value('(fieldname[@id="Type"])[1]','NVARCHAR(25)') AS 'Type',
    n.v.value('(fieldname[@id="Question 1"])[1]','NVARCHAR(128)') AS 'Question 1',
    n.v.value('(fieldname[@id="Question 2"])[1]','NVARCHAR(128)') AS 'Question 2',
    n.v.value('(fieldname[@id="Question 3"])[1]','NVARCHAR(128)') AS 'Question 3'
FROM @xml.nodes('mystring') as n(v);

Result:

    Home Phone  Cell Phone  Date of Birth School Name                    Address                        School City     School State School Zip Years Teaching Grade Levels    Total Students How did they hear                                                                                    Type                      Question 1                                                                                                                       Question 2                                                                                                                       Question 3
----------- ----------- ------------- ------------------------------ ------------------------------ --------------- ------------ ---------- -------------- --------------- -------------- ---------------------------------------------------------------------------------------------------- ------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------
 1234567890  1234567890  01/01/1971    James Jones High  School      123 Main Street                 Queens          PA           32112      12             Middle School   120             Other, provide more info, Former partner teacher in the Middle School                               Public/Charter            aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa  aaaaaaaaaaaaaaaaa  bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb   ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccc

(1 row(s) affected)
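One thing to watch with the XML approach: if a value contains a character that is special in XML, such as & or <, the conversion to xml fails. A possible workaround, sketched here under the assumption that such characters can only appear in the values, is to escape them before adding any markup. The sample string and the @escaped variable are made up for the illustration, and a double quote inside a field name would still break the id attribute.

```sql
DECLARE @string NVARCHAR(MAX) =
    N'School Name: Smith & Jones High School' + CHAR(13) + CHAR(10) +
    N'Grade Levels: < 9';

-- Escape the XML special characters first. '&' has to be replaced before
-- the others so the entities added afterwards are not escaped again.
DECLARE @escaped NVARCHAR(MAX) =
    REPLACE(REPLACE(REPLACE(@string, N'&', N'&amp;'), N'<', N'&lt;'), N'>', N'&gt;');

-- Same construction as above: ':' closes the id attribute, each line feed
-- closes one fieldname element and opens the next, carriage returns are dropped.
DECLARE @xml XML =
    REPLACE(N'<mystring><fieldname id="'
            + REPLACE(REPLACE(@escaped, N':', N'" >'),
                      CHAR(10), N'</fieldname><fieldname id="')
            + N'</fieldname></mystring>',
            CHAR(13), N'');

SELECT
    n.v.value('(fieldname[@id="School Name"])[1]', 'NVARCHAR(50)')  AS [School Name],
    n.v.value('(fieldname[@id="Grade Levels"])[1]', 'NVARCHAR(50)') AS [Grade Levels]
FROM @xml.nodes('mystring') AS n(v);
```

The value() method turns the entities back into the original characters, so the selected columns contain & and < again.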