Parsing a string that is too long for CHARINDEX

Time: 2018-08-06 13:37:52

Tags: sql sql-server tsql sql-server-2016 charindex

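In T-SQL (SSMS 2016), I am trying to parse repeating data out of a long string using a WHILE loop, a temp table, and CHARINDEX. Each pass of the loop uses the previous stopping point as the next starting point. The loop works, but CHARINDEX appears to have an 8000-character limit, and the string is longer than that. Is there another way to parse data out of a string longer than 8000 characters?

Edit: I am trying to extract the names identified by an attribute tag from a long string (over 100,000 characters). The data looks like this, but concatenated into one long string: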

<alarm-response-list xmlns="http://www.thePlace.com" total-alarms="862" throttle="862" error="EndOfResults">
    <alarm-responses>
        <alarm id="5afeeaac-355f-11a0-02bd-0080101c40b8">
            <attribute id="0x12d7f">##.##.###.###</attribute>
            <attribute id="0x1006e">Narnia</attribute>
        </alarm>
        <alarm id="5b5724cb-e0be-1016-0275-0080101c40b8">
            <attribute id="0x12d7f">##.##.###.###</attribute>
            <attribute id="0x1006e">Mordor</attribute>
        </alarm>
        <alarm id="5b4af6e5-8f8d-103e-023d-0080101c40b8">
            <attribute id="0x12d7f">##.##.###.###</attribute>
            <attribute id="0x1006e">Atlantis</attribute>
        </alarm>

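In this example, I need whatever is in the attribute with ID "0x1006e".

Edit: see the sample code below. As long as the number in the WHILE condition is below 8000, the code runs fine. Beyond that, the 8000-character limit of CHARINDEX kicks in.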

    DECLARE @temp TABLE(modelName VARCHAR(300))
    DECLARE @ctr INT = (SELECT MIN(ID) FROM [dbo].[Alarms])
    DECLARE @start INT = (SELECT CHARINDEX('1006E',Results)+7 FRP
                          FROM [dbo].[Alarms] WHERE ID = @ctr)
    DECLARE @len INT = (SELECT CHARINDEX('</attr',Results,CHARINDEX('1006E',Results))
                               - CHARINDEX('1006E',Results) - 7
                        FROM [dbo].[Alarms] WHERE ID = @ctr)
    DECLARE @totalLen INT = (SELECT LEN(CAST(results AS VARCHAR(MAX)))
                             FROM dbo.Alarms WHERE ID = @ctr)

    WHILE @start < 5000 BEGIN
        INSERT @temp
        SELECT SUBSTRING(Results,@start,@len) Name
        FROM [dbo].[Alarms]
        WHERE ID = @ctr

        SET @start = (SELECT CHARINDEX('1006E',Results,@start + 1)+7 FRP
                      FROM [dbo].[Alarms] WHERE ID = @ctr)
        SET @len = (SELECT CHARINDEX('</attr',Results,CHARINDEX('1006E',Results,@start+1))
                           - CHARINDEX('1006E',Results,@start + 1) - 7
                    FROM [dbo].[Alarms] WHERE ID = @ctr)
    END

    SELECT * FROM @temp

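5 answers:

Answer 0 (score: 0)

@James Pratt, I put together a mock-up to show you a different logic from using a loop.

It uses a CTE to build 2000 rows (and there are techniques to extend that row count).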

DECLARE @Alarms table (Id int, Results varchar(max))
INSERT into @Alarms 
SELECT 1, '[A]dkdk[/A][B]123[/B][N]Fred[/N][A]ddj[/A][B]456[/B][N]Bill[/N][A]akdl[/A]...'


;WITH x AS 
(
  SELECT  TOP (2000) n = ROW_NUMBER() OVER (ORDER BY Number)
  FROM master.dbo.spt_values ORDER BY Number
)
SELECT 
    RIGHT(t.Results, x.n) 
FROM x
    JOIN @Alarms AS t ON 
        x.n < LEN(t.Results) 
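
One way to extend the row count mentioned above is to cross join a small constant set with itself instead of reading `master.dbo.spt_values`. The following is only a sketch under assumptions: the CTE names `e1`, `e3`, `e6` and the generated test string are made up for illustration, and the `[N]`/`[/N]` tags are taken from the mock-up data. It builds up to 1,000,000 numbers and uses them to pull every `[N]` value out of an 11,000-character string, so some matches sit past position 8000.

```sql
-- Hypothetical test data: '[N]Fred[/N]' repeated 1000 times (11,000 characters).
DECLARE @Alarms TABLE (Id int, Results varchar(max));
INSERT INTO @Alarms
SELECT 1, REPLICATE(CAST('[N]Fred[/N]' AS varchar(max)), 1000);

WITH e1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(n)), -- 10 rows
     e3(n) AS (SELECT 1 FROM e1 a CROSS JOIN e1 b CROSS JOIN e1 c),                   -- 1,000 rows
     e6(n) AS (SELECT 1 FROM e3 a CROSS JOIN e3 b),                                   -- 1,000,000 rows
     x     AS (SELECT n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM e6)
SELECT t.Id,
       StartPos = x.n + 3,   -- first character after the 3-character '[N]' tag
       Name     = SUBSTRING(t.Results, x.n + 3,
                            CHARINDEX('[/N]', t.Results, x.n) - x.n - 3)
FROM @Alarms AS t
JOIN x ON x.n <= LEN(t.Results)
WHERE SUBSTRING(t.Results, x.n, 3) = '[N]';   -- keep only positions where a tag opens
```

The sketch assumes every `[N]` has a matching `[/N]`; an unclosed tag would make the computed length negative and `SUBSTRING` would raise an error.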

Answer 1 (score: 0)

If you are open to a TVF.

Tired of extracting strings (LEFT, RIGHT, CHARINDEX, PATINDEX, etc.), I modified a parse/split function to accept two non-like delimiters.

Example

Declare @S varchar(max) = '[A]dkdk[/A][B]123[/B][N]Fred[/N][A]ddj[/A][B]456[/B][N]Bill[/N][A]akdl[/A]'

Select Seq   = A.RetSeq
      ,Item  = B.RetVal
      ,Value = A.RetVal
 From  [dbo].[tvf-str-extract](@S,']','[') A
 Join ( Select Seq=Row_Number() over (Order by RetSeq),* 
         From [dbo].[tvf-str-extract](@S,'[',']') 
         Where charindex('/',RetVal)=0
      ) B on B.Seq=A.RetSeq
 Order By A.RetSeq

Returns

Seq Item    Value
1   A       dkdk
2   B       123
3   N       Fred
4   A       ddj
5   B       456
6   N       Bill
7   A       akdl

The function, if interested

CREATE FUNCTION [dbo].[tvf-Str-Extract] (@String varchar(max),@Delimiter1 varchar(100),@Delimiter2 varchar(100))
Returns Table 
As
Return (  

with   cte1(N)   As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
       cte2(N)   As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 N1,cte1 N2,cte1 N3,cte1 N4,cte1 N5,cte1 N6) A ),
       cte3(N)   As (Select 1 Union All Select t.N+DataLength(@Delimiter1) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter1)) = @Delimiter1),
       cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter1,@String,s.N),0)-S.N,8000) From cte3 S)

Select RetSeq = Row_Number() over (Order By N)
      ,RetPos = N
      ,RetVal = left(RetVal,charindex(@Delimiter2,RetVal)-1) 
 From  (
        Select *,RetVal = Substring(@String, N, L) 
         From  cte4
       ) A
 Where charindex(@Delimiter2,RetVal)>1

)
/*
Max Length of String 1MM characters
*/

Just to aid with the visual, here is alias A run on its own:

Select * From  [dbo].[tvf-str-extract](@S,']','[') A

Returns

RetSeq  RetPos  RetVal
1       4       dkdk
2       15      123
3       25      Fred
4       36      ddj
5       46      456
6       56      Bill
7       67      akdl

Answer 2 (score: 0)

This looks like a weird kind of XML... It might not work with all of your strings, but the given example can easily be transformed into XML:

DECLARE @tbl table (Id INT IDENTITY, YourString VARCHAR(MAX))
INSERT INTO @tbl VALUES
('[A]dkdk[/A][B]123[/B][N]Fred[/N][A]ddj[/A][B]456[/B][N]Bill[/N][A]akdl[/A]...');

SELECT CAST(REPLACE(REPLACE(t.YourString,'[','<'),']','>') AS XML)
FROM @tbl t

The resulting XML is pretty easy to read.

But, to be honest, there are many cases where this approach will break.
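
To make the caveat concrete, here is a small sketch of two inputs that should break the bracket-to-XML conversion. The sample rows are invented for illustration, and `TRY_CAST` (available from SQL Server 2012) is swapped in for `CAST` so that a failed conversion yields NULL instead of aborting the whole query.

```sql
DECLARE @tbl TABLE (Id INT IDENTITY, YourString VARCHAR(MAX));
INSERT INTO @tbl VALUES
 ('[N]Fred[/N][N]Bill[/N]'),   -- well-formed once the brackets are swapped
 ('[N]Tom & Jerry[/N]'),       -- a bare ampersand is not valid XML
 ('[N]Fred[/N][N]Bill');       -- the last element is never closed

SELECT t.Id,
       t.YourString,
       AsXml = TRY_CAST(REPLACE(REPLACE(t.YourString,'[','<'),']','>') AS XML)
FROM @tbl t;
```

Rows whose `AsXml` comes back NULL are the ones that would need cleaning up, or a different parsing approach, before the XML methods can be used on them.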

Answer 3 (score: 0)

As long as the string is VARCHAR(MAX), I think it should still work fine. Anyway, here is my shot at it using a TVF:

CREATE FUNCTION Tools.FindElements (@String VARCHAR(MAX)
                                  , @Open   VARCHAR(8000)
                                  , @Close  VARCHAR(8000))
/*
This function splits a VARCHAR string into a table of elements/items by finding opening and closing tags
The table returns ID (by the order of occurrence) and String (the element)

This is based off Jeff Moden's DelimitedSplit8K function: http://www.sqlservercentral.com/articles/Tally+Table/72993/
It has been modified to:
      1) Accept delimiters longer than 1 character
      2) Use two delimiters (opening and closing tags)
*/   
RETURNS TABLE
WITH SCHEMABINDING
AS
  RETURN
  /* 10 rows (all 1s) */
  WITH CTE_10
       AS (SELECT Number
           FROM(VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1) ) v(Number)),
       -------------------
       /* 100 rows (all 1s) */
       CTE_100
       AS (SELECT Number = 1
           FROM CTE_10 a
                CROSS JOIN CTE_10 b),
       -------------------
       /* 10000 rows max (all 1s) */
       CTE_10000
       AS (SELECT Number = 1
           FROM CTE_100 a
                CROSS JOIN CTE_100 b),
       -------------------
       /* 100000000 rows max (all 1s) - this limits the number of elements to 100 million (which I hope is enough)) */
       CTE_100000000
       AS (SELECT Number = 1
           FROM CTE_10000 a
                CROSS JOIN CTE_10000 b),
       -------------------
       /* Numbers "Table" CTE: 
          1) TOP has variable parameter = DATALENGTH(@String)
          2) Use ROW_NUMBER */
       CTE_Numbers
       AS (SELECT TOP (ISNULL(DATALENGTH(@String), 0)) Number = ROW_NUMBER() OVER(ORDER BY (SELECT NULL) )
           FROM CTE_100000000),
       -------------------
       /* Returns start of the element after each delimiter */
       CTE_Start
       AS (SELECT [Start] = Number + DATALENGTH(@Open)
           FROM CTE_Numbers
           WHERE SUBSTRING(@String, Number, DATALENGTH(@Open)) = @Open),
       -------------------
       /* IF @Delimiter <> '': Returns start and length (for use in substring) */
       CTE_Length
       AS (SELECT [Start]
                , [Length] = ISNULL(NULLIF(CHARINDEX(@Close, @String, [Start]), 0) - [Start], 8000) -- ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
           FROM CTE_Start)


       /* Do the actual split */
       SELECT ID = ROW_NUMBER() OVER(ORDER BY [Start])
            , String = SUBSTRING(@String, [Start], [Length])
       FROM CTE_Length;

Then use it like this:

SELECT *
FROM Tools.FindElements ('[A]dkdk[/A][B]123[/B][N]Fred[/N][A]ddj[/A][B]456[/B][N]Bill[/N][A]akdl[/A]', '[N]', '[/N]');

Or like this:

SELECT *
FROM dbo.TextTable
     CROSS APPLY Tools.FindElements (TextColumn, '[N]', '[/N]');
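
On the opening remark of this answer, that things keep working as long as the string is VARCHAR(MAX): a quick way to check this is to search the same value once as VARCHAR(MAX) and once after a cast to VARCHAR(8000). This is only a sketch; the variable `@s` and its filler content are made up for the test, and the attribute fragment is borrowed from the question's sample data.

```sql
-- 10,000 filler characters followed by one attribute element.
-- The inner CAST matters: REPLICATE on a non-MAX input stops at 8000 bytes.
DECLARE @s VARCHAR(MAX) =
    REPLICATE(CAST('x' AS VARCHAR(MAX)), 10000)
    + '<attribute id="0x1006e">Narnia</attribute>';

SELECT TotalLength = LEN(@s),
       FoundInMax  = CHARINDEX('0x1006e', @s),                         -- position past 10,000
       FoundIn8000 = CHARINDEX('0x1006e', CAST(@s AS VARCHAR(8000)));  -- 0: the cast cut the string off
```

If the column in the question is not already VARCHAR(MAX) (the posted code casts it inside LEN, which suggests it may not be), casting it explicitly inside each CHARINDEX and SUBSTRING call would be the first thing to try.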

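Answer 4 (score: 0)

I am adding this as a new answer.

As mentioned in the comments:
Please avoid chameleon questions... After your edit this is a completely different matter, and it invalidates the existing answers... For the future: if you find that you need to change your question, it is better to close the existing question by accepting the best answer (the one that answers the original question) and then start a new question.

An approach using native XML methods to solve your problem

James, your string is not something wild that you have to parse yourself; it is simply XML. There are existing tools to read it. Almost any programming language supports XPath and XQuery. This is nothing one should do by hand...

Try this, and come back if you run into further issues (but with a new question)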

DECLARE @xml XML=
N'<alarm-response-list xmlns="http://www.thePlace.com" total-alarms="862" throttle="862" error="EndOfResults">
    <alarm-responses>
        <alarm id="5afeeaac-355f-11a0-02bd-0080101c40b8">
            <attribute id="0x12d7f">##.##.###.###</attribute>
            <attribute id="0x1006e">Narnia</attribute>
        </alarm>
        <alarm id="5b5724cb-e0be-1016-0275-0080101c40b8">
            <attribute id="0x12d7f">##.##.###.###</attribute>
            <attribute id="0x1006e">Mordor</attribute>
        </alarm>
        <alarm id="5b4af6e5-8f8d-103e-023d-0080101c40b8">
            <attribute id="0x12d7f">##.##.###.###</attribute>
            <attribute id="0x1006e">Atlantis</attribute>
        </alarm>

<!-- have to append closing nodes -->

    </alarm-responses>
</alarm-response-list>';

WITH XMLNAMESPACES(DEFAULT 'http://www.thePlace.com')
SELECT @xml.value('(/alarm-response-list/@total-alarms)[1]','int') AS TotalAlarms
      ,@xml.value('(/alarm-response-list/@throttle)[1]','int') AS throttle
      ,@xml.value('(/alarm-response-list/@error)[1]','nvarchar(max)') AS error
      ,alarm.value('@id','uniqueidentifier') AS Alarm_id
      ,attr.value('@id','nvarchar(max)') AS Alarm_Attribute_id
      ,attr.value('text()[1]','nvarchar(max)') AS Alarm_Attribute_content
FROM @xml.nodes('/alarm-response-list/alarm-responses/alarm') A(alarm)
OUTER APPLY alarm.nodes('attribute') B(attr);

Result

+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| TotalAlarms | throttle | error        | Alarm_id                             | Alarm_Attribute_id | Alarm_Attribute_content |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862         | 862      | EndOfResults | 5AFEEAAC-355F-11A0-02BD-0080101C40B8 | 0x12d7f            | ##.##.###.###           |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862         | 862      | EndOfResults | 5AFEEAAC-355F-11A0-02BD-0080101C40B8 | 0x1006e            | Narnia                  |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862         | 862      | EndOfResults | 5B5724CB-E0BE-1016-0275-0080101C40B8 | 0x12d7f            | ##.##.###.###           |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862         | 862      | EndOfResults | 5B5724CB-E0BE-1016-0275-0080101C40B8 | 0x1006e            | Mordor                  |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862         | 862      | EndOfResults | 5B4AF6E5-8F8D-103E-023D-0080101C40B8 | 0x12d7f            | ##.##.###.###           |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862         | 862      | EndOfResults | 5B4AF6E5-8F8D-103E-023D-0080101C40B8 | 0x1006e            | Atlantis                |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+

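Using a predicate to get the answer you need

This uses a predicate within .nodes() to retrieve a derived table of all <attribute> elements whose @id has the given value.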

WITH XMLNAMESPACES(DEFAULT 'http://www.thePlace.com')
SELECT a.value('text()[1]','nvarchar(max)') AS Alarm_Attribute_content
FROM @xml.nodes('//attribute[@id="0x1006e"]') A(a)

Result

Alarm_Attribute_content
------
Narnia
Mordor
Atlantis
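
The same predicate can be run against a table column rather than a single variable. The sketch below makes several assumptions: the table variable `@Alarms` stands in for the `dbo.Alarms` table from the question, its single row is a shortened, well-formed version of the sample data, and the attribute id is passed in through a hypothetical variable `@attrId` using `sql:variable()`.

```sql
-- Stand-in for dbo.Alarms with one shortened, well-formed sample row.
DECLARE @Alarms TABLE (ID INT, Results VARCHAR(MAX));
INSERT INTO @Alarms VALUES
(1, '<alarm-response-list xmlns="http://www.thePlace.com">
       <alarm-responses>
         <alarm id="5afeeaac-355f-11a0-02bd-0080101c40b8">
           <attribute id="0x12d7f">##.##.###.###</attribute>
           <attribute id="0x1006e">Narnia</attribute>
         </alarm>
         <alarm id="5b5724cb-e0be-1016-0275-0080101c40b8">
           <attribute id="0x1006e">Mordor</attribute>
         </alarm>
       </alarm-responses>
     </alarm-response-list>');

DECLARE @attrId VARCHAR(20) = '0x1006e';  -- the attribute id to look for

WITH XMLNAMESPACES(DEFAULT 'http://www.thePlace.com')
SELECT t.ID,
       n.attr.value('text()[1]','nvarchar(300)') AS modelName
FROM @Alarms t
CROSS APPLY (SELECT CAST(t.Results AS XML) AS x) c   -- convert the string once per row
CROSS APPLY c.x.nodes('//attribute[@id=sql:variable("@attrId")]') n(attr);
```

The cast only succeeds when the stored string is well-formed XML, so a truncated string like the one in the question would first need its closing nodes appended.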