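In T-SQL (SSMS 2016), I am trying to parse repeating data out of a long string using a WHILE loop, a temp table, and CHARINDEX. Each pass of the loop uses the previous stopping point as the next starting point. The loop works, but CHARINDEX seems to have an 8000-character limit, and the string is longer than that. Is there another way to parse data out of a string longer than 8000 characters?
Edit: I am trying to extract the names marked by attribute tags from a long string (over 100,000 characters). The data looks like this, but concatenated into one long string: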
<alarm-response-list xmlns="http://www.thePlace.com" total-alarms="862" throttle="862" error="EndOfResults">
<alarm-responses>
<alarm id="5afeeaac-355f-11a0-02bd-0080101c40b8">
<attribute id="0x12d7f">##.##.###.###</attribute>
<attribute id="0x1006e">Narnia</attribute>
</alarm>
<alarm id="5b5724cb-e0be-1016-0275-0080101c40b8">
<attribute id="0x12d7f">##.##.###.###</attribute>
<attribute id="0x1006e">Mordor</attribute>
</alarm>
<alarm id="5b4af6e5-8f8d-103e-023d-0080101c40b8">
<attribute id="0x12d7f">##.##.###.###</attribute>
<attribute id="0x1006e">Atlantis</attribute>
</alarm>
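In this example, I need the content of every attribute whose id is "0x1006e".
Edit: see the sample code below. The code runs fine as long as the limit in the WHILE statement is below 8000. Beyond that, the 8000-character limit of CHARINDEX kicks in.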
DECLARE @temp TABLE(modelName VARCHAR(300))
DECLARE @ctr INT = (SELECT MIN(ID) FROM [dbo].[Alarms])
DECLARE @start INT = (SELECT CHARINDEX('1006E',Results)+7 FRP FROM [dbo].[Alarms] WHERE ID = @ctr)
DECLARE @len INT = (SELECT CHARINDEX('</attr',Results,CHARINDEX('1006E',Results)) - CHARINDEX('1006E',Results) - 7 FROM [dbo].[Alarms] WHERE ID = @ctr)
DECLARE @totalLen INT = (SELECT LEN(CAST(results AS VARCHAR(MAX))) FROM dbo.Alarms WHERE ID = @ctr)
WHILE @start < 5000 BEGIN
    INSERT @temp
    SELECT SUBSTRING(Results,@start,@len) Name
    FROM [dbo].[Alarms]
    WHERE ID = @ctr
    SET @start = (SELECT CHARINDEX('1006E',Results,@start + 1)+7 FRP FROM [dbo].[Alarms] WHERE ID = @ctr)
    SET @len = (SELECT CHARINDEX('</attr',Results,CHARINDEX('1006E',Results,@start+1)) - CHARINDEX('1006E',Results,@start + 1) - 7 FROM [dbo].[Alarms] WHERE ID = @ctr)
END
SELECT * FROM @temp
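Answer 0 (score: 0)
@James Pratt, I made a mock-up to give you an alternative logic to using a loop.
It uses a CTE to build 2000 rows (there are techniques for extending the row count).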
DECLARE @Alarms table (Id int, Results varchar(max))
INSERT into @Alarms
SELECT 1, '[A]dkdk[/A][B]123[/B][N]Fred[/N][A]ddj[/A][B]456[/B][N]Bill[/N][A]akdl[/A]...'
;WITH x AS
(
SELECT TOP (2000) n = ROW_NUMBER() OVER (ORDER BY Number)
FROM master.dbo.spt_values ORDER BY Number
)
SELECT
RIGHT(t.Results, x.n)
FROM x
JOIN @Alarms AS t ON
x.n < LEN(t.Results)
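The query above only pairs each number with a suffix of the string. One possible way to carry the same numbers-table idea through to actual extraction is sketched below. It assumes the mock `[N]...[/N]` tags from this answer and a local variable `@s` introduced here for illustration; it is not the original poster's code. Each number is treated as a candidate start position, and only the positions where the opening tag begins are kept.

```sql
-- Sketch only: @s is a stand-in for the Results column.
DECLARE @s VARCHAR(MAX) = '[A]dkdk[/A][B]123[/B][N]Fred[/N][A]ddj[/A][B]456[/B][N]Bill[/N][A]akdl[/A]';

WITH x AS
(
    -- Same 2000-row numbers CTE as in the answer; strings longer than
    -- 2000 characters would need a larger numbers source.
    SELECT TOP (2000) n = ROW_NUMBER() OVER (ORDER BY Number)
    FROM master.dbo.spt_values
    ORDER BY Number
)
SELECT x.n AS TagStart,
       -- Start right after '[N]' (3 characters) and read up to the next '[/N]'.
       -- If no closing tag is found, fall back to a length of 8000.
       SUBSTRING(@s, x.n + 3,
                 ISNULL(NULLIF(CHARINDEX('[/N]', @s, x.n + 3), 0) - (x.n + 3), 8000)) AS Name
FROM x
WHERE x.n <= LEN(@s)
  AND SUBSTRING(@s, x.n, 3) = '[N]'
ORDER BY x.n;
```

For the sample string this should return two rows, Fred (tag starting at position 22) and Bill (tag starting at position 53).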
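Answer 1 (score: 0)
If you are open to a TVF.
I got tired of extracting strings (LEFT, RIGHT, CHARINDEX, PATINDEX, etc.), so I modified a parse/split function to accept two non-like delimiters.
Example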
Declare @S varchar(max) = '[A]dkdk[/A][B]123[/B][N]Fred[/N][A]ddj[/A][B]456[/B][N]Bill[/N][A]akdl[/A]'
Select Seq = A.RetSeq
,Item = B.RetVal
,Value = A.RetVal
From [dbo].[tvf-str-extract](@S,']','[') A
Join ( Select Seq=Row_Number() over (Order by RetSeq),*
From [dbo].[tvf-str-extract](@S,'[',']')
Where charindex('/',RetVal)=0
) B on B.Seq=A.RetSeq
Order By A.RetSeq
Returns
Seq Item Value
1 A dkdk
2 B 123
3 N Fred
4 A ddj
5 B 456
6 N Bill
7 A akdl
The function, if interested
CREATE FUNCTION [dbo].[tvf-Str-Extract] (@String varchar(max),@Delimiter1 varchar(100),@Delimiter2 varchar(100))
Returns Table
As
Return (
with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(N) As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 N1,cte1 N2,cte1 N3,cte1 N4,cte1 N5,cte1 N6) A ),
cte3(N) As (Select 1 Union All Select t.N+DataLength(@Delimiter1) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter1)) = @Delimiter1),
cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter1,@String,s.N),0)-S.N,8000) From cte3 S)
Select RetSeq = Row_Number() over (Order By N)
,RetPos = N
,RetVal = left(RetVal,charindex(@Delimiter2,RetVal)-1)
From (
Select *,RetVal = Substring(@String, N, L)
From cte4
) A
Where charindex(@Delimiter2,RetVal)>1
)
/*
Max Length of String 1MM characters
*/
Just to aid with the visual, this is alias A run on its own:
Select * From [dbo].[tvf-str-extract](@S,']','[') A
Returns
RetSeq RetPos RetVal
1 4 dkdk
2 15 123
3 25 Fred
4 36 ddj
5 46 456
6 56 Bill
7 67 akdl
Answer 2 (score: 0)
This looks like a somewhat odd kind of XML... It might not work for every string, but the given sample can easily be converted into XML:
DECLARE @tbl table (Id INT IDENTITY, YourString VARCHAR(MAX))
INSERT INTO @tbl VALUES
('[A]dkdk[/A][B]123[/B][N]Fred[/N][A]ddj[/A][B]456[/B][N]Bill[/N][A]akdl[/A]...');
SELECT CAST(REPLACE(REPLACE(t.YourString,'[','<'),']','>') AS XML)
FROM @tbl t
The result is rather easy to read.
But, to be honest, there are many situations in which this approach will break.
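As a rough illustration of what "easy to read" can mean here, the converted value can be queried with the XML methods. This sketch assumes the mock bracket tags from this answer; the variable names `@s` and `@x` are introduced only for the example. It relies on the SQL Server XML type accepting fragments with several top-level elements.

```sql
-- Sketch only: convert the bracket tags to angle brackets, then query the XML.
DECLARE @s VARCHAR(MAX) = '[A]dkdk[/A][B]123[/B][N]Fred[/N][A]ddj[/A][B]456[/B][N]Bill[/N][A]akdl[/A]';
DECLARE @x XML = CAST(REPLACE(REPLACE(@s, '[', '<'), ']', '>') AS XML);

-- One row per top-level <N> element.
SELECT n.value('text()[1]', 'varchar(300)') AS Name
FROM @x.nodes('/N') AS T(n);
```

For the sample string this should list Fred and Bill. The cast fails as soon as the content contains characters such as `<` or `&`, or unbalanced tags, which is one of the ways this approach can break.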
Answer 3 (score: 0)
As long as the string is VARCHAR(MAX), I think it should still work. In any case, here is my shot at it using a TVF:
CREATE FUNCTION Tools.FindElements (@String VARCHAR(MAX)
, @Open VARCHAR(8000)
, @Close VARCHAR(8000))
/*
This function splits a VARCHAR string into a table of elements/items by finding opening and closing tags
The table returns ID (by the order of occurrence) and String (the element)
This is based off Jeff Moden's DelimitedSplit8K function: http://www.sqlservercentral.com/articles/Tally+Table/72993/
It has been modified to:
1) Accept delimiters longer than 1 character
2) Use two delimiters (opening and closing tags)
*/
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
/* 10 rows (all 1s) */
WITH CTE_10
AS (SELECT Number
FROM(VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1) ) v(Number)),
-------------------
/* 100 rows (all 1s) */
CTE_100
AS (SELECT Number = 1
FROM CTE_10 a
CROSS JOIN CTE_10 b),
-------------------
/* 10000 rows max (all 1s) */
CTE_10000
AS (SELECT Number = 1
FROM CTE_100 a
CROSS JOIN CTE_100 b),
-------------------
/* 100000000 rows max (all 1s) - this limits the number of elements to 100 million (which I hope is enough) */
CTE_100000000
AS (SELECT Number = 1
FROM CTE_10000 a
CROSS JOIN CTE_10000 b),
-------------------
/* Numbers "Table" CTE:
1) TOP has variable parameter = DATALENGTH(@String)
2) Use ROW_NUMBER */
CTE_Numbers
AS (SELECT TOP (ISNULL(DATALENGTH(@String), 0)) Number = ROW_NUMBER() OVER(ORDER BY (SELECT NULL) )
FROM CTE_100000000),
-------------------
/* Returns start of the element after each delimiter */
CTE_Start
AS (SELECT [Start] = Number + DATALENGTH(@Open)
FROM CTE_Numbers
WHERE SUBSTRING(@String, Number, DATALENGTH(@Open)) = @Open),
-------------------
/* IF @Delimiter <> '': Returns start and length (for use in substring) */
CTE_Length
AS (SELECT [Start]
, [Length] = ISNULL(NULLIF(CHARINDEX(@Close, @String, [Start]), 0) - [Start], 8000) -- ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
FROM CTE_Start)
/* Do the actual split */
SELECT ID = ROW_NUMBER() OVER(ORDER BY [Start])
, String = SUBSTRING(@String, [Start], [Length])
FROM CTE_Length;
Then use it like this:
SELECT *
FROM Tools.FindElements ('[A]dkdk[/A][B]123[/B][N]Fred[/N][A]ddj[/A][B]456[/B][N]Bill[/N][A]akdl[/A]', '[N]', '[/N]');
Or like this:
SELECT *
FROM dbo.TextTable
CROSS APPLY Tools.FindElements (TextColumn, '[N]', '[/N]');
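The remark at the top of this answer, that CHARINDEX keeps working past 8000 characters as long as the string really is VARCHAR(MAX), can be checked with a small sketch. The variable names and the 'needle' marker are made up for the test. The second half shows a common way a string silently ends up cut to 8000 characters before CHARINDEX ever sees it.

```sql
-- Sketch only: a genuine VARCHAR(MAX) string longer than 8000 characters.
-- REPLICATE must be given a VARCHAR(MAX) input, otherwise it stops at 8000 bytes.
DECLARE @long VARCHAR(MAX) = REPLICATE(CAST('x' AS VARCHAR(MAX)), 10000) + 'needle';

SELECT LEN(@long)                AS TotalLength,  -- 10006
       CHARINDEX('needle', @long) AS FoundAt;     -- 10001

-- Without the cast, the expression on the right is built from non-MAX types,
-- so it is truncated to 8000 characters before the assignment and the
-- marker is lost.
DECLARE @cut VARCHAR(MAX) = REPLICATE('x', 10000) + 'needle';

SELECT LEN(@cut)                AS TotalLength,
       CHARINDEX('needle', @cut) AS FoundAt;
```

If a search stops at 8000 characters, it may be worth checking the declared type of the column or expression being searched and casting it to VARCHAR(MAX) explicitly before calling CHARINDEX or SUBSTRING.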
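Answer 4 (score: 0)
I am adding this as a new answer.
As stated in a comment:
Please avoid chameleon questions... After your edit this is something completely different, and it invalidates the existing answers... For the future: if you find that you need to change the question, it is better to close the existing one by accepting the best answer (the one that answered the initial question) and then start a new question.
James, your string is not some odd format that you have to parse yourself; it is simply XML. There are ready-made tools for reading it. Almost every programming language supports XPath and XQuery. This is nothing you should do on your own...
Try this, and come back if you run into any issues (but with a new question):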
DECLARE @xml XML=
N'<alarm-response-list xmlns="http://www.thePlace.com" total-alarms="862" throttle="862" error="EndOfResults">
<alarm-responses>
<alarm id="5afeeaac-355f-11a0-02bd-0080101c40b8">
<attribute id="0x12d7f">##.##.###.###</attribute>
<attribute id="0x1006e">Narnia</attribute>
</alarm>
<alarm id="5b5724cb-e0be-1016-0275-0080101c40b8">
<attribute id="0x12d7f">##.##.###.###</attribute>
<attribute id="0x1006e">Mordor</attribute>
</alarm>
<alarm id="5b4af6e5-8f8d-103e-023d-0080101c40b8">
<attribute id="0x12d7f">##.##.###.###</attribute>
<attribute id="0x1006e">Atlantis</attribute>
</alarm>
<!-- have to append closing nodes -->
</alarm-responses>
</alarm-response-list>';
WITH XMLNAMESPACES(DEFAULT 'http://www.thePlace.com')
SELECT @xml.value('(/alarm-response-list/@total-alarms)[1]','int') AS TotalAlarms
,@xml.value('(/alarm-response-list/@throttle)[1]','int') AS throttle
,@xml.value('(/alarm-response-list/@error)[1]','nvarchar(max)') AS error
,alarm.value('@id','uniqueidentifier') AS Alarm_id
,attr.value('@id','nvarchar(max)') AS Alarm_Attribute_id
,attr.value('text()[1]','nvarchar(max)') AS Alarm_Attribute_content
FROM @xml.nodes('/alarm-response-list/alarm-responses/alarm') A(alarm)
OUTER APPLY alarm.nodes('attribute') B(attr);
Result
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| TotalAlarms | throttle | error | Alarm_id | Alarm_Attribute_id | Alarm_Attribute_content |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862 | 862 | EndOfResults | 5AFEEAAC-355F-11A0-02BD-0080101C40B8 | 0x12d7f | ##.##.###.### |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862 | 862 | EndOfResults | 5AFEEAAC-355F-11A0-02BD-0080101C40B8 | 0x1006e | Narnia |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862 | 862 | EndOfResults | 5B5724CB-E0BE-1016-0275-0080101C40B8 | 0x12d7f | ##.##.###.### |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862 | 862 | EndOfResults | 5B5724CB-E0BE-1016-0275-0080101C40B8 | 0x1006e | Mordor |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862 | 862 | EndOfResults | 5B4AF6E5-8F8D-103E-023D-0080101C40B8 | 0x12d7f | ##.##.###.### |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
| 862 | 862 | EndOfResults | 5B4AF6E5-8F8D-103E-023D-0080101C40B8 | 0x1006e | Atlantis |
+-------------+----------+--------------+--------------------------------------+--------------------+-------------------------+
The following uses a predicate in .nodes() to retrieve a derived table of all <attribute> elements where @id has the given value:
WITH XMLNAMESPACES(DEFAULT 'http://www.thePlace.com')
SELECT a.value('text()[1]','nvarchar(max)') AS Alarm_Attribute_content
FROM @xml.nodes('//attribute[@id="0x1006e"]') A(a)
Result
Alarm_Attribute_content
------
Narnia
Mordor
Atlantis
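The queries above read from a variable. A sketch of the same predicate applied to a table column might look like the following. The table variable `@Alarms` stands in for the `[dbo].[Alarms]` table from the question, and the shortened alarm ids `a1` and `a2` are placeholders. It assumes the stored string is complete, well-formed XML; `TRY_CAST` yields NULL for rows that are not, so those rows simply produce no output.

```sql
-- Sketch only: @Alarms mimics the question's table (ID, Results).
DECLARE @Alarms TABLE (ID INT, Results VARCHAR(MAX));
INSERT INTO @Alarms VALUES
(1, '<alarm-response-list xmlns="http://www.thePlace.com"><alarm-responses>'
  + '<alarm id="a1"><attribute id="0x1006e">Narnia</attribute></alarm>'
  + '<alarm id="a2"><attribute id="0x1006e">Mordor</attribute></alarm>'
  + '</alarm-responses></alarm-response-list>');

WITH XMLNAMESPACES (DEFAULT 'http://www.thePlace.com')
SELECT t.ID,
       a.value('text()[1]', 'nvarchar(max)') AS modelName
FROM @Alarms AS t
-- Convert the string column to XML once per row.
CROSS APPLY (SELECT TRY_CAST(t.Results AS XML)) AS c(ResultsXml)
-- Shred only the attributes with the wanted id.
CROSS APPLY c.ResultsXml.nodes('//attribute[@id="0x1006e"]') AS A(a);
```

This should return one row per matching attribute (Narnia and Mordor here) without any loop or manual position arithmetic.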