Is there any way in SQL Server / Transact-SQL to parse (malformed) XML whose attribute values are not quoted? For example:
SELECT CAST('<test A=B />' AS XML)
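The statement above fails with:
XML parsing: line 1, character 9, A string literal was expected
The following, by contrast, parses successfully: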
SELECT CAST('<test A="B" />' AS XML)
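Answer 0 (score: 1)
Your assumption is incorrect. You can't parse it as XML because it isn't ... XML. If you read section 2.3, Common Syntactic Constructs, of the XML specification, you will see: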
AttValue ::= '"' ([^<&"] | Reference)* '"'
| "'" ([^<&'] | Reference)* "'"
Attribute values must be enclosed in either double quotes (") or single quotes (').
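As a quick illustration of that grammar, SQL Server accepts both quoting styles and rejects the unquoted form. The sketch below assumes SQL Server 2012 or later, because it uses TRY_CAST, which returns NULL when a conversion fails instead of raising an error:

```sql
-- Double-quoted attribute value: parses.
SELECT CAST('<test A="B" />' AS XML) AS double_quoted;

-- Single-quoted attribute value: also well-formed XML. The quote is doubled
-- only because it sits inside a T-SQL string literal.
SELECT CAST('<test A=''B'' />' AS XML) AS single_quoted;

-- Unquoted attribute value: not well-formed, so TRY_CAST yields NULL.
SELECT TRY_CAST('<test A=B />' AS XML) AS unquoted;
```

TRY_CAST can also serve as a filter (for example, WHERE TRY_CAST(col AS XML) IS NULL) to find the rows that need repair before any conversion is attempted; col is a hypothetical column name.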
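Answer 1 (score: 1)
The right way to solve this is to fix the XML at its source. If that is not possible for whatever reason, you can repair the XML with a string splitter function and some basic string manipulation. This approach assumes relatively simple XML (for instance, attribute values that contain no spaces or '=' characters) and may not work for large or complex XML strings.
First you need to create a string splitter function. There are plenty of examples around, but I have included one below for completeness: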
CREATE FUNCTION [dbo].[SplitString]
(
    @string NVARCHAR(MAX),
    @delimiter CHAR(1)
)
RETURNS @output TABLE (splitdata NVARCHAR(MAX))
BEGIN
    DECLARE @start INT, @end INT
    -- @start marks the beginning of the current piece, @end the next delimiter.
    SELECT @start = 1, @end = CHARINDEX(@delimiter, @string)
    WHILE @start < LEN(@string) + 1 BEGIN
        -- No further delimiter: the last piece runs to the end of the string.
        IF @end = 0
            SET @end = LEN(@string) + 1
        INSERT INTO @output (splitdata)
        VALUES (SUBSTRING(@string, @start, @end - @start))
        SET @start = @end + 1
        SET @end = CHARINDEX(@delimiter, @string, @start)
    END
    RETURN
END
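For illustration, calling the function on the malformed string from the question should return one row per piece. This assumes the function above has already been created. The rows in the comments come from tracing the loop by hand, and without an ORDER BY their order is not formally guaranteed:

```sql
SELECT splitdata
FROM [dbo].[SplitString]('<test A=B />', '=');
-- Expected rows:
--   <test A
--   B />
```

Each piece after the first begins with an attribute value, which is what the quoting step relies on.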
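Next, split the XML string into rows at every occurrence of the assignment operator '='. Then use pattern matching to find the value assigned to each attribute and replace it with a quoted version of that value. The final step of the query joins the split rows back into a single XML string.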
DECLARE @malformedXmlString NVARCHAR(MAX) = '<test A=B width = 1000 height= 800 priority =high name="fred" />' --'<test A=BCD>DATA</test>'
DECLARE @xmlSplit TABLE
(
    ID INT IDENTITY
    ,splitdata NVARCHAR(MAX)
)
-- Split on '=' and trim each piece, so that every piece after the first starts with an attribute value.
INSERT INTO @xmlSplit
(
    splitdata
)
SELECT LTRIM(RTRIM(splitdata)) AS splitdata
FROM [dbo].[SplitString](@malformedXmlString, '=')
UPDATE @xmlSplit
SET splitdata = UpdatedXml.splitdata
FROM @xmlSplit OriginalXml
INNER JOIN (
    SELECT ID
        -- PATINDEX finds where the attribute value ends (the first space, '/' or '>').
        -- A closing quote is inserted at that position and an opening quote is prepended.
        -- STUFF is used instead of REPLACE so that only the leading value is quoted,
        -- even if the same text recurs later in the piece.
        ,'"' + STUFF(splitdata, PATINDEX('%[ />]%', splitdata), 0, '"') AS splitdata
    FROM @xmlSplit
    -- Only pieces that start with an alphanumeric character and contain a space, '/' or '>'.
    -- These are the pieces that begin with an unquoted attribute value.
    WHERE splitdata LIKE '[a-zA-Z0-9]%[ />]%'
) UpdatedXml ON OriginalXml.ID = UpdatedXml.ID
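-- Rejoin the pieces, restoring the '=' between them.
-- Note: this relies on the rows being read back in ID order, which this form of concatenation does not formally guarantee.
DECLARE @xmlString NVARCHAR(MAX);
SELECT @xmlString = COALESCE(@xmlString + '=', '') + CONVERT(NVARCHAR(MAX), splitdata)
FROM @xmlSplit
SELECT CAST(@xmlString AS XML)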
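The quoting step can also be tried on a single piece in isolation. This is a minimal sketch: @fragment and @valueEnd are hypothetical variables, with @fragment holding one trimmed piece of the split, and STUFF is used here to insert the closing quote at the position PATINDEX reports, so only the leading value is touched:

```sql
DECLARE @fragment NVARCHAR(MAX) = N'1000 height';

-- Position of the first space, '/' or '>' after the value (5 for this fragment).
DECLARE @valueEnd INT = PATINDEX(N'%[ />]%', @fragment);

-- Insert the closing quote at that position, then prepend the opening quote.
SELECT N'"' + STUFF(@fragment, @valueEnd, 0, N'"') AS quoted_fragment;
-- Returns: "1000" height
```

With the sample input, the fully reassembled string should read <test A="B" width="1000" height="800" priority="high" name="fred" />, which the final CAST then accepts as XML.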