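SQL Server: parse (malformed) XML with missing attribute quotes

Time: 2016-10-03 11:04:07

Tags: sql-server xml tsql sql-server-2012

Is there any way in SQL Server / Transact-SQL to parse (malformed) XML whose attribute values are missing their quotes, for example: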

SELECT CAST('<test A=B />' AS XML)

The above fails with:

XML parsing: line 1, character 9, A string literal was expected

Parsing the following succeeds:

SELECT CAST('<test A="B" />' AS XML)

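2 Answers:

Answer 0: (score: 1)

Your assumption is incorrect. You cannot parse it as XML because it is not ... XML. If you read section 2.3, Common Syntactic Constructs, of the XML specification, you will see: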

AttValue       ::=  '"' ([^<&"] | Reference)* '"'
                 |  "'" ([^<&'] | Reference)* "'"

Attribute values must be quoted with either " or '.
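
As a rough illustration of that rule, the sketch below uses TRY_CAST, which is available from SQL Server 2012 onward and returns NULL instead of raising an error when a conversion fails. The column aliases are made up for this example. The unquoted form should come back as NULL, while both quoted forms should convert.

```sql
-- TRY_CAST returns NULL rather than raising the XML parsing error.
SELECT
    TRY_CAST('<test A=B />'     AS XML) AS unquoted,      -- not well-formed: NULL expected
    TRY_CAST('<test A="B" />'   AS XML) AS double_quoted, -- well-formed
    TRY_CAST('<test A=''B'' />' AS XML) AS single_quoted; -- well-formed ('' escapes ' inside a T-SQL literal)
```

A check like this can be used to find rows that need repair before attempting a hard CAST.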

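Answer 1: (score: 1)

The right way to solve this is to fix the XML at the source. However, if for whatever reason that is not possible, you can repair the XML with a string splitter function and basic string manipulation. This approach does assume relatively simple XML and may not work for large or complex XML strings.

First you have to create a string splitter function. There are plenty of examples out there, but I have included one below for completeness: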

CREATE FUNCTION [dbo].[SplitString]
(
    @string NVARCHAR(MAX),
    @delimiter CHAR(1)
)
RETURNS @output TABLE (splitdata NVARCHAR(MAX))
BEGIN
    -- @start is the first character of the current piece, @end the position of the next delimiter.
    DECLARE @start INT, @end INT
    SELECT @start = 1, @end = CHARINDEX(@delimiter, @string)
    WHILE @start < LEN(@string) + 1 BEGIN
        -- No further delimiter: the last piece runs to the end of the string.
        IF @end = 0
            SET @end = LEN(@string) + 1

        INSERT INTO @output (splitdata)
        VALUES (SUBSTRING(@string, @start, @end - @start))

        -- Continue after the delimiter that was just consumed.
        SET @start = @end + 1
        SET @end = CHARINDEX(@delimiter, @string, @start)
    END
    RETURN
END

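Next, split the XML string into multiple rows at every occurrence of the assignment operator '='. Then use pattern matching (LIKE and PATINDEX) to find the values assigned to attributes and replace each value with its quoted form. The last step of the query joins the split XML string back into a single XML value.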

DECLARE @malformedXmlString NVARCHAR(MAX) = N'<test A=B width = 1000 height= 800 priority =high name="fred" />' -- another input to try: '<test A=BCD>DATA</test>'

DECLARE @xmlSplit TABLE
(
     ID INT IDENTITY
    ,splitdata NVARCHAR(MAX)
)

INSERT INTO @xmlSplit
(
    splitdata
)
SELECT LTRIM(RTRIM(splitdata)) AS splitdata
FROM    [dbo].[SplitString](@malformedXmlString, '=')



-- Quote each unquoted attribute value. After splitting on '=', a piece that starts with an
-- alphanumeric character begins with the value of the attribute we split on; PATINDEX finds
-- where that value ends (the first space, '/' or '>'). Only the leading value is wrapped in
-- quotes, so the same text appearing later in the piece is left untouched.
UPDATE @xmlSplit
SET     splitdata = '"' + LEFT(splitdata, PATINDEX('%[ />]%', splitdata) - 1) + '"'
                  + SUBSTRING(splitdata, PATINDEX('%[ />]%', splitdata), LEN(splitdata))
WHERE   splitdata LIKE '[a-zA-Z0-9]%[ />]%' -- pieces that already start with a quote or '<' are skipped


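DECLARE @xmlString NVARCHAR(MAX);

-- Rejoin the pieces in ID order, restoring the '=' that the split removed.
-- FOR XML PATH('') with TYPE and .value() concatenates in the ORDER BY order and
-- turns the entities FOR XML produces (&lt;, &quot;) back into plain characters.
SELECT @xmlString = STUFF((
        SELECT  '=' + splitdata
        FROM    @xmlSplit
        ORDER BY ID
        FOR XML PATH(''), TYPE
    ).value('.', 'NVARCHAR(MAX)'), 1, 1, '')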

SELECT CAST(@xmlString AS XML)
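
To look at the quoting rule in isolation, here is a minimal sketch that applies it to a few hand-written pieces, the kind of rows the split on '=' produces after trimming. The literals and the names `v`, `n`, `piece` and `quoted` are invented for this illustration and are not part of the answer's script. It is one way to express the rule "the value runs from the start of the piece to the first space, '/' or '>'".

```sql
-- Each row is one piece of the XML string after splitting on '=' and trimming.
SELECT  v.n,
        v.piece,
        CASE WHEN v.piece LIKE '[a-zA-Z0-9]%[ />]%'
             -- Starts with an unquoted value: wrap only that leading value in quotes.
             THEN '"' + LEFT(v.piece, PATINDEX('%[ />]%', v.piece) - 1) + '"'
                  + SUBSTRING(v.piece, PATINDEX('%[ />]%', v.piece), LEN(v.piece))
             -- Starts with '<' or a quote: leave the piece unchanged.
             ELSE v.piece
        END AS quoted
FROM (VALUES (1, N'<test A'),
             (2, N'B width'),
             (3, N'1000 height'),
             (4, N'"fred" />')) AS v(n, piece)
ORDER BY v.n;
-- quoted column, in order:  <test A  |  "B" width  |  "1000" height  |  "fred" />
```

Because every '=' is treated as an attribute assignment, element text or already-quoted values that contain '=' would be altered as well, which is one reason the approach only suits simple XML.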