Question

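I am parsing XML. When the XML is malformed, my code skips loading it as an XML document and falls back to regular text parsing. I have replaced all of the illegal XML characters, but I still randomly get this error:

'<', hexadecimal value 0x3C, is an invalid attribute character.

If I put the XML text in a text file and parse it with plain string handling or regular expressions, why do XML errors still appear? I can see the content after replacing the illegal characters, but it still cannot be loaded into an XML document, and the regular parsing still fails as well.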

Cleanup methods:

Imports System.Text.RegularExpressions
Imports System.Xml

Private Shared Function replaceIllegalXMLChars(ByVal item As String) As String

    Dim returnValue As String = item
    returnValue = returnValue.Replace("&", "&amp;")
    returnValue = returnValue.Replace("""", "&quot;")
    returnValue = returnValue.Replace("'", "&apos;")
    returnValue = returnValue.Replace("<", "&lt;")
    returnValue = returnValue.Replace(">", "&gt;")

    Return returnValue

End Function

Private Shared Function replaceIllegalChars(ByVal state As String) As String
    'remove any hexadecimal character references like &#x19; and &#x0;
    state = Regex.Replace(state, "&\#x([0-9A-F]{1}[0-9A-F]{0,1});", " ")
    'remove any numeric character references written without the x, like &#19; and &#0;
    state = Regex.Replace(state, "&\#([0-9A-F]{1}[0-9A-F]{0,1});", " ")
    state = Regex.Replace(state, "0xFFFF", " ")
    'added on 3/19/2008 to fix a non-encoded character issue, mostly for LJ
    state = Regex.Replace(state, "&nbsp;", "&amp;nbsp;")
    'mark as completed
    Return state
End Function
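
For comparison, here is a minimal sketch of a different cleanup approach: filtering raw characters with XmlConvert.IsXmlChar instead of pattern-matching character references. It is not the method used above. The module name and the StripInvalidXmlChars helper are hypothetical, and the sketch assumes .NET Framework 4.0 or later, where XmlConvert.IsXmlChar is available. It only handles raw characters in the string, not textual references such as &#x19;. Characters stored as surrogate pairs are not treated specially here and may need XmlConvert.IsXmlSurrogatePair.

```vbnet
Imports System
Imports System.Text
Imports System.Xml

Module XmlCharFilterSketch

    ' Hypothetical helper (not from the original post): keeps only the
    ' characters that XmlConvert.IsXmlChar accepts as XML characters.
    Function StripInvalidXmlChars(ByVal text As String) As String
        Dim sb As New StringBuilder(text.Length)
        For Each c As Char In text
            If XmlConvert.IsXmlChar(c) Then
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' U+0000 and U+0019 are control characters that XML 1.0 does not allow.
        Dim dirty As String = "a" & Convert.ToChar(0) & "b" & Convert.ToChar(&H19) & "c"
        ' Prints: abc
        Console.WriteLine(StripInvalidXmlChars(dirty))
    End Sub

End Module
```

A filter like this would run on the raw input before any entity escaping, so it does not interact with the & replacements.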

Calling code:

        Dim xdoc As XmlDocument = New XmlDocument
        _html = replaceIllegalChars(_html)
        _html = replaceIllegalXMLChars(_html)
        _html = checkUnicode(_html)

        ' now load the text as an XML document
        xdoc.LoadXml(_html)

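Sometimes I also get the error "Data at the root level is invalid. Line 1, position 1." Is there a definitive way to clean up the XML and parse it?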
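
One possible source of the root-level error, offered as an assumption rather than a confirmed diagnosis: if the whole string, markup included, goes through the entity replacements, the tags themselves become escaped text and the parser no longer finds a root element. The sketch below reproduces that situation with a made-up sample string. EscapeAll is a hypothetical stand-in that mirrors the replacements in replaceIllegalXMLChars, and the exact exception text may differ between framework versions.

```vbnet
Imports System
Imports System.Xml

Module EscapeWholeDocumentSketch

    ' Hypothetical stand-in mirroring the replacements in replaceIllegalXMLChars.
    Function EscapeAll(ByVal item As String) As String
        Dim result As String = item
        result = result.Replace("&", "&amp;")
        result = result.Replace("""", "&quot;")
        result = result.Replace("'", "&apos;")
        result = result.Replace("<", "&lt;")
        result = result.Replace(">", "&gt;")
        Return result
    End Function

    Sub Main()
        ' Made-up sample input; the real _html content is not shown in the post.
        Dim markup As String = "<root><item name=""a"">x and y</item></root>"

        ' Escaping the entire string also escapes the angle brackets of the tags.
        Dim escaped As String = EscapeAll(markup)
        Console.WriteLine(escaped)

        Dim xdoc As New XmlDocument()
        Try
            ' The string now starts with "&lt;root", so no element is found.
            xdoc.LoadXml(escaped)
            Console.WriteLine("Loaded without error")
        Catch ex As XmlException
            Console.WriteLine("XmlException: " & ex.Message)
        End Try
    End Sub

End Module
```

If this is what is happening, escaping would need to be limited to text and attribute values rather than applied to the full markup string.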

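If I am not treating the input as XML and am just taking the raw text and parsing it, why do XML errors still appear when it should be handled as nothing more than text? And if there is no way around that, is there any option other than manually going through the document, finding every error in the tags, and correcting it?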

The error I get after replacing the illegal characters is: '<', hexadecimal value 0x3C, is an invalid attribute character.

0 Answers: