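I am parsing XML. When the XML is malformed, the code skips XmlDocument and falls back to regular text parsing. I replace all illegal XML characters, but I still randomly get this error:
'<', hexadecimal value 0x3C, is an invalid attribute character.
If I put the XML text in a text file and parse it with plain string operations or regular expressions, why do XML errors still appear? I can see the content after the illegal characters have been replaced, yet it still will not load into an XmlDocument, and the regular parsing still fails as well.
Cleaning methods: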
Private Shared Function replaceIllegalXMLChars(ByVal item As String) As String
Dim returnValue As String = item
' escape "&" first so the entities added below are not escaped a second time
returnValue = returnValue.Replace("&", "&amp;")
returnValue = returnValue.Replace("""", "&quot;")
returnValue = returnValue.Replace("'", "&apos;")
returnValue = returnValue.Replace("<", "&lt;")
returnValue = returnValue.Replace(">", "&gt;")
Return returnValue
End Function
Private Shared Function replaceIllegalChars(ByVal state As String) As String
'remove hexadecimal character references of the form &#xNN; (one or two hex digits)
state = Regex.Replace(state, "&\#x([0-9A-F]{1}[0-9A-F]{0,1});", " ")
'remove character references of the form &#NN; (same pattern, without the x)
state = Regex.Replace(state, "&\#([0-9A-F]{1}[0-9A-F]{0,1});", " ")
state = Regex.Replace(state, "0xFFFF", " ")
'added on 3/19/2008 to fix a non-encoded character issue, mostly for LJ
state = Regex.Replace(state, " ", "&nbsp;")
'mark as completed
Return state
End Function
Body of the calling function:
Dim xdoc As XmlDocument = New XmlDocument
_html = replaceIllegalChars(_html)
_html = replaceIllegalXMLChars(_html)
_html = checkUnicode(_html)
' now load the file as a xml file
xdoc.LoadXml(_html)
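
For reference, the fallback mentioned at the top (skip XmlDocument when the input is malformed and parse it as text instead) could be sketched roughly as follows. This is only an illustration, not the actual code: TryLoadXml, the XmlFallbackSketch module and the sample string are hypothetical names and data, and the input is assumed not to be Nothing. XmlDocument.LoadXml throws an XmlException for input that is not well-formed, and the exception carries the line and position of the first problem.

```vbnet
Imports System
Imports System.Xml

Module XmlFallbackSketch
    ' Hypothetical helper: tries to load the text as XML.
    ' Returns True and sets doc on success; returns False and sets doc to Nothing
    ' when the text is not well-formed.
    Function TryLoadXml(ByVal text As String, ByRef doc As XmlDocument) As Boolean
        Dim candidate As New XmlDocument()
        Try
            candidate.LoadXml(text)
            doc = candidate
            Return True
        Catch ex As XmlException
            ' LineNumber and LinePosition point at the first place the parser gave up
            Console.WriteLine("Not well-formed at line {0}, position {1}: {2}", ex.LineNumber, ex.LinePosition, ex.Message)
            doc = Nothing
            Return False
        End Try
    End Function

    Sub Main()
        Dim doc As XmlDocument = Nothing
        ' Sample input with a raw "<" inside an attribute value, which XML does not allow
        Dim sample As String = "<a title=""1 < 2"">text</a>"
        If TryLoadXml(sample, doc) Then
            Console.WriteLine("Loaded root element: " & doc.DocumentElement.Name)
        Else
            ' Fall back to plain string or regex parsing here; no XML APIs are involved on this path
            Console.WriteLine("Falling back to text parsing")
        End If
    End Sub
End Module
```

With this shape, an XmlException can only come from the LoadXml call inside the Try block, so any XML error seen on the text-parsing path would have to originate from some other XML API call.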
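Sometimes I get the error "Data at the root level is invalid. Line 1, position 1." Is there a definitive way to clean up the XML and then parse it?
If I am not treating the content as XML at all, and am just taking the raw text and parsing it, why do XML errors still appear? Shouldn't it be treated as nothing more than text? And if not, is there any alternative to manually going through the document, finding every error in the tags, and correcting it?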