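I have adapted and modified this code to learn how to retrieve the XPath expressions of an XML document.
I would like to do the same thing for an HTML page, retrieving its available XPaths (perhaps from an HtmlDocument?). Is this possible?
Note: I am open to either a native solution or one that uses the HtmlAgilityPack library.
Here is the XML method: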
''' <summary>
''' Gets all the XPath expressions of an XML Document.
''' </summary>
''' <param name="Document">Indicates the XML document.</param>
''' <returns>List(Of System.String).</returns>
Public Function GetXPaths(ByVal Document As Xml.XmlDocument) As List(Of String)
    Dim XPathList As New List(Of String)
    Dim XPath As String = String.Empty
    For Each Child As Xml.XmlNode In Document.ChildNodes
        If Child.NodeType = Xml.XmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next ' child
    Return XPathList
End Function
''' <summary>
''' Gets all the XPath expressions of an XML Node.
''' </summary>
''' <param name="Node">Indicates the XML node.</param>
''' <param name="XPathList">Indicates the XPath list, passed by reference, as a <see cref="List(Of String)"/>.</param>
''' <param name="XPath">Indicates the current XPath.</param>
Private Sub GetXPaths(ByVal Node As Xml.XmlNode,
                      ByRef XPathList As List(Of String),
                      Optional ByVal XPath As String = Nothing)
    ' Extend the parent's path with this element's name.
    XPath &= "/" & Node.Name
    ' Record each distinct path only once.
    If Not XPathList.Contains(XPath) Then
        XPathList.Add(XPath)
    End If
    ' Recurse into child elements only (text, comments, etc. are skipped).
    For Each Child As Xml.XmlNode In Node.ChildNodes
        If Child.NodeType = Xml.XmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next ' child
End Sub
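Answer 0 (score: 1)
As far as I know, HtmlAgilityPack's class structure is very similar to that of XmlDocument, so I believe you can easily adapt your current solution to work with HtmlDocument, like this: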
' Requires "Imports HtmlAgilityPack" at the top of the file.
Public Function GetXPaths(ByVal Document As HtmlDocument) As List(Of String)
    Dim XPathList As New List(Of String)
    Dim XPath As String = String.Empty
    ' HtmlDocument exposes its top-level nodes through DocumentNode.
    For Each Child As HtmlNode In Document.DocumentNode.ChildNodes
        If Child.NodeType = HtmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next ' child
    Return XPathList
End Function
Private Sub GetXPaths(ByVal Node As HtmlNode,
                      ByRef XPathList As List(Of String),
                      Optional ByVal XPath As String = Nothing)
    XPath &= "/" & Node.Name
    If Not XPathList.Contains(XPath) Then
        XPathList.Add(XPath)
    End If
    For Each Child As HtmlNode In Node.ChildNodes
        If Child.NodeType = HtmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next ' child
End Sub
This works fine when tested with XML-compliant (well-formed) HTML, but I cannot guarantee how well it handles malformed HTML documents.
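For what it's worth, here is a rough sketch of a different route to the same kind of list, which may help when checking how a particular page is parsed. It is not the recursive method from this answer: it relies on the XPath property that HtmlAgilityPack exposes on each HtmlNode, strips the positional indexes (such as [1]) with a regular expression, and prints whatever the parser reports in ParseErrors. The module name XPathDemo, the helper GetDistinctXPaths, and the sample markup are made up for illustration.

```vbnet
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions
Imports HtmlAgilityPack

Module XPathDemo

    ' Hypothetical helper (not part of the answer above): returns the distinct,
    ' index-free element paths found in an HTML string.
    Public Function GetDistinctXPaths(ByVal html As String) As List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' ParseErrors lists the problems HtmlAgilityPack noticed while parsing,
        ' which gives a hint about how malformed the input is.
        For Each parseError As HtmlParseError In doc.ParseErrors
            Console.WriteLine("Line {0}: {1}", parseError.Line, parseError.Reason)
        Next

        ' HtmlNode.XPath yields indexed paths such as /html[1]/body[1]/p[2];
        ' removing the [n] parts leaves name-only paths like /html/body/p.
        Return doc.DocumentNode.Descendants().
            Where(Function(n) n.NodeType = HtmlNodeType.Element).
            Select(Function(n) Regex.Replace(n.XPath, "\[\d+\]", String.Empty)).
            Distinct().
            ToList()
    End Function

    Sub Main()
        ' Sample markup, assumed purely for illustration.
        Dim sample As String = "<html><body><div><p>one</p><p>two</p></div></body></html>"
        For Each path As String In GetDistinctXPaths(sample)
            Console.WriteLine(path)
        Next
    End Sub

End Module
```

If you need paths that address one specific node rather than a distinct list of element paths, keeping the indexed value of XPath as-is is probably the better choice. With malformed markup, the paths reflect the tree HtmlAgilityPack builds after its own fix-ups, which may differ from what a browser produces.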