Does anyone know how to get the link inside the child of the child of the first div?
This is what the page looks like:
<div id="id1" class="class-1 class-2 class-3 class-4 class-5 class-6 class-7">
<div class="class 8 class 9">
<h3><a href="http://foo.com/1">foo.com/1</a></h3>
</div>
</div>
<div id="id2" class="class-1 class-2 class-3 class-4 class-5 class-6 class-7">
<div class="class 8 class 9">
<h3><a href="http://foo.com/2">foo.com/2</a></h3>
</div>
</div>
<div id="id3" class="class-1 class-2 class-3 class-4 class-5 class-6 class-7">
<div class="class 8 class 9">
<h3><a href="http://foo.com/3">foo.com/3</a></h3>
</div>
</div>
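I want to get the first div, but its id changes every time I navigate.
So I need code that gets the first div on the page and then gets the link inside its child. The WebBrowser can then navigate to that link.
This is what I have tried: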
WebBrowser1.Navigate("http://foo.com/home")
WebBrowser1.
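Answer 0 (score: 0)
The following code retrieves the first http:// URL in the page source, which for the markup above is the link inside the first <h3><a href="http://foo.com/1">foo.com/1</a></h3>: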
' Download the page source (URL taken from the question; replace with your own).
Dim html As String
Using wc As New WebClient()
    html = wc.DownloadString("http://foo.com/home")
End Using
Dim re1 As String = ".*?" ' Non-greedy match on filler
Dim re2 As String = "(http)" ' Word 1
Dim re3 As String = "(:)" ' Any single character 1
Dim re4 As String = "(\/)" ' Any single character 2
Dim re5 As String = "((?:\/[\w\.\-]+)+)" ' Unix path 1
Dim r As New Regex(re1 & re2 & re3 & re4 & re5, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
Dim m As Match = r.Match(html)
If m.Success Then
    ' Groups 1-4 hold "http", ":", "/" and "/host/path"; joined they form the full URL.
    Dim url As String = m.Groups(1).Value & m.Groups(2).Value & m.Groups(3).Value & m.Groups(4).Value
    ' Do whatever you need with the URL here, e.g. WebBrowser1.Navigate(url)
End If
The code is adapted from output generated by txt2re, an online regular-expression tool.
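To check the pattern without a network call, it can be run against the markup from the question. The following is only a minimal sketch: the module name `RegexCheck` and the helper `FirstUrl` are made up for illustration, and the sample string is a shortened copy of the question's HTML.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexCheck
    ' Same pattern as in the answer, written as one string.
    Private Const UrlPattern As String = ".*?(http)(:)(\/)((?:\/[\w\.\-]+)+)"

    ' Hypothetical helper: returns the first http:// URL in the text, or Nothing.
    Function FirstUrl(html As String) As String
        Dim m As Match = Regex.Match(html, UrlPattern,
                                     RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If Not m.Success Then Return Nothing
        ' Groups 1-4 are "http", ":", "/" and "/host/path".
        Return m.Groups(1).Value & m.Groups(2).Value &
               m.Groups(3).Value & m.Groups(4).Value
    End Function

    Sub Main()
        ' Shortened copy of the markup from the question.
        Dim sample As String =
            "<div id=""id1"" class=""class-1""><div class=""class 8 class 9"">" &
            "<h3><a href=""http://foo.com/1"">foo.com/1</a></h3></div></div>" &
            "<div id=""id2"" class=""class-1""><div class=""class 8 class 9"">" &
            "<h3><a href=""http://foo.com/2"">foo.com/2</a></h3></div></div>"
        Console.WriteLine(FirstUrl(sample)) ' prints http://foo.com/1
    End Sub
End Module
```

Two limitations are worth knowing. The pattern picks up the first http:// URL anywhere in the source, so an earlier URL (for example in a script or stylesheet tag) would be returned instead of the one in the first div. It also requires a colon directly after "http", so https:// links are not matched.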
Note that you need access to the Net and RegularExpressions namespaces, so you will also need:
Imports System.Text.RegularExpressions
Imports System.Net
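Since the question already uses a WebBrowser control, a different technique is to skip the regex and read the link from the control's parsed DOM once the page has loaded. This is a rough sketch and not part of the answer above: the form class `FirstLinkForm` and the flag `followedFirstLink` are hypothetical, and it assumes the div you want really is the first div element in document order.

```vb
Imports System
Imports System.Windows.Forms

Public Class FirstLinkForm
    Inherits Form

    Private WithEvents WebBrowser1 As New WebBrowser()
    ' Hypothetical flag so the link is followed only once, not on every page load.
    Private followedFirstLink As Boolean = False

    Public Sub New()
        WebBrowser1.Dock = DockStyle.Fill
        Controls.Add(WebBrowser1)
        WebBrowser1.Navigate("http://foo.com/home") ' URL from the question
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object,
            e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        If followedFirstLink OrElse WebBrowser1.Document Is Nothing Then Return

        ' First div in document order; its id is never inspected.
        Dim divs As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("div")
        If divs.Count = 0 Then Return

        ' First anchor anywhere below that div (child of child included).
        Dim anchors As HtmlElementCollection = divs(0).GetElementsByTagName("a")
        If anchors.Count = 0 Then Return

        Dim href As String = anchors(0).GetAttribute("href")
        If String.IsNullOrEmpty(href) Then Return

        followedFirstLink = True
        WebBrowser1.Navigate(href)
    End Sub
End Class

Module Program
    <STAThread>
    Sub Main()
        Application.Run(New FirstLinkForm())
    End Sub
End Module
```

Because the lookup goes by tag order instead of id, it keeps working when the id changes between navigations. If the real page has other div elements before the one shown in the question, the index `0` would need adjusting, or the class attribute could be checked instead.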