How do I get the first div on a web page?

Asked: 2015-01-12 17:49:44

Tags: html vb.net browser html-agility-pack

Does anyone know how to get the link nested inside the first div (in the child of its child)?

This is what the page looks like:

    <div id="id1" class="class-1 class-2 class-3 class-4 class-5 class-6 class-7">
        <div class="class 8 class 9">
            <h3><a href="http://foo.com/1">foo.com/1</a></h3>
        </div>
    </div>

    <div id="id2" class="class-1 class-2 class-3 class-4 class-5 class-6 class-7">
        <div class="class 8 class 9">
            <h3><a href="http://foo.com/2">foo.com/2</a></h3>
        </div>
    </div>

    <div id="id3" class="class-1 class-2 class-3 class-4 class-5 class-6 class-7">
        <div class="class 8 class 9">
            <h3><a href="http://foo.com/3">foo.com/3</a></h3>
        </div>
    </div>

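I want to get the first div, but its id changes every time I navigate to the page.
So I need code that finds the first div on the page and then gets the link from its child. The WebBrowser can then navigate to that link.

This is what I have tried: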

    WebBrowser1.Navigate("http://foo.com/home")
    WebBrowser1.

1 answer:

Answer 0 (score: 0)

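The following code retrieves the link inside the first <h3><a href="http://foo.com/1">foo.com/1</a></h3>. It does this by matching the first http:// URL in the page source, so it only returns that link if no other http:// URL appears earlier in the HTML: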

    ' Download the page source (URL taken from the question).
    Dim html As String
    Using wc As New WebClient()
        html = wc.DownloadString("http://foo.com/home")
    End Using

    Dim re1 As String = ".*?"                  ' Non-greedy filler before the URL
    Dim re2 As String = "(http)"               ' Group 1: scheme
    Dim re3 As String = "(:)"                  ' Group 2: colon
    Dim re4 As String = "(\/)"                 ' Group 3: first slash
    Dim re5 As String = "((?:\/[\w\.\-]+)+)"   ' Group 4: second slash, host, and path

    Dim r As New Regex(re1 & re2 & re3 & re4 & re5, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    Dim m As Match = r.Match(html)
    If m.Success Then
        ' Reassemble the URL from the four captured groups.
        Dim url As String = m.Groups(1).Value & m.Groups(2).Value & m.Groups(3).Value & m.Groups(4).Value
        ' Do whatever you need with the URL here, e.g. WebBrowser1.Navigate(url)
    End If

The code is modified from output generated by txt2re, an online regular-expression tool.
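To see what the pattern captures without touching the network, here is a minimal console sketch that runs the same regex against an abridged copy of the markup from the question. The module name `FirstLinkDemo` and the shortened HTML string are assumptions made for illustration only.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FirstLinkDemo
    Sub Main()
        ' Abridged sample markup based on the question (first two blocks only).
        Dim html As String =
            "<div id=""id1""><div class=""class 8 class 9"">" &
            "<h3><a href=""http://foo.com/1"">foo.com/1</a></h3></div></div>" &
            "<div id=""id2""><div class=""class 8 class 9"">" &
            "<h3><a href=""http://foo.com/2"">foo.com/2</a></h3></div></div>"

        ' Same pattern as the answer, written as a single string.
        Dim r As New Regex(".*?(http)(:)(\/)((?:\/[\w\.\-]+)+)",
                           RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim m As Match = r.Match(html)

        If m.Success Then
            ' Groups 1-4 hold "http", ":", "/", and "/foo.com/1".
            Dim url As String = m.Groups(1).Value & m.Groups(2).Value &
                                m.Groups(3).Value & m.Groups(4).Value
            Console.WriteLine(url) ' prints http://foo.com/1
        End If
    End Sub
End Module
```

Because the pattern looks for the literal text `http:`, it skips `https://` links and relative links, and it picks up any earlier `http://` URL on the page (for example one in a script or stylesheet tag) rather than the one inside the first div.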

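Note that the WebClient and Regex code requires the System.Net and System.Text.RegularExpressions namespaces, so you also need these imports at the top of the file:

    Imports System.Text.RegularExpressions
    Imports System.Net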
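Since the question is tagged html-agility-pack, a DOM-based lookup may be closer to what was asked than a regex. The following is only a sketch of that alternative, not the answer's method: it assumes the HtmlAgilityPack NuGet package is referenced, and the module and function names (`FirstDivLinkSketch`, `GetFirstDivLink`) are hypothetical.

```vbnet
Imports System
Imports HtmlAgilityPack

Module FirstDivLinkSketch
    ' Returns the href of the first <a> inside the first <div> of the document,
    ' or an empty string when no such link exists.
    Function GetFirstDivLink(html As String) As String
        ' Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' (//div)[1] selects the first div in document order, whatever its id is.
        Dim firstDiv As HtmlNode = doc.DocumentNode.SelectSingleNode("(//div)[1]")
        If firstDiv Is Nothing Then Return String.Empty

        ' Search the descendants of that div for an anchor with an href attribute.
        Dim link As HtmlNode = firstDiv.SelectSingleNode(".//a[@href]")
        If link Is Nothing Then Return String.Empty

        Return link.GetAttributeValue("href", String.Empty)
    End Function

    Sub Main()
        ' Abridged sample markup based on the question.
        Dim html As String =
            "<div id=""id1""><div class=""class 8 class 9"">" &
            "<h3><a href=""http://foo.com/1"">foo.com/1</a></h3></div></div>" &
            "<div id=""id2""><div class=""class 8 class 9"">" &
            "<h3><a href=""http://foo.com/2"">foo.com/2</a></h3></div></div>"

        Console.WriteLine(GetFirstDivLink(html))
    End Sub
End Module
```

In a WinForms application the returned value could be passed to `WebBrowser1.Navigate(...)` after checking that it is not empty. On a real page the first div in the document is often a layout wrapper rather than the block of interest, so the XPath would likely need to be narrowed, for example by a class name that stays stable between page loads.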