Extracting href text from a website

Asked: 2014-11-20 14:21:28

Tags: vb.net visual-studio-2012

How can I extract the href text from a website?

<div class="ba by"><a href="http://somewebaddress.com">**I want this text!**</a></div>

I tried a couple of solutions, but neither of them works.

Dim myMatches As MatchCollection
Dim myRegex As New Regex("<div.*?class=""ba by"".*?>.*</div>", RegexOptions.Singleline)
Dim wc As New WebClient
Dim html As String = wc.DownloadString("http://somewebaddress.com")
TextBox1.Text = html
myMatches = myRegex.Matches(html)
MsgBox(html)
Dim successfulMatch As Match
For Each successfulMatch In myMatches
    MsgBox(successfulMatch.Groups(1).ToString)
Next

Dim divs = WebBrowser1.Document.Body.GetElementsByTagName("div")
For Each d As HtmlElement In divs
    If d.GetAttribute("class") = "ba by" Then
        TextBox1.Text = d.InnerText
    End If
Next

Thanks!

1 answer:

Answer 0 (score: 0)

Instead of...

Dim divs = WebBrowser1.Document.Body.GetElementsByTagName("div")

...try

Dim anchors = WebBrowser1.Document.Body.GetElementsByTagName("a")

That will give you a list of all the “
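
As a rough sketch of where the anchor approach could lead, the helper below walks every "a" element and keeps the inner text of those whose parent div carries the wanted class. The names AnchorTextSketch and GetAnchorTexts are made up for illustration. The sketch assumes the page has already finished loading in the WebBrowser control. It also assumes the class has to be read through the "className" attribute name, which is how the control is commonly reported to expose it; that may also be why the GetAttribute("class") comparison in the question never matches.

```vbnet
Imports System.Collections.Generic
Imports System.Windows.Forms

Module AnchorTextSketch
    ' Hypothetical helper: returns the inner text of every <a> element
    ' whose direct parent element has exactly the given class value.
    Function GetAnchorTexts(browser As WebBrowser, cssClass As String) As List(Of String)
        Dim results As New List(Of String)

        ' Nothing to scan if no browser was passed or no document is loaded yet.
        If browser Is Nothing OrElse browser.Document Is Nothing Then
            Return results
        End If

        For Each a As HtmlElement In browser.Document.GetElementsByTagName("a")
            Dim parent As HtmlElement = a.Parent
            ' Assumption: the class is read via "className" rather than "class".
            If parent IsNot Nothing AndAlso parent.GetAttribute("className") = cssClass Then
                results.Add(a.InnerText)
            End If
        Next

        Return results
    End Function
End Module
```

It would typically be called from the WebBrowser1.DocumentCompleted handler, for example TextBox1.Text = String.Join(Environment.NewLine, GetAnchorTexts(WebBrowser1, "ba by")), so that the document is fully parsed before it is searched.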