Getting a website's inner text without a WebBrowser

Date: 2017-01-28 14:54:24

Tags: html vb.net get httpwebrequest

I want to get the inner text of a website through code.

I can already get its inner HTML using the code below, but I can't find any code that gets the inner text of a URL without a WebBrowser.

This code gets the text from a website loaded in a WebBrowser, but I need the same thing, just without the WebBrowser.

Dim sourceString As String = WebBrowser1.Document.Body.InnerText

2 answers:

Answer 0: (score: 2)

Use HtmlAgilityPack...

Private Sub ToolStripButton1_Click(sender As Object, e As EventArgs) Handles ToolStripButton1.Click
    Dim doc As New HtmlAgilityPack.HtmlDocument()

    ' Download the page and parse it; Using disposes the WebClient even if the download throws.
    Using client As New Net.WebClient()
        doc.LoadHtml(client.DownloadString("https://example.com"))
    End Using

    ' Dump the node tree to see how the page is structured.
    Debug.Print(doc.DocumentNode.Name)
    PrintChildNodes(doc.DocumentNode)

    ' Inner text of <body>, the counterpart of WebBrowser1.Document.Body.InnerText.
    Debug.Print(doc.DocumentNode.Element("html").Element("body").InnerText)
End Sub

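' Recursively prints each child node's name, indented one tab per nesting level.
Sub PrintChildNodes(Node As HtmlAgilityPack.HtmlNode, Optional Indent As Integer = 1)
    For Each Child As HtmlAgilityPack.HtmlNode In Node.ChildNodes
        ' New String(Char, count) also compiles with Option Strict On, unlike PadLeft(Indent, vbTab).
        Debug.Print("{0}{1}", New String(ControlChars.Tab, Indent), Child.Name)
        PrintChildNodes(Child, Indent + 1)
    Next
End Sub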
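
One thing to watch for with the approach above: the InnerText of body also contains the contents of any script and style elements, and HTML entities stay encoded. A minimal sketch of one way to work around that, assuming HtmlAgilityPack is referenced; the module name InnerTextSketch and the function ExtractVisibleText are hypothetical names chosen for illustration, not part of the library:

```vbnet
Imports System.Linq

Module InnerTextSketch
    ' Hypothetical helper: returns the text of an HTML string without <script>/<style> content.
    Function ExtractVisibleText(html As String) As String
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing when nothing matches, so guard before iterating.
        Dim hidden = doc.DocumentNode.SelectNodes("//script|//style")
        If hidden IsNot Nothing Then
            ' Iterate over a copy so removing nodes cannot disturb the enumeration.
            For Each node As HtmlAgilityPack.HtmlNode In hidden.ToList()
                node.Remove()
            Next
        End If

        ' Fall back to the whole document when the markup has no <body>.
        Dim body As HtmlAgilityPack.HtmlNode =
            If(doc.DocumentNode.SelectSingleNode("//body"), doc.DocumentNode)

        ' InnerText leaves entities such as &amp; encoded; DeEntitize decodes them.
        Return HtmlAgilityPack.HtmlEntity.DeEntitize(body.InnerText).Trim()
    End Function
End Module
```

Calling ExtractVisibleText with the string returned by DownloadString should give something close to what WebBrowser1.Document.Body.InnerText shows, although whitespace will likely differ.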

Answer 1: (score: 0)

**Taken from** Wolfwyrd

in this question: HTTP GET in VB.NET

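Try
    Dim targetURI As New Uri("http://whatever.you.want.to.get/file.html")
    Dim fr As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(targetURI), System.Net.HttpWebRequest)

    ' Using closes the response and the reader even if reading fails.
    Using resp As System.Net.WebResponse = fr.GetResponse()
        ' ContentLength can be -1 when the server does not send a length.
        If resp.ContentLength > 0 Then
            Using str As New System.IO.StreamReader(resp.GetResponseStream())
                ' Response.Write assumes an ASP.NET page; elsewhere, keep the string instead.
                Response.Write(str.ReadToEnd())
            End Using
        End If
    End Using
Catch ex As System.Net.WebException
    ' Error in accessing the resource, handle it
End Try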

You will get the HTML as well as the HTTP headers. I don't think this would work for HTTPS on its own.
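
For what it's worth, a rough sketch of the same request wrapped in a reusable function. As far as I know, HttpWebRequest does accept https URLs, and ReadToEnd on the response stream returns only the body (the headers are exposed separately through the response's Headers property). The module name DownloadSketch and the function DownloadHtml are hypothetical, and the SecurityProtocol line is an assumption that should only matter on older .NET Framework versions where TLS 1.2 is not enabled by default:

```vbnet
Imports System.IO
Imports System.Net

Module DownloadSketch
    ' Hypothetical helper: downloads the response body of a URL as a string.
    Function DownloadHtml(url As String) As String
        ' Assumption: only needed where TLS 1.2 is not already on by default.
        ServicePointManager.SecurityProtocol =
            ServicePointManager.SecurityProtocol Or SecurityProtocolType.Tls12

        Dim request = DirectCast(WebRequest.Create(New Uri(url)), HttpWebRequest)

        ' Using closes the response and the reader even if reading throws.
        Using response = DirectCast(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

The returned string is still raw HTML, so it would need to go through an HTML parser (for example HtmlAgilityPack's LoadHtml) to get the inner text the question asks for.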