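I want to get the inner text of a website from code.
I can already get it with the code below, but I can't find any code that gets the inner text of a URL without a WebBrowser control.
This code reads the text of the page loaded in a WebBrowser; I need the same thing, just without the WebBrowser.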
Dim sourceString As String = WebBrowser1.Document.Body.InnerText
Answer 0 (score: 2)
Use HtmlAgilityPack:
Private Sub ToolStripButton1_Click(sender As Object, e As EventArgs) Handles ToolStripButton1.Click
Dim doc As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument
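Using client As New Net.WebClient
' Decode as UTF-8; otherwise WebClient uses Encoding.Default, which can garble non-ASCII pages
client.Encoding = System.Text.Encoding.UTF8
doc.LoadHtml(client.DownloadString("https://example.com"))
End Using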
Debug.Print(doc.DocumentNode.Name)
PrintChildNodes(doc.DocumentNode)
' HtmlAgilityPack's InnerText also includes the text inside <script> and <style> elements
Debug.Print(doc.DocumentNode.Element("html").Element("body").InnerText)
End Sub
Sub PrintChildNodes(Node As HtmlAgilityPack.HtmlNode, Optional Indent As Integer = 1)
For Each Child As HtmlAgilityPack.HtmlNode In Node.ChildNodes
Debug.Print("{0}{1}", New String(ControlChars.Tab, Indent), Child.Name) ' compiles under Option Strict On
PrintChildNodes(Child, Indent + 1)
Next
End Sub
Answer 1 (score: 0)
**Taken from** Wolfwyrd's answer to the question "HTTP GET in VB.NET":
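Try
Dim targetURI As New Uri("http://whatever.you.want.to.get/file.html")
Dim fr As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(targetURI), System.Net.HttpWebRequest)
' Call GetResponse once and reuse it. ContentLength is -1 when the server omits the header (e.g. chunked responses), so it is not checked
Using resp As System.Net.WebResponse = fr.GetResponse()
Using str As New System.IO.StreamReader(resp.GetResponseStream())
' The original used Response.Write, which exists only in ASP.NET; in a WinForms app print or store the string instead
Debug.Print(str.ReadToEnd())
End Using
End Using
Catch ex As System.Net.WebException
' Error accessing the resource; handle it here
End Try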
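This returns the raw HTML of the page (the response body only; the HTTP headers are exposed separately through the response's Headers property), not its inner text. The author doubted it works over HTTPS, but HttpWebRequest does accept https:// URLs; on older .NET Framework versions you may need to enable TLS 1.2 through ServicePointManager.SecurityProtocol.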
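As a sketch combining the two answers, assuming the HtmlAgilityPack NuGet package is installed and System.Net.Http is referenced, the module below downloads the body with HttpClient (recommended over WebClient/HttpWebRequest for new code) and removes script, style and noscript elements before reading InnerText, to get closer to what WebBrowser1.Document.Body.InnerText returns. The names PageText, FetchHtmlAsync and GetVisibleText are hypothetical. The result will not match the browser exactly, because no CSS or JavaScript is executed.

```vb
' Assumption: the HtmlAgilityPack NuGet package is referenced.
Imports System.Collections.Generic
Imports System.Linq
Imports System.Net.Http
Imports System.Threading.Tasks
Imports HtmlAgilityPack

Public Module PageText
    ' Reuse one HttpClient; creating one per request can exhaust sockets.
    Private ReadOnly Client As New HttpClient()

    ' Downloads the response body only; headers stay on response.Headers.
    Public Async Function FetchHtmlAsync(url As String) As Task(Of String)
        Using response As HttpResponseMessage = Await Client.GetAsync(url)
            response.EnsureSuccessStatusCode()
            Return Await response.Content.ReadAsStringAsync()
        End Using
    End Function

    ' Approximates a browser's Body.InnerText: drops non-visible elements,
    ' then returns the entity-decoded text of <body> (or the whole document if there is no body).
    Public Function GetVisibleText(html As String) As String
        ' Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' ToList copies the matches so removing nodes does not disturb the enumeration.
        Dim hidden As List(Of HtmlNode) =
            (From n In doc.DocumentNode.Descendants()
             Where n.Name = "script" OrElse n.Name = "style" OrElse n.Name = "noscript"
             Select n).ToList()
        For Each node As HtmlNode In hidden
            node.Remove()
        Next

        Dim body As HtmlNode = doc.DocumentNode.SelectSingleNode("//body")
        Dim root As HtmlNode = If(body, doc.DocumentNode)
        Return HtmlEntity.DeEntitize(root.InnerText)
    End Function
End Module
```

From an Async event handler it could be used as `Dim text As String = PageText.GetVisibleText(Await PageText.FetchHtmlAsync("https://example.com"))`. Keeping the download and the text extraction separate lets the extraction be tried on a saved HTML string without any network access.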