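I'm fairly new to programming. I wrote a web scraper in VBA and am now trying to recreate it in VB.NET in Visual Studio. I'm using the same object I used in VBA (mshtml.HTMLDocument), but for some reason in Visual Studio it appears to be missing the .getElementsByClassName method, which is essential to my program. I don't understand why it would be missing in VB.NET when I'm using the same reference library and the same object as I did in VBA.
Am I doing something wrong?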
[Screenshot: VBA IntelliSense and reference library]
[Screenshot: Visual Studio VB.NET IntelliSense, reference library, and error]
Answer 0 (score: 0)
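System.Windows.Forms.HtmlDocument (in VB.NET) is not the same type as mshtml.HTMLDocument (in VBA). Without seeing the relevant code, I can't tell whether you are actually ending up with the former.
Rather than going through extra steps to obtain the latter, you can write your own method to get the elements that have a particular class name, e.g.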
Imports System.Linq
Imports System.Windows.Forms

Public Class Form1

    Dim wb As WebBrowser

    ' Returns every element whose class attribute contains className as a whole, space-separated token.
    Function GetElementsHavingClassName(doc As HtmlDocument, className As String) As List(Of HtmlElement)
        Dim elems As New List(Of HtmlElement)

        For Each elem As HtmlElement In doc.All
            ' The attribute is exposed under its DOM property name, "className".
            Dim classes = elem.GetAttribute("className")
            If classes.Split(" "c).Any(Function(c) c = className) Then
                elems.Add(elem)
            End If
        Next

        Return elems
    End Function

    ' Runs once the document has finished loading.
    Sub ExtractElements(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
        Dim browser = DirectCast(sender, WebBrowser)
        Dim flintstones = GetElementsHavingClassName(browser.Document, "flintstone")

        If flintstones.Count > 0 Then
            For Each fs In flintstones
                ' do something with the element
                TextBox1.AppendText(fs.InnerText & vbCrLf)
            Next
        Else
            TextBox1.Text = "Not found."
        End If
    End Sub

    Sub DoStuff()
        If wb Is Nothing Then
            wb = New WebBrowser
        End If

        RemoveHandler wb.DocumentCompleted, AddressOf ExtractElements ' don't leave any old ones lying around
        AddHandler wb.DocumentCompleted, AddressOf ExtractElements

        Dim loc = "file:///c:\temp\somehtml.html"
        Try
            wb.Navigate(loc)
        Catch ex As Exception
            'TODO: handle the problem gracefully.
            MsgBox(ex.Message)
        End Try
    End Sub

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        DoStuff()
    End Sub

    Private Sub Form1_FormClosing(sender As Object, e As FormClosingEventArgs) Handles MyBase.FormClosing
        If wb IsNot Nothing Then
            RemoveHandler wb.DocumentCompleted, AddressOf ExtractElements
            wb.Dispose()
        End If
    End Sub

End Class
which, given this HTML:
<!DOCTYPE html>
<html>
<head><title></title></head>
<body>
<div class="fred flintstone">Fred</div>
<div class="wilma flintstone">Wilma</div>
<div class="not-a-flintstone">Barney</div>
</body>
</html>
outputs:
Fred
Wilma
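Barney is left out because the check compares whole space-separated class tokens rather than looking for a substring, so "not-a-flintstone" does not count as "flintstone". Here is a minimal console sketch of just that comparison, outside of any WebBrowser. The module and the helper name HasClass are hypothetical and not part of the answer's code. Unlike the answer's Split(" "c), it also discards empty tokens left by repeated spaces.

```vbnet
Imports System
Imports System.Linq

Module ClassNameMatchDemo

    ' Hypothetical helper: True when classAttr contains className
    ' as a whole space-separated token (not merely as a substring).
    Function HasClass(classAttr As String, className As String) As Boolean
        If String.IsNullOrWhiteSpace(classAttr) Then Return False

        Dim tokens = classAttr.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        Return tokens.Contains(className)
    End Function

    Sub Main()
        Console.WriteLine(HasClass("fred flintstone", "flintstone"))   ' True
        Console.WriteLine(HasClass("wilma flintstone", "flintstone"))  ' True
        Console.WriteLine(HasClass("not-a-flintstone", "flintstone"))  ' False
    End Sub

End Module
```

A plain classes.Contains("flintstone") substring test would have matched all three divs, which is why the answer splits the attribute first.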