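I'm new to web scraping, but I have some prior VBA experience. I'm trying to build a scraper that opens a website, runs a search, and then scrapes the details of the search results. What frustrates me is that the scraper can run the search with the given parameters, but after the search completes and the results page loads, the innerHTML read I set up in VBA does not return the source of the new page. So I can't extract any information, because my VBA code doesn't see the actual page's HTML. Why does this happen? What source is my VBA actually pulling?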
Thanks a lot for your help!
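Public Sub my_scraper()
    ' Declare each variable with its own type (Dim a, b As String makes a a Variant)
    Dim my_data1 As String, my_data2 As String
    Dim my_Coll As String
    my_data1 = ActiveSheet.Cells(1, 1).Value   ' search keyword
    my_data2 = ActiveSheet.Cells(1, 2).Value   ' search location
    my_Coll = profession_hu_scraper(my_data1, my_data2)
    ActiveSheet.Cells(2, 2).Value = my_Coll
End Sub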
' Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
Public Function profession_hu_scraper(ByVal my_data1 As String, ByVal my_data2 As String) As String
    Dim objIE As InternetExplorer
    Dim html As HTMLDocument
    Dim my_classes As Object
    Dim my_class As Object
    'Application.ScreenUpdating = False
    Set objIE = CreateObject("InternetExplorer.Application")
    With objIE
        .Visible = True
        .Navigate "https://www.profession.hu/"
        ' Wait until the start page has finished loading
        Do While .Busy Or .ReadyState <> READYSTATE_COMPLETE
            Application.StatusBar = "Loading website..."
            DoEvents
        Loop
        Set html = .Document
        ' Note: a single cell holds at most 32,767 characters
        Range("A16") = html.DocumentElement.innerHTML
        ' Fill in the search form
        .Document.getElementById("header_keyword").Value = my_data1
        .Document.getElementById("header_location").Value = my_data2
        ' Click the "Keresés" (Search) button once
        Set my_classes = .Document.getElementsByClassName("p2_button_inner")
        For Each my_class In my_classes
            If my_class.getAttribute("value") = "Keresés" Then
                Range("C4") = "Clicked"
                my_class.Click
                Exit For
            End If
        Next my_class
        ' Wait until the results page has finished loading
        Do While .Busy Or .ReadyState <> READYSTATE_COMPLETE
            Application.StatusBar = "Loading website..."
            DoEvents
        Loop
        Set html = .Document
        Range("B16") = html.DocumentElement.innerHTML
        ' Return the results page HTML to the caller
        profession_hu_scraper = html.DocumentElement.innerHTML
    End With
    Set objIE = Nothing
    Application.StatusBar = "Finished"
    'Application.StatusBar = ""
End Function
Answer 0 (score: 1)
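After several days of struggling, I finally discovered that the code was working fine. The problem is that an Excel cell can hold at most 32,767 characters, so the whole HTML could not fit in the cell and be displayed. If you're a beginner, watch out for this!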
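A minimal sketch of how to confirm this, assuming the same early-bound setup as the question: check the HTML string's length against the 32,767-character cell limit, and write the full HTML to a text file instead of a cell. `MAX_CELL_CHARS`, `FitsInCell`, `DumpToFile`, and the file path used below are hypothetical names chosen for illustration.

```vba
' Excel stores at most 32,767 characters in a single cell.
Public Const MAX_CELL_CHARS As Long = 32767

' Hypothetical helper: True if s fits in one cell without being cut off.
Public Function FitsInCell(ByVal s As String) As Boolean
    FitsInCell = (Len(s) <= MAX_CELL_CHARS)
End Function

' Hypothetical helper: write the whole string to a text file so the
' complete HTML can be inspected in an editor.
Public Sub DumpToFile(ByVal s As String, ByVal path As String)
    Dim f As Integer
    f = FreeFile            ' next free file number
    Open path For Output As #f
    Print #f, s             ' writes s followed by a line break
    Close #f
End Sub
```

For example, in place of `Range("B16") = html.DocumentElement.innerHTML` you could call `Debug.Print Len(html.DocumentElement.innerHTML), FitsInCell(html.DocumentElement.innerHTML)` and then `DumpToFile html.DocumentElement.innerHTML, Environ$("TEMP") & "\profession_results.html"`, and open that file in a text editor. Note that `Print #` writes text in the system's ANSI code page, so characters outside that code page may be replaced.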
Answer 1 (score: -2)
Update:
' Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
Public Function profession_Hu_Scraper(myData1 As String, my_data2 As String)
    Dim ie As New InternetExplorer
    Dim doc As HTMLDocument
    Dim ws As Worksheet: Set ws = ActiveSheet
    ie.navigate "https://www.profession.hu/"
    ' Wait for the page to load; DoEvents keeps Excel responsive
    Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    Set doc = ie.document
    ws.Range("A16") = doc.getElementById(myData1).innerText
    ws.Range("B16") = doc.getElementById(my_data2).innerText
    'whatever else you wish to do
End Function