所以我尝试将多个网站的数据从excel中删除。我认为代码在理论上运行良好,虽然我有"对象变量或块变量没有设置错误"。
我面临的第一个问题是,它有时会从2到10完美地循环并插入每一位数据,但有时我会收到错误并看到只插入了1或2行数据。我真的无法弄清楚到底是什么原因。
其次,这是某种演示代码。我只使用一小批数据并通过它循环。我的真正目的是找到一种方法,使网络抓取最多100行,而不会崩溃我的计算机或浏览器。如果我将我的代码转换为XMLHTTP类型的抓取是否会更好,如果是这样我该怎么办呢。
先谢谢
Private Sub CommandButton1_Click()
Dim ie As Object
Dim iexp As Object
Dim firstname(1 To 10), lastname(1 To 10) As Variant
Dim mm(1 To 10), dd(1 To 10), yyyy(1 To 10) As Integer
Dim PhoneNumber(1 To 10) As Variant
Dim Address(1 To 10) As Variant
Dim HomeValue(1 To 10) As Variant
Dim i As Integer
For i = 2 To 10
'get variables from excel sheet1 and search on peoplefinders.com
firstname(i) = Sheet1.Cells(i, 1).Value
lastname(i) = Sheet1.Cells(i, 2).Value
mm(i) = Sheet1.Cells(i, 3).Value
dd(i) = Sheet1.Cells(i, 4).Value
yyyy(i) = Sheet1.Cells(i, 5).Value
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.Height = 1000
ie.Width = 1000
ie.navigate ("http://www.peoplefinders.com/peoplesearch/searchresults?search=People&fn=" & firstname(i) & "&mn=&ln=" & lastname(i) & "&city=&state=&age=&dobmm=" & mm(i) & "&dobdd=" & dd(i) & "&doby=" & yyyy(i))
Do While ie.Busy: DoEvents: Loop
Dim Doc As HTMLDocument
Set Doc = ie.document
'get elements and insert into cells in sheet 1
PhoneNumber(i) = Doc.getElementsByTagName("td")(2).getElementsByTagName("a")(0).innerText
Address(i) = Doc.getElementsByTagName("td")(1).getElementsByTagName("a")(0).innerText
Sheet1.Cells(i, 6).Value = PhoneNumber(i)
Sheet1.Cells(i, 7).Value = Address(i)
'modify address for next search
a = Split(Address(i), " ")
b = Join(a, "-")
'search home value on zillow.com
Set iexp = CreateObject("InternetExplorer.Application")
iexp.Visible = True
iexp.Height = 1000
iexp.Width = 1000
iexp.navigate ("http://www.zillow.com/homes/" & b & "_rb/")
Do While iexp.Busy: DoEvents: Loop
Dim Doc2 As HTMLDocument
Set Doc2 = iexp.document
iexp.navigate ("http://www.zillow.com/homes/" & b & "_rb/")
'insert home value into cells in sheet 1
HomeValue(i) = Doc2.getElementsByClassName("home-summary-row")(1).getElementsByTagName("span")(1).innerText
Sheet1.Cells(i, 8).Value = HomeValue(i)
Next
End Sub
答案 0 :(得分:0)
正如Kerem Turgutlu指出的那样,检查Busy通常是否足够,你必须检查readystate。我是这样做的:
Sub WaitBrowser(browser As Object)
Do While browser.Busy
DoEvents
Loop
Do While browser.readyState <> 4
DoEvents
Loop
End Sub
然后我在每次导航后调用WaitBrowser IE
(其中IE
是我的InternetExplorer.Application
对象),然后插入文档元素。
至于其他方法,只要有可能,为了提高效率和可预测性,我更喜欢直接使用API发送HTTP消息(我通常使用WinHTTP,但我相信也可以使用XMLHTTP或winInet)。两个值得注意的例外:1)我想在处理期间或之后将用户引导到浏览器,或者2)在查找要发送的内容以获取所需数据时涉及复杂的脚本(在这种情况下,让浏览器更容易执行其操作作业)。
这是一个改编自最近项目的例子(没有错误检查):
Function FindLink() As String
Dim Request as Object
Set Request = CreateObject("WinHttp.WinHttpRequest.5.1")
Request.Open "GET", "http://example.com/pagewithinfo"
Request.Send
Dim resp as String
resp = Request.ResponseText
'create html tree with response
Dim h As Object
Set h = CreateObject("htmlfile")
h.body.innerHTML = respA
'get the info
FindLink = h.DocumentElement.GetElementsByTagName("a")(0).GetAttribute("href")
Set h = Nothing
Set Request = Nothing
End Function
答案 1 :(得分:0)
XHR请求在第一个结果页面上列出所有结果。您可以在循环中使用它来访问不同的结果页面。
需要引用HTML Object Library
和Microsoft XML library
。 XMLHTTP60
特定于Excel2016。您可能需要适应您的版本,例如,从末尾删除60。
Option Explicit
Public Sub GetInfo()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim ws As Worksheet: Set ws = ActiveSheet
Const URL As String = "https://www.peoplefinders.com/peoplesearch/searchresults?search=People&fn=Je&mn=&ln=Bloggs&city=&state=AL&age=&dobmm=&dobdd=&doby="
Application.ScreenUpdating = True
With http
.Open "GET", URL, False
.send
html.body.innerHTML = .responseText
End With
Dim hTables As Object, hTable As Object
Set hTables = html.getElementsByTagName("table")
For Each hTable In hTables
WriteTable hTable, GetLastRow(ws, 1) + 1, ws
Next hTable
Application.ScreenUpdating = True
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, ByVal columNum As Long) As Long
With ws
GetLastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
End With
End Function
Public Sub WriteTable(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)
If ws Is Nothing Then Set ws = ActiveSheet
Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, R As Long, C As Long, tBody As Object
R = startRow
With ws
Dim headers As Object, header As Object, columnCounter As Long
Set headers = hTable.getElementsByTagName("th")
For Each header In headers
columnCounter = columnCounter + 1
.Cells(startRow, columnCounter) = header.innerText
Next header
startRow = startRow + 1
Set tBody = hTable.getElementsByTagName("tbody")
For Each tSection In tBody 'HTMLTableSection
Set tRow = tSection.getElementsByTagName("tr") 'HTMLTableRow
For Each tr In tRow
R = R + 1
Set tCell = tr.getElementsByTagName("td")
C = 1
For Each td In tCell 'DispHTMLElementCollection
.Cells(R, C).Value = td.innerText 'HTMLTableCell
C = C + 1
Next td
Next tr
Next tSection
End With
End Sub
您可以通过在URL字符串的末尾添加页码来循环浏览各个页面,例如
https://www.peoplefinders.com/peoplesearch/searchresults?fn=Je&ln=Bloggs&state=AL&search=People&StartPage=2
您可以获得要循环的页面数
Dim a As Object, numPages As Long
Set a = html.querySelectorAll("a.pageslinks.PaginationLinks")
numPages = a(a.Length - 1).innerText
这样可以确定何时停止循环进行特定搜索的结果页面。