从站点抓取Web数据时出现Excel VBA Automation错误

时间:2019-03-15 14:58:11

标签: excel vba web-scraping

我创建了一个应用程序,可以从网站上抓取数据。

代码按预期运行少量的迭代。

但是,当代码多次执行时,它会因Automation Error而崩溃。

我的目标是从this website获取关于多个“边界方向”和多个“日期”的每日拍卖结果。

在浏览网站时已经检查了发送的HTTP请求,我能够使它们自动化并获取我感兴趣的数据,如下所示:

Sub seecao()
Dim request As New WinHttpRequest
Dim htmlDoc As New MSHTML.HTMLDocument
Dim tableTest As HTMLTable
Dim rowHTML As HTMLTableRow
Dim cellHTML As HTMLTableCell
Dim requestURL As String
Dim responseJSON As Object
Dim reqBody As String
Dim reqResponse As String
Dim seecaoBordersArray As Variant
Dim areaOut As String
Dim areaIn As String
Dim i As Long
Dim j As Long
Dim k As Long
Dim tempRng As Range
seecaoBordersArray = ThisWorkbook.Worksheets("Help").Range("seecaoBordersRng")
requestURL = "http://seecao.com/views/ajax"
Application.ScreenUpdating = False
For k = LBound(seecaoBordersArray, 1) To UBound(seecaoBordersArray, 1) Step 1
    areaOut = seecaoBordersArray(k, 1)
    areaIn = seecaoBordersArray(k, 2)
    Set tempRng = ThisWorkbook.Worksheets("seecao").Cells.Find(What:=areaOut + areaIn, After:=ActiveCell, LookIn:=xlFormulas, LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=False, SearchFormat:=False)
    i = tempRng.Offset(2, 0).Row
    j = tempRng.Offset(2, 0).Column
    reqBody = ThisWorkbook.Worksheets("Help").Range("A1:A1") + Format(Date + 1, "yyyy-mm-dd") + ThisWorkbook.Worksheets("Help").Range("A2:A2") + areaOut + "+-+" + areaIn + ThisWorkbook.Worksheets("Help").Range("A3:A3")
    With request
        .Open "POST", requestURL, False
        .setRequestHeader "Host", "seecao.com"
        .setRequestHeader "Accept", "application/json, text/javascript, */*; q=0.01"
        .setRequestHeader "Referer", "http://seecao.com/daily-results"
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
        .send reqBody
        reqResponse = .responseText
    End With
    Set responseJSON = JsonConverter.ParseJson(reqResponse)
    htmlDoc.body.innerHTML = responseJSON(3)("data")
    Set tableTest = htmlDoc.getElementsByTagName("table")(0)
    For Each rowHTML In tableTest.Rows
        If rowHTML.Cells(0).innerText <> "Date " Then
            ThisWorkbook.Worksheets("seecao").Cells(i, j) = rowHTML.Cells(2).innerText
            ThisWorkbook.Worksheets("seecao").Cells(i, j + 1) = rowHTML.Cells(5).innerText
            i = i + 1
        End If
    Next rowHTML
Next k
Application.ScreenUpdating = True
End Sub

HTTP请求的主体由各种参数组成,这些参数形成一个巨大的字符串。我将该字符串分为3部分,以便能够编辑“日期”和“边界方向”,并且将这3部分存储在ThisWorkbook.Worksheets("Help").Range("A1:A1")ThisWorkbook.Worksheets("Help").Range("A2:A2")ThisWorkbook.Worksheets("Help").Range("A3:A3")中。 / p>

所有可能的“边界方向”都存储在2x2的命名范围seecaoBordersArray中。代码在此范围内循环,生成相应的请求正文,发送请求并获取响应。

seecaoBordersArray

响应为JSON格式。我感兴趣的响应部分是位于JSON字符串中的HTML表。

然后解析JSON字符串以获取HTML表。最终,HTML表被解析,感兴趣的数据被写在工作表上。

Results printout

正如我在本文开头所说的那样,代码通常在大多数情况下都能正常工作,但是根据正在执行的迭代次数,它会随机Automation Error崩溃。

例如,当我仅对“边界方向”的2个组合执行它时,它永远不会崩溃。

在我看来,这似乎是某种性能问题,而不是错误。

任何有关如何提高性能并避免这些崩溃的建议将不胜感激。

0 个答案:

没有答案