代码中的trs导致运行时错误

时间:2018-12-08 13:54:26

标签: html excel vba parsing web-scraping

我正尝试在我创建的文件夹中为一群公司提取现金流。我正在从市场观察中获取信息。我要从中提取表格的网站的示例是https://www.marketwatch.com/investing/stock/aapl/financials/cash-flow。每个公司的所有股票代码都在A列中。我的代码在下一行中断,错误为“运行时错误“ 91”。

Set tRow = hTable.getElementsByTagName("tr")

我知道HTML代码中有trs。此外,我为两家公司运行了代码),然后再次执行时,代码再也没有超过第一个代码了(因为我正在测试,所以第一次没有保存和关闭功能)因此我退出了我所做的每个工作簿,但没有保存它们。)

Public Sub Companies()
Dim sResponse As String, html As HTMLDocument, hTable As Object

Application.ScreenUpdating = False


Dim Last As Long
Dim i As Integer
Dim ws As Worksheet

Last = Cells(Rows.Count, "A").End(xlUp).Row
For i = Last To 572 Step -1

M = 0

Workbooks.Open "C:***\Desktop\Stock Portfolio\Stock Valuations\Temporary Valuations\" & Cells(i, "A").Value & ".xlsx"

ThisWorkbook.Activate
Set ws = Workbooks(Cells(i, "A").Value).Sheets.Add(After:= _
         Workbooks(Cells(i, "A").Value).Sheets(Workbooks(Cells(i, "A").Value).Sheets.Count))
ws.Name = "Cash Flow"

ThisWorkbook.Activate
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", "https://www.marketwatch.com/investing/stock/" & Cells(i, "A").Value & "/financials/cash-flow", False
    .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
    .send
    sResponse = StrConv(.responseBody, vbUnicode)
End With

ThisWorkbook.Activate
With html
    .body.innerHTML = sResponse
    Set hTable = .getElementsByTagName("tbody")(0)
    WriteTable hTable, 1, Workbooks(Cells(i, "A").Value).Sheets("Cash Flow")
End With

ThisWorkbook.Activate

M = 3

With html
    .body.innerHTML = sResponse
    Set hTable = .getElementsByTagName("tbody")(1)
    WriteTable hTable, 1, Workbooks(Cells(i, "A").Value).Sheets("Cash Flow")
End With
Workbooks(Cells(i, "A")).Save
Workbooks(Cells(i, "A")).Close
Next
End Sub

我使用上面的代码,然后使用下面的公共代码(发生问题的地方)来获取表。

Public Sub WriteTable(ByVal hTable As Object, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)
If ws Is Nothing Then Set ws = ActiveSheet
Dim tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long
startRow = (M * 20) + 1
r = startRow
With ws
    Set tRow = hTable.getElementsByTagName("tr")
    For Each tr In tRow
        r = r + 1: c = 1
        Set tCell = tr.getElementsByTagName("td")
        For Each td In tCell
            .Cells(r, c).Value = td.innerText
            c = c + 1
        Next td
    Next tr
End With
End Sub

1 个答案:

答案 0 :(得分:1)

不是理想的答案,但请务必检查得到的答案。此外,检查hTable是否为空。如果我检查了回复,就会发现该站点正在寻找 bots 并使用验证码进行拦截。

  

原谅我们的打扰...

     

当您浏览www.marketwatch.com时,有关您浏览器的某些信息使我们认为您是机器人。这有几个原因   可能会发生:

     

您是超级用户,正在超级用户中浏览此网站   速度。您已在网络浏览器中禁用了JavaScript。第三方   浏览器插件(例如Ghostery或NoScript)正在阻止JavaScript   从运行。此支持中提供了更多信息   文章。

     

完成下面的验证码后,您将立即重新获得访问权限   到www.marketwatch.com。

如果确实如此,那么您可以选择以下几种方式:

1)搜索其他信息来源

2)使用浏览器自动化(基本硒),并希望仅此一项或经过适当的等待就可以达到目标

3)更改IP和用户代理。如果您最初能够在此页面上运行XHR,则可能是因为现在该站点已将您添加到可疑机器人的监视列表。我不愿意更改IP和用户代理。