VBA .getElementsByTagName() not returning elements

Time: 2019-02-10 04:08:39

Tags: excel vba web-scraping

I'm trying to read the EPL betting data on Betfair. When I run the following sub, elements.Length returns 0.

Sub PullBetfair()

    ' SOCCER
    Const soccerEPL  As String = "https://www.betfair.com.au/exchange/plus/football/competition/10932509"   ' EPL

    ' DECLARE INTERNET EXPLORER
    Dim ie As New InternetExplorer
    ie.Visible = False

    ' NAVIGATE TO URL
    ie.navigate soccerEPL

    ' LOOP UNTIL NAVIGATION COMPLETE
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE

    ' COLLECT HTML DOCUMENT
    Dim html As HTMLDocument
    Set html = ie.document

    ' CREATE COLLECTION OF ELEMENTS
    Dim elements As IHTMLElementCollection

    Set elements = html.getElementsByTagName("section")
    Debug.Print elements.Length

    ie.Quit
    Set ie = Nothing
End Sub

I have used this method to collect data successfully from other sites such as Ladbrokes, but it doesn't work on this site.

I saw something about frames on another site. HTML is new to me, so I don't really understand what it was saying.

I also tried collecting the elements with .getElementsByClassName, without success.

An ideal answer might explain the hierarchy, so that I can understand how to drill down to the table rows I want to read.

Many thanks

3 answers:

Answer 0 (score: 2)

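The following uses a proper wait for the page to load, then a timed loop that keeps testing the length until elements appear or the time limit is reached.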

Option Explicit
Public Sub TestForTags()
    Dim ie As New InternetExplorer, sections As Object, t As Single
    Const MAX_WAIT_SEC As Long = 10
    With ie
        .Visible = True
        .Navigate2 "https://www.betfair.com.au/exchange/plus/football/competition/10932509"
        While .Busy Or .readyState < 4: DoEvents: Wend
        t = Timer                          ' seconds since midnight, so store it in a Single, not a Date
        Do
            DoEvents
            ' The page can report "complete" before its content is rendered, so keep re-querying
            Set sections = .document.querySelectorAll("section")
            If Timer - t > MAX_WAIT_SEC Then Exit Do
        Loop While sections.Length = 0

        Debug.Print sections.Length
        Stop '<== Delete me later
        '.Quit
    End With
End Sub

sections is a NodeList, so loop with For i = 0 To sections.Length - 1 and access each node with .item(i).innerText. Alternatively, you can switch to Set sections = .document.getElementsByTagName("section") and then use For Each.
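A minimal sketch of both loop styles described above, assuming `doc` is an already-loaded `HTMLDocument` (for example `ie.document` after the timed wait) held late-bound as `Object`. The function name `PrintSectionText` is hypothetical and not part of the answer; it prints each section's text once per style and returns the NodeList length.

```vba
' Hypothetical helper (not from the answer): prints the text of every <section>
' using both access styles and returns how many sections the NodeList held.
' Assumes doc is an already-loaded HTMLDocument, e.g. ie.document.
Public Function PrintSectionText(ByVal doc As Object) As Long
    Dim sections As Object, el As Object, i As Long

    ' querySelectorAll returns a NodeList: index it from 0 to Length - 1 with .item(i)
    Set sections = doc.querySelectorAll("section")
    For i = 0 To sections.Length - 1
        Debug.Print "NodeList:", sections.item(i).innerText
    Next i

    ' getElementsByTagName returns an element collection, which can be walked with For Each
    For Each el In doc.getElementsByTagName("section")
        Debug.Print "Collection:", el.innerText
    Next el

    PrintSectionText = sections.Length
End Function
```

Inside TestForTags it could be called in place of the existing Debug.Print line, e.g. `Debug.Print PrintSectionText(.document)`.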

Answer 1 (score: 1)

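I'm no expert, but I struggled through it based on your code. Here is my small tweak. It seems to work, though not entirely as intended: the code will crash if IE isn't ready. It would be great if you could combine it with the other answers about testing whether IE is ready.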

' Pause one second at a time until IE reports that it has finished loading
While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
    Application.Wait Now + TimeSerial(0, 0, 1) 'Wait one more second
Wend

...

Set elements = html.getElementsByTagName("tr")

For i = 0 To elements.Length - 1 ' the collection is zero-based, so start at 0
    Debug.Print elements(i).textContent
Next i
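To drill down through the hierarchy the question asks about (document, then table, then row, then cell), a minimal sketch follows. It assumes the data you want is rendered as `<table>` markup (the `tr` loop above suggests at least some of it is) and that `doc` is the fully loaded document. The function name `PrintTableCells` is hypothetical; it prints table, row and column indices with each cell's text and returns the number of cells visited.

```vba
' Hypothetical helper: walks document > table > row > cell and prints each cell's text.
' Returns the number of cells printed. Assumes doc is a fully loaded HTMLDocument.
Public Function PrintTableCells(ByVal doc As Object) As Long
    Dim tbl As Object, rw As Object, cel As Object
    Dim t As Long, r As Long, c As Long, n As Long

    For Each tbl In doc.getElementsByTagName("table")
        r = 0
        For Each rw In tbl.Rows          ' every <tr> of this table, including <thead>/<tbody> rows
            c = 0
            For Each cel In rw.Cells     ' every <td> or <th> in the row
                Debug.Print t, r, c, cel.innerText
                c = c + 1
                n = n + 1
            Next cel
            r = r + 1
        Next rw
        t = t + 1
    Next tbl

    PrintTableCells = n
End Function
```

Once you know which table, row and column hold the odds, you can read `tbl.Rows(r).Cells(c).innerText` directly instead of printing everything.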

Answer 2 (score: 0)

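I wrote this function for loading pages. I sometimes found that a page wouldn't refresh properly: it got stuck and never actually finished loading.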

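It sleeps for 100 ms and then checks whether the page has loaded. If the page still hasn't finished refreshing/loading after 3 seconds, it refreshes and tries again, giving up after three refreshes.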

You would use it like this:

ie.navigate "google.com"
waitForIEToLoad ie

This needs to go at the top of the module:

Option Compare Text

#If VBA7 Then
    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long) 'VBA7 (Office 2010 and later, 32- or 64-bit)
#Else
    Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long) 'Older VBA versions (32-bit only)
#End If

Then anywhere in the module:

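Sub waitForIEToLoad(ie As InternetExplorer)

Dim times As Integer, times2 As Integer

' Poll every 100 ms; after 30 polls (about 3 seconds) refresh and start counting again
Do While ie.readyState <> READYSTATE_COMPLETE Or ie.Busy
    DoEvents
    Sleep 100
    times = times + 1
    If times = 30 Then
        ie.Refresh
        times = 0                ' restart the 3-second window after each refresh
        times2 = times2 + 1
        If times2 = 3 Then       ' give up after three refreshes
            Exit Do
        End If
    End If
Loop

End Sub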