VBA methods htmlDoc.getElementBy* return nothing

Date: 2020-09-06 16:07:40

Tags: html vba web-scraping

Greetings, everyone.


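Goal

I want to open flashscore.com and retrieve data about football matches. I want all of this season's matches, so the "Show more matches" link has to be clicked multiple times.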


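Limitations

  • I need to do this with VBA
  • The site does not support Internet Explorer
  • I cannot install anything on the PC I will be using, so Selenium is ruled out as an option

Given all of the above, the only option left seems to be the Microsoft XML, v6.0 library.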


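What I have tried

I have read several posts that this one may be a duplicate of and tried their solutions, but so far nothing seems to help.
Here is the code, with comments explaining each attempt: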

Option Explicit

Sub Get_Matches()
'Requires references to Microsoft XML, v6.0 and Microsoft HTML Object Library
Dim httpReq As New MSXML2.XMLHTTP60
Dim doc As MSHTML.HTMLDocument
Dim eleCol As Object
Dim ele As MSHTML.HTMLHtmlElement
    
'Open the site and get its html. Commented-out lines: things proposed by others that seemed to make no difference.
    httpReq.Open "GET", "https://www.flashscore.com/football/england/premier-league-2019-2020/results/", False 'False = synchronous: send returns only once the response has arrived
    'httpReq.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
    httpReq.send
    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = httpReq.responseText
    'doc.body.innerHTML = httpReq.responseBody


'Tried to get the div containing the first match:
    Set ele = doc.getElementById("g_1_2JDks1o7") '--> returns nothing

'Tried to get the "Show more matches" link:
    Set ele = doc.getElementsByClassName("event__more")(0) '--> returns nothing

'Tried to get the first ancestor that has an id:
    Set ele = doc.getElementById("live-table") '--> returns nothing


'Tried to get all <a> elements and then narrow them down till I find the "Show more matches" link:
    Set eleCol = doc.getElementsByTagName("a")
    For Each ele In eleCol
        If ele.href Like "*[#]*" And ele.innerText = "Show more matches" Then 
            Exit For
        End If
    Next ele
    ele.Click '--> I get the "Show more matches" link, but nothing seems to change
'Someone suggested firing onclick event.
    ele.FireEvent "onclick" '--> did nothing


'Some people suggested waiting.
'So I tried this:
    Do
        DoEvents
    Loop While doc.readyState = "loading"
    If doc.readyState <> "complete" Then GoTo ERROR_END
    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = httpReq.responseText
'and the Delay_Code_By function below,
    Set ele = doc.getElementById("live-table") '--> but still returns nothing


'Some people suggested looping, so the document gets loaded first.
    Do: Set ele = doc.getElementById("live-table"): DoEvents: Loop While ele Is Nothing '--> resulted in an infinite loop.
    Do: Set eleCol = doc.getElementsByClassName("event__more"): DoEvents: Loop Until eleCol.Length > 0 '--> resulted in an infinite loop.


'This block extracts html in a txt file in desktop, to help me see it, as it is at runtime.
'------------------------------------------------------------------------------------------
'    Dim fso As Scripting.FileSystemObject
'    Dim txtFile As Scripting.TextStream
'    Set fso = New Scripting.FileSystemObject
'    Set txtFile = fso.OpenTextFile(CreateObject("WScript.Shell").SpecialFolders("Desktop") & "\output.txt", 2, True, TristateTrue)
'    txtFile.Write httpReq.responseText
'    txtFile.Close
'    Set txtFile = Nothing
'    Set fso = Nothing
'------------------------------------------------------------------------------------------
'Inside the txt file I found html code that onclick calls some kind of function:

'Tried to call this function:
    doc.parentWindow.execScript "document.body.classList.toggle('loading', true);", "JScript" '--> throws automation error (probably there's some error with my syntax).

'I also tried to call this function:
    doc.parentWindow.execScript "function(){return cjs.Api.loader.get('cjs').call(function(_cjs){loadMoreGames(_cjs);});};" '--> which did not throw error but did nothing.


    Exit Sub
ERROR_END:
    MsgBox "error"
End Sub


Public Sub Delay_Code_By(seconds As Integer)
Dim endTime As Date
    endTime = DateAdd("s", seconds, Now)
    Do While Now < endTime
        DoEvents
    Loop
End Sub

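Question

  1. If I have any option other than the Microsoft XML, v6.0 library, please tell me so I can try it.
  2. As far as I understand, the problem is that the elements I am trying to get do not exist in the document. If I am right, can someone explain why this happens and whether there is a workaround? If I am wrong, please point me in the right direction. Thank you.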
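
One way to test the suspicion in point 2 is to check whether the raw response text contains the markers being searched for at all, before any DOM parsing is involved. The following is only a minimal sketch under assumptions: ContainsMarker and Check_Static_Html are hypothetical names introduced here for illustration, the request is made synchronously, and late binding is used so that no extra library reference is needed.

```vba
Option Explicit

'Hypothetical helper: True when marker occurs anywhere in html (case-insensitive).
Public Function ContainsMarker(ByVal html As String, ByVal marker As String) As Boolean
    ContainsMarker = (InStr(1, html, marker, vbTextCompare) > 0)
End Function

'Hypothetical diagnostic: prints whether the markers exist in the HTML as delivered by the server.
Public Sub Check_Static_Html()
    Dim httpReq As Object
    Set httpReq = CreateObject("MSXML2.XMLHTTP.6.0") 'late-bound equivalent of MSXML2.XMLHTTP60

    'Third argument False = synchronous request, so responseText is complete after send
    httpReq.Open "GET", "https://www.flashscore.com/football/england/premier-league-2019-2020/results/", False
    httpReq.send

    Debug.Print "HTTP status: "; httpReq.Status
    Debug.Print "live-table present: "; ContainsMarker(httpReq.responseText, "live-table")
    Debug.Print "event__more present: "; ContainsMarker(httpReq.responseText, "event__more")
End Sub
```

If a marker is reported as absent, the element is presumably not part of the HTML the server returns and is added later by the page's scripts. XMLHTTP does not run those scripts, and in that case no amount of waiting or looping on the parsed document would make the element appear.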

Update

Following @QHarr's comment about watching the Network/XHR tab, I am posting this screenshot. [screenshot: network xhr tab]

0 answers:

No answers