使用VBA列出网页的所有URL地址

时间:2017-01-09 20:04:42

标签: excel vba list excel-vba web-scraping

我使用以下代码加载网站http://www.flashscore.com/soccer/england/premier-league/results/

找到并点击"显示更多匹配"链接,所有足球比赛都在浏览器中加载。

下面的代码只会给出结果的前半部分,在按下"显示更多匹配"之前显示的事件。链接。

我的问题是如何列出所有活动网址?

Sub Test_Flashscore()

Dim URL As String
Dim ie As New InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim dictObj As Object: Set dictObj = CreateObject("Scripting.Dictionary")
Dim tRowID As String

URL = "http://www.flashscore.com/soccer/england/premier-league/results/"

With ie
    .navigate URL
    .Visible = True
    Do Until .readyState = READYSTATE_COMPLETE: DoEvents: Loop
    Set HTMLdoc = .document
End With


For Each objLink In ie.document.getElementsByTagName("a")

   If Left(objLink.innerText, 4) = "Show" Or Left(objLink.innerText, 4) = "Arat" Then

        MsgBox "The link was founded!"
        objLink.Click

        Exit For

   End If

Next objLink


With HTMLdoc

    Set tblSet = .getElementById("fs-results")
    Set mTbl = tblSet.getElementsByTagName("tbody")(0)
    Set tRows = mTbl.getElementsByTagName("tr")
    With dictObj
        'If if value is not yet in dictionary, store it.
        For Each tRow In tRows
            'Remove the first four (4) characters.
            tRowID = Mid(tRow.ID, 5)
            If Not .Exists(tRowID) Then
                .add tRowID, Empty
            End If
        Next tRow
    End With
End With

i = 14
For Each Key In dictObj

    ActiveSheet.Cells(i, 2) = "http://www.flashscore.com/" & Key & "/#match-summary"
    i = i + 1

Next Key

Set ie = Nothing
MsgBox "Process Completed"

End Sub

2 个答案:

答案 0 :(得分:0)

您需要等待一段时间才能加载其他内容 - 单击该链接会触发对服务器的GET请求,因此需要返回内容并且需要先在页面上呈现内容抓住它。

答案 1 :(得分:0)

单击该链接会将您带到灯具。您可以将字典之前的所有内容替换为

.navigate "https://www.flashscore.com/football/england/premier-league/fixtures/"

也就是说:

Option Explicit
Public Sub GetInfo()
    Dim IE As New InternetExplorer
    With IE
        .Visible = True
        .navigate "https://www.flashscore.com/football/england/premier-league/fixtures/"

        While .Busy Or .readyState < 4: DoEvents: Wend

        'other code...using dictionary
        '.Quit
    End With
End Sub