Importing data from multiple web pages in VBA

Posted: 2016-03-21 03:26:04

Tags: vba import

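I'm currently trying to use a command button on a worksheet, with VBA behind it, to import player statistics from the official sports website into my worksheet when the button is clicked. The data I want to copy is spread across 21 different web pages.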

The page URLs have the following format: http://www.afl.com.au/stats/player-ratings/overall-standings#page/1. The number at the end of the URL runs from 1 to 21 and indicates which data table you are viewing.

See my code below:

Sub Button1_Click()

    Const WebAddress As String = "http://www.afl.com.au/stats/player-ratings/overall-standings#page/"

    Dim qt As QueryTable
    Dim PlayerRatings As Worksheet
    Dim PageNumber As Integer
    Dim RowPasteNumber As Integer

    RowPasteNumber = 6
    Set PlayerRatings = ActiveSheet

    For PageNumber = 1 To 21

        Set qt = PlayerRatings.QueryTables.Add(Connection:="URL;" & WebAddress & PageNumber, _
            Destination:=PlayerRatings.Range("A" & RowPasteNumber))
        qt.Refresh BackgroundQuery:=False
        RowPasteNumber = RowPasteNumber + 41

    Next PageNumber

End Sub

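So my idea was to use a For loop that increments the PageNumber integer on each pass, cycling through the different web pages that hold the data, and then copy the data from each page and paste it into my worksheet 41 rows below the previous block.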
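The row stepping in the loop is a fixed-offset layout: page n starts at row 6 + (n - 1) * 41. A minimal sketch of that arithmetic, assuming every page's table fits inside a 41-row block; DestinationRow is a hypothetical helper name used only for illustration.

```vba
Option Explicit

' Hypothetical helper: first worksheet row for a given page when page 1
' starts at firstRow and every page is allotted a fixed block of rowsPerPage rows.
Public Function DestinationRow(ByVal pageNumber As Long, _
                               Optional ByVal firstRow As Long = 6, _
                               Optional ByVal rowsPerPage As Long = 41) As Long
    DestinationRow = firstRow + (pageNumber - 1) * rowsPerPage
End Function
```

DestinationRow(1) returns 6, DestinationRow(2) returns 47 and DestinationRow(21) returns 826, so the last page in the loop starts at row 826. The fixed block size only works if no page returns more rows than the block holds; appending after the last used row, as Answer 1 does with GetLastRow, avoids that assumption.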

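The problem I'm running into is that the For loop completes 21 iterations (as it should) and each time pastes the data 41 rows below the previous data (as it should), but it keeps copying the data from page 1 over and over again.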

Can anyone see why my code might be doing this?

Many thanks for your help.

Stephen

2 answers:

Answer 0 (score: 0)

I recently ran into a similar problem while web scraping.

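The problem is that the address contains a location hash (#). Nothing after the # is ever sent to the server, so the server cannot act on it.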

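To repeat part of another Stack Overflow answer (Why the hash part of the URL is not in the server side?):

Here is what Wikipedia says:

"The fragment identifier functions differently than the rest of the URI: namely, its processing is exclusively client-side with no participation from the web server. When an agent (such as a Web browser) requests a resource from a Web server, the agent sends the URI to the server, but does not send the fragment. Instead, the agent waits for the server to send the resource, and then the agent processes the resource according to the fragment value. In the most common case, the agent scrolls a Web page down to the anchor element which has an attribute string equal to the fragment value. Other client behaviors are possible."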
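To make the point concrete, here is a minimal VBA sketch, assuming only what the quote states: a client drops everything from the first "#" before sending the request. RequestedPart and CountDistinctRequests are hypothetical helper names, not part of any library; Scripting.Dictionary is the Windows Scripting Runtime dictionary, created here by late binding.

```vba
Option Explicit

' Hypothetical helper: returns the part of a URL that is actually sent to
' the server, i.e. everything before the first "#". The fragment stays client-side.
Public Function RequestedPart(ByVal url As String) As String
    Dim hashPos As Long
    hashPos = InStr(1, url, "#")
    If hashPos > 0 Then
        RequestedPart = Left$(url, hashPos - 1)
    Else
        RequestedPart = url
    End If
End Function

' Hypothetical helper: builds the 21 page URLs from the question and counts
' how many distinct server requests remain once the fragment is dropped.
Public Function CountDistinctRequests() As Long
    Const WebAddress As String = "http://www.afl.com.au/stats/player-ratings/overall-standings#page/"
    Dim seen As Object, pageNumber As Long, requestUrl As String
    Set seen = CreateObject("Scripting.Dictionary")
    For pageNumber = 1 To 21
        requestUrl = RequestedPart(WebAddress & pageNumber)
        If Not seen.Exists(requestUrl) Then seen.Add requestUrl, pageNumber
    Next pageNumber
    CountDistinctRequests = seen.Count
End Function
```

CountDistinctRequests returns 1: all 21 addresses collapse to the same request, which is consistent with every QueryTables.Add in the question's loop importing page 1.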

  


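The simplest way around this is to automate the IE object directly and take a copy of document.body.innerHTML after each navigation and/or click. For a starting point, have a look here: http://www.excely.com/excel-vba/ie-automation.shtml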

Answer 1 (score: 0)

You can automate IE to step through the pages and write out each table:


Web page sample: [screenshot]

Sample code output: [screenshot]


VBA:

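Option Explicit
' Requires a reference to "Microsoft HTML Object Library" (Tools > References) for the HTMLTable type.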

Public Sub GetTables()
    Dim id, hTable As HTMLTable, ie As Object, ws As Worksheet
    Application.ScreenUpdating = False
    Set ws = ActiveSheet
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        For id = 1 To 2  'change the upper bound to 21 to fetch every page
            .navigate "http://www.afl.com.au/stats/player-ratings/overall-standings#page/" & id
            While .Busy Or .readyState < 4: DoEvents: Wend
            Dim file As Object
            Set file = CreateObject("htmlFile")
            With file
                DoEvents
                .Write ie.document.body.innerHTML
                Set hTable = .getElementById("playerRatings-table")
                ' Skip this page if the table element was not found
                If Not hTable Is Nothing Then WriteTable hTable, GetLastRow(ws, 1) + 1, ws
            End With
            Set hTable = Nothing: Set file = Nothing
        Next id
        .Quit
    End With
    Application.ScreenUpdating = True
End Sub

Public Sub WriteTable(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)

    If ws Is Nothing Then Set ws = ActiveSheet

    Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, R As Long, C As Long, tBody As Object
    R = startRow
    With ws
        Dim headers As Object, header As Object, columnCounter As Long
        Set headers = hTable.getElementsByTagName("th")
        For Each header In headers
            columnCounter = columnCounter + 1
            .Cells(startRow, columnCounter) = header.innerText
        Next header
        Set tBody = hTable.getElementsByTagName("tbody")
        For Each tSection In tBody               'HTMLTableSection
            Set tRow = tSection.getElementsByTagName("tr") 'HTMLTableRow
            For Each tr In tRow
                R = R + 1
                Set tCell = tr.getElementsByTagName("td")
                C = 1
                For Each td In tCell             'DispHTMLElementCollection
                    .Cells(R, C).Value = td.innerText 'HTMLTableCell
                    C = C + 1
                Next td
            Next tr
        Next tSection
    End With
End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function