Unable to drop the first two columns of a table while scraping

Posted: 2019-03-02 06:28:32

Tags: excel vba web-scraping

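I've written a macro in VBA that uses XMLHTTP requests to parse some tabular data from a webpage. When I run the script below, I get the entire content of that table. However, I only want the content from the column Card# onward. In other words, I want to get rid of the first two columns, Image and Spec#.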

Website link: https://www.psacard.com/psasetregistry/baseball/company-sets/2018-topps-now/publishedset/271273


How can I get the table's content without the first two columns?

Here is my attempt so far:

Sub GetTable()
    Dim S$, c&, R&, elem As Object, tRow As Object

    With New XMLHTTP60
        .Open "GET", "https://www.psacard.com/psasetregistry/baseball/company-sets/2018-topps-now/publishedset/271273", False
        .send
        S = .responseText
    End With

    With New HTMLDocument
        .body.innerHTML = S
        For Each elem In .getElementsByTagName("table")(0).Rows
            For Each tRow In elem.Cells
                c = c + 1: Cells(R + 1, c) = tRow.innerText
            Next tRow
            c = 0: R = R + 1
        Next elem
    End With
End Sub

References to add before running the script:

Microsoft XML, v6.0
Microsoft HTML Object Library
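If setting these references is inconvenient, a minimal late-bound sketch of the same script creates both objects with CreateObject instead, so no references are needed. This is an alternative that is not part of the original post; GetTableLateBound is a hypothetical name, and it assumes the "MSXML2.XMLHTTP.6.0" and "htmlfile" ProgIDs are available on the machine (they normally are on Windows with MSXML 6 and MSHTML installed).

```vba
Option Explicit

' Late-bound sketch: no references to "Microsoft XML, v6.0" or
' "Microsoft HTML Object Library" are required (hypothetical Sub name).
Public Sub GetTableLateBound()
    Dim http As Object, html As Object
    Dim elem As Object, tCell As Object
    Dim c As Long, R As Long

    ' Same request as the early-bound version, created via its ProgID.
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", "https://www.psacard.com/psasetregistry/baseball/company-sets/2018-topps-now/publishedset/271273", False
    http.send

    ' "htmlfile" creates an MSHTML document without a project reference.
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = http.responseText

    ' Same loop as the attempt above: every cell of every row of the first table.
    For Each elem In html.getElementsByTagName("table")(0).Rows
        For Each tCell In elem.Cells
            c = c + 1
            ActiveSheet.Cells(R + 1, c).Value = tCell.innerText
        Next tCell
        c = 0: R = R + 1
    Next elem
End Sub
```

The trade-off is the usual one for late binding: no IntelliSense and no compile-time checking of member names, in exchange for not depending on references being set in each workbook.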

1 Answer:

Answer 0 (score: 1)

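The simplest approach seems to be to write a cell only when its position c in the row is greater than 3, and to shift the output column left by 3 (c - 3) so that the first kept cell lands in column A.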

Option Explicit

Public Sub GetTable()
    Dim S$, c&, R&, elem As Object, tRow As Object

    With New XMLHTTP60
        .Open "GET", "https://www.psacard.com/psasetregistry/baseball/company-sets/2018-topps-now/publishedset/271273", False
        .send
        S = .responseText
    End With

    With New HTMLDocument
        .body.innerHTML = S
        For Each elem In .getElementsByTagName("table")(0).rows
            For Each tRow In elem.Cells
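                c = c + 1  ' 1-based position of this cell within the row
                ' Skip the first three cells; shift the rest left so cell 4 lands in column A
                If c > 3 Then ActiveSheet.Cells(R + 1, c - 3) = tRow.innerText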
            Next tRow
            c = 0: R = R + 1
        Next elem
    End With
End Sub
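A minimal sketch of the same idea without the running counter is to loop over each row's Cells collection by index, starting at the first cell you want to keep. FIRST_KEPT and GetTableFromColumn are hypothetical names; the value 3 simply mirrors the c > 3 test above (cells at 0-based indexes 0 to 2 are skipped) and would need changing if the page's column layout differs. It assumes the same two references as the original script and is meant to go in its own module.

```vba
Option Explicit

' Hypothetical constant: 0-based index of the first cell to keep in each row.
' 3 matches the "c > 3" test above (cells 0, 1 and 2 are skipped).
Private Const FIRST_KEPT As Long = 3

Public Sub GetTableFromColumn()
    Dim S As String, R As Long, i As Long
    Dim tRow As Object, rowCells As Object

    With New XMLHTTP60
        .Open "GET", "https://www.psacard.com/psasetregistry/baseball/company-sets/2018-topps-now/publishedset/271273", False
        .send
        S = .responseText
    End With

    With New HTMLDocument
        .body.innerHTML = S
        For Each tRow In .getElementsByTagName("table")(0).Rows
            Set rowCells = tRow.Cells
            ' The Cells collection is 0-based: cell FIRST_KEPT goes to column A,
            ' cell FIRST_KEPT + 1 to column B, and so on. Shorter rows write nothing.
            For i = FIRST_KEPT To rowCells.Length - 1
                ActiveSheet.Cells(R + 1, i - FIRST_KEPT + 1).Value = rowCells(i).innerText
            Next i
            R = R + 1
        Next tRow
    End With
End Sub
```

Keeping the offset in one named constant puts the "which column to start from" decision in a single place, instead of splitting it between the If test and the column arithmetic.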