从不显示表的第一列的网页中提取表数据

时间:2014-02-17 11:15:30

标签: excel vba excel-vba

我需要从此页面获取价格表: http://www.kieskeurig.nl/objectief/canon/ef_100mm_f2_usm/prijzen/bezorgen/167557#prijzen

到目前为止,我已经开发了此代码来获取数据

Sub TableExample()
    Dim IE As Object
    Dim doc As Object
    Dim strURL As String

    strURL = "http://www.kieskeurig.nl/objectief/canon/ef_100mm_f2_usm/prijzen/bezorgen/167557#prijzen"

    ' replace with URL of your choice

    Set IE = CreateObject("InternetExplorer.Application")
    With IE
         '.Visible = True
        .navigate strURL
        Do Until .readyState = 4: DoEvents: Loop
        Do While .Busy: DoEvents: Loop
        Set doc = IE.document
        GetAllTables doc

        .Quit
    End With

End Sub

Sub GetAllTables(doc As Object)

     ' get all the tables from a webpage document, doc, and put them in a new worksheet

    Dim ws As Worksheet
    Dim rng As Range
    Dim tbl As Object
    Dim rw As Object
    Dim cl As Object
    Dim tabno As Long
    Dim nextrow As Long
    Dim I As Long

    Set ws = Sheets("Sheet1")


    For Each tbl In doc.getElementsByTagName("TABLE")
        tabno = tabno + 1
        nextrow = nextrow + 1
        Set rng = ws.Range("B" & nextrow)
        rng.Offset(, -1) = "Table " & tabno
        For Each rw In tbl.Rows
            For Each cl In rw.Cells
                rng.Value = cl.innerText
                Set rng = rng.Offset(, 1)
                I = I + 1
            Next cl
            nextrow = nextrow + 1
            Set rng = rng.Offset(1, -I)
            I = 0
        Next rw
    Next tbl

    ws.Cells.ClearFormats

End Sub

此代码对我有用

但问题是第1列,即供应商数据未显示在提取的表格中。

任何人都可以帮助我

1 个答案:

答案 0 :(得分:1)

GetAllTables子程序替换为以下内容:

Sub GetAllTables(doc As Object)

     ' get all the tables from a webpage document, doc, and put them in a new worksheet

    Dim ws As Worksheet
    Dim rng As Range
    Dim tbl As Object
    Dim rw As Object
    Dim cl As Object
    Dim tabno As Long
    Dim nextrow As Long
    Dim I As Long

    Set ws = Sheets("Sheet1")


    For Each tbl In doc.getElementsByTagName("TABLE")
        tabno = tabno + 1
        nextrow = nextrow + 1
        Set rng = ws.Range("B" & nextrow)
        rng.Offset(, -1) = "Table " & tabno
        For Each rw In tbl.Rows
            colno = 1
            For Each cl In rw.Cells
                If colno = 1 and nextrow > 1 then
                    Set classColl = doc.getElementsByClassName("shopLogoX")
                    Set imgTgt = classColl(nextrow - 2).getElementsByTagName("img")
                    rng.Value = imgTgt(0).getAttribute("alt")
                Else
                    rng.Value = cl.innerText
                End If
                Set rng = rng.Offset(, 1)
                I = I + 1
                colno = colno + 1
            Next cl
            nextrow = nextrow + 1
            Set rng = rng.Offset(1, -I)
            I = 0
        Next rw
    Next tbl

    ws.Cells.ClearFormats

End Sub

实际上,变化很小。我们使用colno来跟踪我们已经在该行中的哪一列。显然,我们检查我们是否在第一个单元格中。如果我们位于第一列而不是第一行(标题行),我们将创建一个具有类shopLogoX的元素集合。这包含具有我们想要的img属性的alt标记。

尝试,测试和工作。 如果有帮助,请告诉我们。