Web scraping via span tags

Time: 2019-04-17 13:00:52

Tags: html excel vba web-scraping

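I am trying to copy data from the website mentioned below. I need all of the data for every size and price range listed on the page. I have the code framework below, but I can only copy three elements. Could someone look into this?

URL: https://www.leetstorage.com/sizes-and-pricing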

Sub TagClassName()

Dim ie As New InternetExplorer, ws As Worksheet

Set ws = ThisWorkbook.Worksheets("Unit Data")
With ie
    .Visible = True
    .Navigate2 "https://www.leetstorage.com/sizes-and-pricing"

    While .Busy Or .readyState < 4: DoEvents: Wend

    Dim listings As Object, listing As Object, headers(), results(), r As Long, c As Long, item As Object
    headers = Array("size")
    Set listings = .document.getElementById("site_content").getElementsByTagName("ul")

    ReDim results(1 To listings.Length, 1 To UBound(headers) + 1)
    For Each listing In listings

        r = r + 1
        On Error Resume Next
        results(r, 1) = listing.getElementsByClassName("font-size-NaN m-font-size-NaN")(0).innerText

        On Error GoTo 0
    Next
    ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
    ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
    .Quit

    End With


End Sub

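1 answer:

Answer 0 (score: 2)

You can use the following. You want the child li elements within the parent ul (unordered list) elements that have the class innerList.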


Internet Explorer:

Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub RetrieveInfo()
    Dim IE As InternetExplorer, i As Long, items As Object
    Set IE = New InternetExplorer

    With IE
        .Visible = True
        .Navigate2 "https://www.leetstorage.com/sizes-and-pricing"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Set items = .document.querySelectorAll(".innerList li")

        For i = 0 To items.Length - 1
            With ThisWorkbook.Worksheets("Sheet1")
                .Cells(i + 1, 1) = Trim$(items.item(i).innerText)
            End With
        Next
    End With
End Sub

XHR:

You can do this faster with XHR, provided you supply a User-Agent header.

Option Explicit
Public Sub GetInfo()
    Dim html As HTMLDocument, items As Object, i As Long '<  VBE > Tools > References > Microsoft HTML Object Library
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.leetstorage.com/sizes-and-pricing", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        html.body.innerHTML = .responseText
    End With
    Set items = html.querySelectorAll(".innerList li")

    For i = 0 To items.Length - 1
        With ThisWorkbook.Worksheets("Sheet1")
            .Cells(i + 1, 1) = Trim$(items.item(i).innerText)
        End With
    Next
End Sub

ul blocks:

If you look only at what the ul's class name (innerList) returns, you get three blocks on the page, each containing a list.


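ul and li:

Taking just one of these blocks as an example, adding the descendant combinator (a space) followed by li to the selector, i.e. .innerList li, matches each child li inside the block rather than the block as a whole.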
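
As a rough illustration of that difference, the sketch below runs both selectors against a small in-memory snippet. The markup and its list values are made up for the example (they are not taken from the page), and the sub name CompareSelectors is hypothetical. It assumes a reference to Microsoft HTML Object Library, as in the XHR version above.

```vba
Option Explicit
'VBE > Tools > References > Microsoft HTML Object Library
Public Sub CompareSelectors()
    Dim html As HTMLDocument, blocks As Object, items As Object, i As Long
    Set html = New HTMLDocument

    'Hypothetical markup standing in for one ul block; not copied from the site
    html.body.innerHTML = "<ul class=""innerList""><li>Item A</li><li>Item B</li></ul>"

    Set blocks = html.querySelectorAll(".innerList")    'matches the ul block itself
    Set items = html.querySelectorAll(".innerList li")  'matches each li inside the block

    Debug.Print "blocks matched: " & blocks.Length
    Debug.Print "items matched: " & items.Length

    'List the text of each matched li
    For i = 0 To items.Length - 1
        Debug.Print Trim$(items.item(i).innerText)
    Next
End Sub
```

Running it and comparing the two counts in the Immediate window shows why the class name alone yields one entry per block, while the descendant selector yields one entry per list item.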