VBA web scraping returns nothing

Posted: 2019-03-07 16:34:21

Tags: excel vba web-scraping screen-scraping

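I have the following code to scrape data from a website. The problem is that it doesn't scrape anything: it shows no error, but it doesn't give me any results either...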

Option Explicit

Public Sub Loiça()
    Sheets("Loiça").Range("A:A,Z:Z").EntireColumn.Delete
    Dim IE As New InternetExplorer, i As Long, data As Object, div As Object, item As Object, r As Long, c As Long
    With IE
        .Visible = False
        .Navigate2 "https://www.radiopopular.pt/categoria/maquina-de-lavar-louca/"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim numResults As Long, arr() As String
        arr = Split(.document.querySelector(".status.cb").innerText, Chr$(32))
        numResults = arr(LBound(arr))
        Dim resultsPerPage As Long
        resultsPerPage = .document.querySelectorAll(".data cb").Length
            If i > 1 Then
                .Navigate2 ("https://www.radiopopular.pt/categoria/maquina-de-lavar-louca/")
                While .Busy Or .readyState < 4: DoEvents: Wend
            End If
            Set data = .document.getElementsByClassName("data cb")
            For Each item In data
                r = r + 1: c = 1
                For Each div In item.getElementsByTagName("div")
                    With ThisWorkbook.Worksheets("Loiça")
                        .Cells(r, c) = div.innerText
                    End With
                    c = c + 1
                Next
            Next
        .Quit
    End With
    '---------------------------------------------------------------------------'
End Sub

1 answer:

Answer 0 (score: 0)

This was an interesting challenge. A few things to note:

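  1. The page doesn't seem to load in Internet Explorer (at least not for me), possibly because older browsers aren't supported. So you need to switch to Selenium Basic and Chrome. After downloading and installing Selenium Basic, you may have to replace the ChromeDriver.exe in the Selenium folder with the latest version. Then go to VBE > Tools > References and add a reference to Selenium Type Library.
  2. The page uses ajax to load records dynamically, 12 at a time. You need to keep scrolling the page until all the results are shown.
  3. You can't retrieve the result count the way your code does, because the string returned is in a different, and possibly variable, format. Instead, you can take the total from the element that stores this number.
  4. To keep using the syntax you already wrote, you need to transfer the page HTML into an HTMLDocument variable and work with that (this needs a reference to Microsoft HTML Object Library).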
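Point 4 can be tried on its own, without Selenium or the live site. The sketch below is a minimal illustration, assuming a reference to Microsoft HTML Object Library; `ToHtmlDocument`, `DemoToHtmlDocument` and the sample markup are hypothetical names and data, not part of the original answer. It loads an HTML string (in the real code, Selenium's `.PageSource`) into an `HTMLDocument` so the familiar `getElementsByTagName` calls can be used on it.

```vba
' Requires a reference to Microsoft HTML Object Library (VBE > Tools > References).
' ToHtmlDocument and DemoToHtmlDocument are hypothetical names for illustration.

' Load an HTML string (e.g. Selenium's .PageSource) into an HTMLDocument
Public Function ToHtmlDocument(ByVal pageSource As String) As HTMLDocument
    Dim html As HTMLDocument
    Set html = New HTMLDocument
    html.body.innerHTML = pageSource
    Set ToHtmlDocument = html
End Function

' Parse a small hard-coded sample and print the text of every <div>
' it contains (the outer "data" div as well as the two inner ones)
Public Sub DemoToHtmlDocument()
    Dim html As HTMLDocument, div As Object
    Set html = ToHtmlDocument("<div class=""data""><div>A</div><div>B</div></div>")
    For Each div In html.getElementsByTagName("div")
        Debug.Print div.innerText
    Next
End Sub
```

Run `DemoToHtmlDocument` from the VBE and check the Immediate window; once this works, the same pattern applied to `.PageSource` is what lets the original parsing loop run unchanged.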

VBA:

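Option Explicit

' Requires references (VBE > Tools > References):
'   Selenium Type Library          (Selenium Basic: WebDriver, ChromeDriver)
'   Microsoft HTML Object Library  (HTMLDocument)
Public Sub Loiça()
    Dim d As WebDriver, t As Single, data As Object, div As Object, item As Object, r As Long, c As Long
    Dim numResults As Long, html As HTMLDocument
    Const MAX_WAIT_SEC As Long = 600          ' stop scrolling after 10 minutes
    Const URL As String = "https://www.radiopopular.pt/categoria/maquina-de-lavar-louca/"

    Set d = New ChromeDriver

    With d
        .Start "Chrome"
        .get URL

        ThisWorkbook.Worksheets("Loiça").Range("A:A,Z:Z").EntireColumn.Delete
        ' Total number of products, read from the element that stores it
        numResults = .FindElementByCss("#total").Text
        t = Timer                             ' Timer returns seconds as a Single
        ' Scroll one screen at a time so the page's ajax loads the next batch,
        ' until the loaded-product counter matches the total (or we time out)
        Do
            .ExecuteScript "window.scrollBy(0, window.innerHeight);", "javascript"
            If Timer - t > MAX_WAIT_SEC Then Exit Do
        Loop Until .FindElementByCss("#products").Text = numResults
        ' Copy the fully loaded page into an HTMLDocument so the original
        ' getElementsByClassName / getElementsByTagName code can be reused
        Set html = New HTMLDocument
        html.body.innerHTML = .PageSource
        Set data = html.getElementsByClassName("data")
        ' One row per product, one column per <div> inside it
        For Each item In data
            r = r + 1: c = 1
            For Each div In item.getElementsByTagName("div")
                ThisWorkbook.Worksheets("Loiça").Cells(r, c) = div.innerText
                c = c + 1
            Next
        Next
        .Quit
    End With
End Sub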