如何使用VBA从网站抓取位置名称

时间:2020-02-24 09:39:08

标签: excel vba web-scraping

我正在尝试从租车网站主页上抓取皮卡车支店的地址。这样做的目的是准确了解给定公司的分支机构在哪里。

我以前已经成功创建了该网站,但是该公司最近对其网站进行了改版,现在我的代码无法正常工作。分支位置似乎隐藏在某种形式中,只有在您单击取件位置空间后,这些位置才会在html中可见。

我当前的代码如下:

Option Explicit
Private Sub pickuplocations()
    Dim html As Object
    Dim ws As Worksheet
    Dim headers()
    Dim i As Long
    Dim r As Long
    Dim c As Long
    Dim numrows As Long

        Set ws = ThisWorkbook.Worksheets("Europcar Branches(2)")
        Set html = New HTMLDocument

            With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", "https://www.europcar.co.za", False
            .send
            html.body.innerHTML = .responseText   'fetches all html from the website

    Dim pickupbranches As Object
    Dim pickupbranchresults()

        Set pickupbranches = html.getElementById("_location-search-widget_15").getElementsByTagName("span") 
        headers = Array("Pickup Location", "Option value") 'for the ws
        numrows = pickupbranches.Length - 1   'sets the row length

        ReDim pickupbranchresults(1 To numrows, 1 To 2)  'sets array size for the results
            For i = 1 To numrows
                pickupbranchresults(i, 1) = pickupbranches.Item(i).innerText 
                pickupbranchresults(i, 2) = pickupbranches.Item(i).Value    
            Next

        With ws

            .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers 'sets the column headers
            .Cells(2, 1).Resize(UBound(pickupbranchresults, 1), UBound(pickupbranchresults, 2)) = pickupbranchresults 
        End With
            End With
End Sub

1 个答案:

答案 0 :(得分:0)

您当前的代码请求源HTML代码并尝试对其进行抓取。

但是,如注释中所述,当您单击搜索栏时,位置列表将动态加载,并且不属于页面源HTML。因此,您的代码将不会产生任何结果。

刮取专用于该位置的页面更有意义:

https://www.europcar.co.za/rental-locations/

现在,如果在加载页面时导航到此页面并在浏览器的开发人员工具( F12 )中检查网络流量,则会看到一个XHR请求正在已发送。看起来像这样:

enter image description here

如果遍历请求的标题和参数,您将看到url,正文和标题的外观。在这种特殊情况下,没有参数,并且标头对于请求成功并非必不可少,因此您所需要的只是URL。

响应的有效载荷为json格式。您可以使用this之类的工具检查其结构。外观如下:

enter image description here

基本上,JSON由不同的国家/地区组成,每个国家/地区由省份组成,每个省/地区由相应的分支机构组成。每个分支都包含所有相应的信息。

要解析这样的响应,您需要一个JSON解析器(请参阅本文的结尾)。

TL; DR

代码如下:

Option Explicit

Sub getLocations()
Dim req As New WinHttpRequest
Dim url As String, results() As String
Dim sht As Worksheet
Dim responseJSON As Object, country As Object, province As Object, branch As Object
Dim i As Long
Dim rng As Range

Set sht = ThisWorkbook.Worksheets("Name of your Worksheet")
url = "https://www.europcar.co.za/api/rentalLocations/impressLocations"

With req
    .Open "GET", url, False
    .send
    Set responseJSON = JsonConverter.ParseJson(.responseText)
End With

For Each country In responseJSON
    For Each province In country("provinces")
        i = 0
        ReDim results(1 To province("branches").Count, 1 To 5)
        For Each branch In province("branches")
            i = i + 1
            results(i, 1) = country("name")
            results(i, 2) = province("name")
            results(i, 3) = branch("name")
            results(i, 4) = branch("emailAddress")
            results(i, 5) = branch("contactNumber")
        Next branch
        With sht
            Set rng = .Range("A" & .Rows.Count).End(xlUp).Offset(1, 0)
        End With
        rng.Resize(UBound(results, 1), UBound(results, 2)) = results
    Next province
Next country

End Sub

出于演示目的,上面的代码以下列方式打印出结果:

enter image description here

考虑到我提供的JSON结构和示例代码,您可以轻松地对其进行修改以满足您的需求。

要使代码正常工作,您将需要在项目中添加以下引用(VBE>“工具”>“参考”):

 1. Microsoft WinHTTP Services version 5.1
 2. Microsoft Scripting Runtime

您还需要将this JSON parser添加到您的项目中。请按照链接中的安装说明进行操作,然后就可以开始使用了。