我正在尝试从租车网站主页上抓取皮卡车支店的地址。这样做的目的是准确了解给定公司的分支机构在哪里。
我以前已经成功创建了该网站,但是该公司最近对其网站进行了改版,现在我的代码无法正常工作。分支位置似乎隐藏在某种形式中,只有在您单击取件位置空间后,这些位置才会在html中可见。
我当前的代码如下:
Option Explicit
Private Sub pickuplocations()
Dim html As Object
Dim ws As Worksheet
Dim headers()
Dim i As Long
Dim r As Long
Dim c As Long
Dim numrows As Long
Set ws = ThisWorkbook.Worksheets("Europcar Branches(2)")
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.europcar.co.za", False
.send
html.body.innerHTML = .responseText 'fetches all html from the website
Dim pickupbranches As Object
Dim pickupbranchresults()
Set pickupbranches = html.getElementById("_location-search-widget_15").getElementsByTagName("span")
headers = Array("Pickup Location", "Option value") 'for the ws
numrows = pickupbranches.Length - 1 'sets the row length
ReDim pickupbranchresults(1 To numrows, 1 To 2) 'sets array size for the results
For i = 1 To numrows
pickupbranchresults(i, 1) = pickupbranches.Item(i).innerText
pickupbranchresults(i, 2) = pickupbranches.Item(i).Value
Next
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers 'sets the column headers
.Cells(2, 1).Resize(UBound(pickupbranchresults, 1), UBound(pickupbranchresults, 2)) = pickupbranchresults
End With
End With
End Sub
答案 0 :(得分:0)
您当前的代码请求源HTML代码并尝试对其进行抓取。
但是,如注释中所述,当您单击搜索栏时,位置列表将动态加载,并且不属于页面源HTML。因此,您的代码将不会产生任何结果。
刮取专用于该位置的页面更有意义:
现在,如果在加载页面时导航到此页面并在浏览器的开发人员工具( F12 )中检查网络流量,则会看到一个XHR
请求正在已发送。看起来像这样:
如果遍历请求的标题和参数,您将看到url,正文和标题的外观。在这种特殊情况下,没有参数,并且标头对于请求成功并非必不可少,因此您所需要的只是URL。
响应的有效载荷为json格式。您可以使用this之类的工具检查其结构。外观如下:
基本上,JSON由不同的国家/地区组成,每个国家/地区由省份组成,每个省/地区由相应的分支机构组成。每个分支都包含所有相应的信息。
要解析这样的响应,您需要一个JSON解析器(请参阅本文的结尾)。
TL; DR
代码如下:
Option Explicit
Sub getLocations()
Dim req As New WinHttpRequest
Dim url As String, results() As String
Dim sht As Worksheet
Dim responseJSON As Object, country As Object, province As Object, branch As Object
Dim i As Long
Dim rng As Range
Set sht = ThisWorkbook.Worksheets("Name of your Worksheet")
url = "https://www.europcar.co.za/api/rentalLocations/impressLocations"
With req
.Open "GET", url, False
.send
Set responseJSON = JsonConverter.ParseJson(.responseText)
End With
For Each country In responseJSON
For Each province In country("provinces")
i = 0
ReDim results(1 To province("branches").Count, 1 To 5)
For Each branch In province("branches")
i = i + 1
results(i, 1) = country("name")
results(i, 2) = province("name")
results(i, 3) = branch("name")
results(i, 4) = branch("emailAddress")
results(i, 5) = branch("contactNumber")
Next branch
With sht
Set rng = .Range("A" & .Rows.Count).End(xlUp).Offset(1, 0)
End With
rng.Resize(UBound(results, 1), UBound(results, 2)) = results
Next province
Next country
End Sub
出于演示目的,上面的代码以下列方式打印出结果:
考虑到我提供的JSON结构和示例代码,您可以轻松地对其进行修改以满足您的需求。
要使代码正常工作,您将需要在项目中添加以下引用(VBE>“工具”>“参考”):
1. Microsoft WinHTTP Services version 5.1
2. Microsoft Scripting Runtime
您还需要将this JSON parser添加到您的项目中。请按照链接中的安装说明进行操作,然后就可以开始使用了。