我正在尝试编写一种工具,该工具可以让我的同事快速计算出配对城市列表之间的距离,这对于我们部门来说是一个季节性但非常重要的任务。
我目前可以通过Google Maps Distance API使用它,但是他们的政策和付款方式的不断变化正在变成一个真正的问题,因为我们只是发现该工具在需要使用时才停止工作。
这就是为什么我决定解决该问题并摆脱对API的需求的原因。这是我的第一个Scraping项目,所以我确定有更好的编码方法,但是到目前为止,我的解决方案是:
Sub Scrape2()
Dim IE As Object
Dim dist As Variant
Dim URL As String
Dim i As Integer
'Creates an Internet Explorer Object
Set IE = CreateObject("InternetExplorer.application")
URL = "https://www.entrecidadesdistancia.com.br"
With IE
.Visible = False ' "True" makes the object visible
.navigate URL 'Loads the website
'Waits until the site's ready
While IE.Busy
DoEvents
Wend
Do While .Busy
Loop
'Selects "origin" field and inserts text
.Document.getElementById("origem").Value = "Jandira, SP - Brasil"
'Selects "destination" field and inserts text
.Document.getElementById("destino").Value = "Cotia, SP - Brasil"
'Presses the GO button
For Each Button In .Document.getElementsByTagName("button")
Button.Click
Exit For
Next
'Waits until the site's ready
Do While .Busy
Loop
Do While .Busy
Loop
dist = .Document.getElementById("distanciarota").innerText
MsgBox (dist)
End With
IE.Quit
Set IE = Nothing
End Sub
它会打开一个Internet Explorer对象,将两个城市(最终将由我的工具来的信息替换为两个城市)插入正确的字段,点击GO,加载下一页,然后将我需要的数字放在MessageBox中(我将在工作时将其替换为目标单元格)。
我的最后一个问题是,有一半时间,宏将停止并在此行上声明“运行时错误'424':所需对象”:
.Document.getElementById("origem").Value = "Jandira, SP - Brasil"
或在此行:
dist = .Document.getElementById("distanciarota").innerText
我设法通过在两个“问题”行之前插入另一个等待时间来解决该问题,但这确实使宏的运行速度超出了我的期望。
还是,现在它总是到最后一行,但是当它出现时,我的MessageBox变成空白。
这是我需要的信息:
<strong id="distanciarota">12.4 km</strong>
从此网站:https://www.entrecidadesdistancia.com.br/calcular-distancia/calcular-distancia.jsp
任何将其放入变量或工作表单元格的帮助都将受到极大的重视。
答案 0 :(得分:0)
这将使用其ID获得两个距离测量值。我添加了带有timout的循环,以允许页面更新。
Option Explicit
Public Sub GetInfo()
Dim ie As New InternetExplorer, t As Date, ele As Object, test As String
Const MAX_WAIT_SEC As Long = 5 '<5 seconds
With ie
.Visible = True
.navigate "https://www.entrecidadesdistancia.com.br"
While .Busy Or .readyState < 4: DoEvents: Wend
With .document
.getElementById("origem").Value = "Jandira, SP - Brasil"
.getElementById("destino").Value = "Cotia, SP - Brasil"
.querySelector("[onclick='setRout();']").Click
End With
While .Busy Or .readyState < 4: DoEvents: Wend
t = Timer
Do
DoEvents
On Error Resume Next
Set ele = ie.document.getElementById("distanciarota")
test = ele.innerText
On Error GoTo 0
If Timer - t > MAX_WAIT_SEC Then Exit Do
Loop While test = vbNullString
If Not ele Is Nothing Then
With ThisWorkbook.Worksheets("Sheet1")
.Cells(1, 1) = "rodovias " & ele.innerText
.Cells(2, 1) = "linha reta " & ie.document.getElementById("kmlinhareta").innerText
End With
End If
.Quit
End With
End Sub
您可以使用带有CSS ID #
,选择器的querySelector,例如,以相同的方式使用
ie.document.querySelector("#distanciarota").innerText
答案 1 :(得分:0)
#If VBA7 Then
Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As LongPtr) 'For 64 Bit Systems
#Else
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds as Long) 'For 32 Bit Systems
#End If
Sub Scrape2()
Dim IE As Object
Dim dist As Variant
Dim URL As String
Dim i As Integer
'Creates an Internet Explorer Object
Set IE = CreateObject("InternetExplorer.application")
URL = "https://www.entrecidadesdistancia.com.br"
With IE
.Visible = False ' "True" makes the object visible
.navigate URL 'Loads the website
'Waits until the site's ready
While IE.Busy
DoEvents
Wend
Do While .Busy
Loop
'Add additional delay of 500 milliseconds
Sleep 500
'Selects "origin" field and inserts text
.Document.getElementById("origem").Value = "Jandira, SP - Brasil"
'Selects "destination" field and inserts text
.Document.getElementById("destino").Value = "Cotia, SP - Brasil"
'Presses the GO button
For Each Button In .Document.getElementsByTagName("button")
Button.Click
Exit For
Next
'Waits until the site's ready
Do While .Busy
Loop
Do While .Busy
Loop
'Add additional delay of 500 milliseconds
Sleep 500
dist = .Document.getElementById("distanciarota").innerText
MsgBox (dist)
End With
IE.Quit
Set IE = Nothing
End Sub
'请在导航和单击按钮后再加上一些延迟。与服务器交互期间,即繁忙对象处于活动状态。但是,从服务器浏览器提取数据后,需要花费几毫秒的时间来呈现html内容。因此,添加额外的延迟是避免这些错误的最佳做法。