使用VBA刮擦两个城市之间的距离

时间:2018-10-25 12:48:29

标签: excel vba excel-vba web-scraping excel-2010

我正在尝试编写一种工具,该工具可以让我的同事快速计算出配对城市列表之间的距离,这对于我们部门来说是一个季节性但非常重要的任务。

我目前可以通过Google Maps Distance API使用它,但是他们的政策和付款方式的不断变化正在变成一个真正的问题,因为我们只是发现该工具在需要使用时才停止工作。

这就是为什么我决定解决该问题并摆脱对API的需求的原因。这是我的第一个Scraping项目,所以我确定有更好的编码方法,但是到目前为止,我的解决方案是:

Sub Scrape2()

    Dim IE As Object
    Dim dist As Variant
    Dim URL As String
    Dim i As Integer

    'Creates an Internet Explorer Object
    Set IE = CreateObject("InternetExplorer.application")


    URL = "https://www.entrecidadesdistancia.com.br"

    With IE
        .Visible = False ' "True" makes the object visible
        .navigate URL 'Loads the website

        'Waits until the site's ready
        While IE.Busy
        DoEvents
        Wend

        Do While .Busy
        Loop

        'Selects "origin" field and inserts text
        .Document.getElementById("origem").Value = "Jandira, SP - Brasil"

        'Selects "destination" field and inserts text
        .Document.getElementById("destino").Value = "Cotia, SP - Brasil"

        'Presses the GO button
        For Each Button In .Document.getElementsByTagName("button")
            Button.Click
            Exit For
        Next

        'Waits until the site's ready
        Do While .Busy
        Loop

        Do While .Busy
        Loop

        dist = .Document.getElementById("distanciarota").innerText

        MsgBox (dist)


    End With

    IE.Quit
    Set IE = Nothing


End Sub

它会打开一个Internet Explorer对象,将两个城市(最终将由我的工具来的信息替换为两个城市)插入正确的字段,点击GO,加载下一页,然后将我需要的数字放在MessageBox中(我将在工作时将其替换为目标单元格)。

我的最后一个问题是,有一半时间,宏将停止并在此行上声明“运行时错误'424':所需对象”:

.Document.getElementById("origem").Value = "Jandira, SP - Brasil"

或在此行:

dist = .Document.getElementById("distanciarota").innerText

我设法通过在两个“问题”行之前插入另一个等待时间来解决该问题,但这确实使宏的运行速度超出了我的期望。

还是,现在它总是到最后一行,但是当它出现时,我的MessageBox变成空白。

这是我需要的信息:

<strong id="distanciarota">12.4 km</strong>

从此网站:https://www.entrecidadesdistancia.com.br/calcular-distancia/calcular-distancia.jsp

任何将其放入变量或工作表单元格的帮助都将受到极大的重视。

2 个答案:

答案 0 :(得分:0)

这将使用其ID获得两个距离测量值。我添加了带有timout的循环,以允许页面更新。

Option Explicit
Public Sub GetInfo()
    Dim ie As New InternetExplorer, t As Date, ele As Object, test As String
    Const MAX_WAIT_SEC As Long = 5               '<5 seconds

    With ie
        .Visible = True
        .navigate "https://www.entrecidadesdistancia.com.br"

        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document
            .getElementById("origem").Value = "Jandira, SP - Brasil"
            .getElementById("destino").Value = "Cotia, SP - Brasil"
            .querySelector("[onclick='setRout();']").Click
        End With

        While .Busy Or .readyState < 4: DoEvents: Wend

        t = Timer
        Do
            DoEvents
            On Error Resume Next
            Set ele = ie.document.getElementById("distanciarota")
            test = ele.innerText
            On Error GoTo 0
            If Timer - t > MAX_WAIT_SEC Then Exit Do
        Loop While test = vbNullString
        If Not ele Is Nothing Then
            With ThisWorkbook.Worksheets("Sheet1")
                .Cells(1, 1) = "rodovias " & ele.innerText
                .Cells(2, 1) = "linha reta " & ie.document.getElementById("kmlinhareta").innerText
            End With
        End If
        .Quit
    End With
End Sub

您可以使用带有CSS ID #,选择器的querySelector,例如,以相同的方式使用

ie.document.querySelector("#distanciarota").innerText

答案 1 :(得分:0)

#If VBA7 Then  
    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As LongPtr) 'For 64 Bit Systems  
#Else  
    Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds as Long) 'For 32 Bit Systems  
#End If

Sub Scrape2()
Dim IE As Object
Dim dist As Variant



Dim URL As String
Dim i As Integer

'Creates an Internet Explorer Object
Set IE = CreateObject("InternetExplorer.application")


URL = "https://www.entrecidadesdistancia.com.br"

With IE
    .Visible = False ' "True" makes the object visible
    .navigate URL 'Loads the website

    'Waits until the site's ready
    While IE.Busy
    DoEvents
    Wend

    Do While .Busy
    Loop

'Add additional delay of 500 milliseconds
Sleep 500

    'Selects "origin" field and inserts text
    .Document.getElementById("origem").Value = "Jandira, SP - Brasil"

    'Selects "destination" field and inserts text
    .Document.getElementById("destino").Value = "Cotia, SP - Brasil"

    'Presses the GO button
    For Each Button In .Document.getElementsByTagName("button")
        Button.Click
        Exit For
    Next

    'Waits until the site's ready
    Do While .Busy
    Loop

    Do While .Busy
    Loop

'Add additional delay of 500 milliseconds
Sleep 500

    dist = .Document.getElementById("distanciarota").innerText

    MsgBox (dist)


End With

IE.Quit
Set IE = Nothing
End Sub

'请在导航和单击按钮后再加上一些延迟。与服务器交互期间,即繁忙对象处于活动状态。但是,从服务器浏览器提取数据后,需要花费几毫秒的时间来呈现html内容。因此,添加额外的延迟是避免这些错误的最佳做法。