网页刮痧

时间:2018-04-29 10:58:15

标签: excel vba web-scraping

我正试图从www.ups.com获取大量数字的跟踪结果 到目前为止,通过使用F8排除VBA,我能够完美地获得结果。 但是,在使用F5运行完整集的代码时,它会给我一个运行时错误。

我想知道包裹的日期和地点。

参考追踪编号

1Z5X10F70364459911
1Z5X10F79065556123
1Z5X10F70364649537
1Z5X10F79064044142
1Z5X10F70365323958
1Z5X10F79066952961
1Z5X10F70364875177
1Z5X10F79065114583
1Z5X10F70366375196

这是我的代码:

Sub Test2()
Dim Tnx As String
Dim lastrow As Integer
Dim IE As New InternetExplorer
Dim data As String
Dim Doc As HTMLDocument

'For selection last row with count
lastrow = Sheet1.Cells(Rows.Count, "C").End(xlUp).Row

IE.Visible = False
IE.navigate "www.ups.com"

Do While IE.readyState <> READYSTATE_COMPLETE
Loop

For i = 2 To lastrow
    Tnx = Sheet1.Cells(i, 3).Value

    IE.document.getElementById("ups-track--qs").Value = Tnx
    IE.document.getElementById("ups-tracking-submit").Click

    Do While IE.readyState <> READYSTATE_COMPLETE
    Loop
    Set Doc = IE.document

    data = IE.document.getElementsByClassName("ups-form_label")(1).innerText
    Sheet1.Cells(i, 4).Value = data

Next

End Sub

3 个答案:

答案 0 :(得分:0)

首先,您需要按照网站上的说明进行操作。您为自己的国家/地区使用了错误的网址:

https://www.ups.com/WebTracking/track?loc=en_IN

其次,如果&#34;我想要的这个项目是自动跟踪一些包&#34;那么为什么不像其他人那样做呢?

该网站用于跟踪多个软件包并自动向您发送更新等。它是UPS的所在。

img1

img2

由于不了解VBA,这不是一个好的入门项目,而且这个站点不是教初学者编码基础知识的地方。请参阅Help Center

答案 1 :(得分:0)

这对你有用吗?

  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.1.0/css/bootstrap.min.css">
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.0/umd/popper.min.js"></script>
  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.1.0/js/bootstrap.min.js"></script>
<div class="container">
        <table class="table table-striped">
          <caption>
            <h3>Caption</h3>
          </caption>
          <thead>
            <tr>
              <th scope="col">Due Date</th>
              <th scope="col">Amount</th>
              <th scope="col">Period</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td data-label="Due Date" scope="row">04/01/2016</td>
              <td data-label="Amount">$1,190</td>
              <td data-label="Period">03/01/2016 - 03/31/2016</td>
            </tr>
            <tr>
              <td data-label="Due Date" scope="row">03/01/2016</td>
              <td data-label="Amount">$2,443</td>
              <td data-label="Period">02/01/2016 - 02/29/2016</td>
            </tr>
            <tr>
              <td data-label="Due Date" scope="row">03/01/2016</td>
              <td data-label="Amount">$1,181</td>
              <td data-label="Period">02/01/2016 - 02/29/2016</td>
            </tr>
            <tr>
              <td data-label="Due Date" scope="row">02/01/2016</td>
              <td data-label="Amount">$842</td>
              <td data-label="Period">01/01/2016 - 01/31/2016</td>
            </tr>
          </tbody>
        </table>
      </div>

答案 2 :(得分:0)

所以这有点hacky但是工作。

1)解析HTML

Option Explicit

Sub ExtractDeliveryDetails()

    Dim Tnx As String, lastrow As Long, i As Long, ie As New InternetExplorer

    With ThisWorkbook.Worksheets("Sheet1")

        lastrow = .Cells(.Rows.Count, "C").End(xlUp).Row

        ie.Visible = True
        ie.navigate "www.ups.com"

        Do While ie.Busy = True Or ie.readyState <> 4: DoEvents: Loop

        For i = 2 To lastrow
            Tnx = .Cells(i, 3).Value
            ie.document.getElementById("ups-track--qs").Value = Tnx
            ie.document.getElementById("ups-tracking-submit").Click

            Do While ie.Busy = True Or ie.readyState <> 4: DoEvents: Loop
            Application.Wait Now + TimeSerial(0, 0, 3)
            ie.document.getElementById("trackNums").Value = Tnx

            Dim buttons As Object
            Set buttons = ie.document.getElementsByTagName("button")

            Dim btn As Object
            For Each btn In buttons
                If InStr(btn.Value, "Track") > 0 Then
                    btn.Click
                    Exit For
                End If
            Next btn

            Do While ie.Busy = True Or ie.readyState <> 4: DoEvents: Loop
            Application.Wait Now + TimeSerial(0, 0, 2) '<==alter timings or loop until a value can be set

            Dim htmlArray() As String
            htmlArray = Split(ie.document.body.innerHTML, "ups-form_label")

            .Cells(i, 4) = Trim$(Replace$(Replace$(Split(Replace(Split(htmlArray(1), "<p>")(1), "&nbsp;", vbNullString), "</p>")(0), Chr(10), vbNullString), vbTab, " "))
            .Cells(i, 5) = Trim$(Replace$(Replace$(Split(Replace(Split(htmlArray(2), "<p>")(1), "&nbsp;", vbNullString), "</p>")(0), Chr(10), vbNullString), vbTab, " "))

        Next i

    End With

End Sub

2)使用querySelectorAll匹配CSS

您最后也可以使用querySelectorAll,而不是解析HTML,如下所示:

Dim b As Object 'DispStaticNodeList
Set b = ie.document.querySelectorAll(".ups-form_label ~ p")

Dim dropDate As String, dropLocation As String

dropDate = b.item(0).innerText
dropLocation = b.item(1).innerText
.Cells(i, 4) = dropDate
.Cells(i, 5) = dropLocation

这有可能更加健壮,因为您可以使用b.Length循环NodeList的长度,测试所需属性的内容。

注意:

您最好重新编写WaitsLoop Until对象(例如,ie.document.getElementById("trackNums")已设置,即在视图中,具有指定的超时,以防止永远不会结束循环。< / p>

示例输出:

Output row 1

有趣的NodeList参考:

  1. NodeList